action #34531

Improvements to the job_module_needles table

Added by szarate almost 2 years ago. Updated almost 2 years ago.

Status:ResolvedStart date:09/04/2018
Priority:NormalDue date:
Assignee:coolo% Done:


Category:Feature requests
Target version:Done


Currently the table job_module_needles has 59M rows and the table is about 3GB (Plus the indexes that can be also quite scary but a Vacumm FULL might help there).

The indexes itself are not the problem, the need is to actually figure a better way to calculate the last time a needle was used without keeping that much information.

Also a point here: Use this task to evaluate the feasibility to start mixing Mojo::Pg with DBIx (Since we don't really rely on SQLite anymore)

openqa=# SELECT nspname || '.' || relname AS "relation",
openqa-#     pg_size_pretty(pg_relation_size(C.oid)) AS "size"
openqa-#   FROM pg_class C
openqa-#   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
openqa-#   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
openqa-#   ORDER BY pg_relation_size(C.oid) DESC
openqa-#   LIMIT 20;
relation                      |  size   
public.job_module_needles_needle_id_job_module_id | 4716 MB
public.job_module_needles_idx_needle_id           | 3878 MB
public.job_module_needles                         | 3021 MB
public.job_module_needles_idx_job_module_id       | 2501 MB


#1 Updated by szarate almost 2 years ago

  • Description updated (diff)

#2 Updated by mkittler almost 2 years ago

When I don't miss something, we use this entire table only to update the columns last_matched_module_id, first_seen_module_id and last_seen_module_id of the needles table itself.

This could likely also be done directly at the same place where we currently update the job_module_needles table (in JobModules::store_needle_infos). The (matched) needle IDs (%needles hash) and the job module ID ($self->id) are available at this point, so the following could be done:

  • Update last_matched_module_id of needles where needle_id is in @matched_needle_ids and last_matched_module_id < $job_module_id.
  • Update first_seen_module_id of needles where needle_id is in @needle_ids and first_seen_module_id > $self->id $job_module_id.
  • Update last_seen_module_id of needles where needle_id is in @needle_ids and last_seen_module_id < $job_module_id.

Somehow this sounds too simple. Maybe I'm overlooking something.

#3 Updated by coolo almost 2 years ago

well, what happens if we delete jobs?

#4 Updated by coolo almost 2 years ago

  • Subject changed from Analyze possible improvements to the job_module_needles table to Improvements to the job_module_needles table

So what we agreed upon in real life:

  • drop the huge table (it eats currently 45% of our DB storage space)
  • drop first_seen (we don't use it at all)
  • add to last_seen_id and last_matched_id a timestamp
  • set the IDs to NULL in case of deletion
  • if we have an ID link the module in admin/needles, if we haven't - bad luck

#5 Updated by coolo almost 2 years ago

  • Assignee set to coolo

#6 Updated by coolo almost 2 years ago

  • Status changed from New to Resolved
  • Target version changed from Ready to Current Sprint


#7 Updated by szarate almost 2 years ago

  • Target version changed from Current Sprint to Done

Also available in: Atom PDF