action #34531
closedImprovements to the job_module_needles table
Description
Currently the table job_module_needles
has 59M rows and the table is about 3GB (Plus the indexes that can be also quite scary but a Vacumm FULL might help there).
The indexes itself are not the problem, the need is to actually figure a better way to calculate the last time a needle was used without keeping that much information.
Also a point here: Use this task to evaluate the feasibility to start mixing Mojo::Pg with DBIx (Since we don't really rely on SQLite anymore)
openqa=# SELECT nspname || '.' || relname AS "relation",
openqa-# pg_size_pretty(pg_relation_size(C.oid)) AS "size"
openqa-# FROM pg_class C
openqa-# LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
openqa-# WHERE nspname NOT IN ('pg_catalog', 'information_schema')
openqa-# ORDER BY pg_relation_size(C.oid) DESC
openqa-# LIMIT 20;
relation | size
---------------------------------------------------+---------
public.job_module_needles_needle_id_job_module_id | 4716 MB
public.job_module_needles_idx_needle_id | 3878 MB
public.job_module_needles | 3021 MB
public.job_module_needles_idx_job_module_id | 2501 MB
Updated by mkittler over 6 years ago
When I don't miss something, we use this entire table only to update the columns last_matched_module_id
, first_seen_module_id
and last_seen_module_id
of the needles
table itself.
This could likely also be done directly at the same place where we currently update the job_module_needles
table (in JobModules::store_needle_infos
). The (matched) needle IDs (%needles
hash) and the job module ID ($self->id
) are available at this point, so the following could be done:
- Update
last_matched_module_id
ofneedles
whereneedle_id
is in @matched_needle_ids andlast_matched_module_id < $job_module_id
. - Update
first_seen_module_id
ofneedles
whereneedle_id
is in @needle_ids andfirst_seen_module_id > $self->id $job_module_id
. - Update
last_seen_module_id
ofneedles
whereneedle_id
is in @needle_ids andlast_seen_module_id < $job_module_id
.
Somehow this sounds too simple. Maybe I'm overlooking something.
Updated by coolo over 6 years ago
- Subject changed from Analyze possible improvements to the job_module_needles table to Improvements to the job_module_needles table
So what we agreed upon in real life:
- drop the huge table (it eats currently 45% of our DB storage space)
- drop first_seen (we don't use it at all)
- add to last_seen_id and last_matched_id a timestamp
- set the IDs to NULL in case of deletion
- if we have an ID link the module in admin/needles, if we haven't - bad luck
Updated by coolo over 6 years ago
- Status changed from New to Resolved
- Target version changed from Ready to Current Sprint
merged
Updated by szarate over 6 years ago
- Target version changed from Current Sprint to Done