Actions
action #64298
closedpostgres error "duplicate key value violates unique constraint "job_modules_job_id_name_category_script" ... INSERT INTO job_modules" filling up postgres server log files quickly
Start date:
2020-03-03
Due date:
% Done:
0%
Estimated time:
Description
Observation¶
Similar as in #60926 , reported originally in the osd infrastructure ticket #64096 .
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?fullscreen&edit&tab=alert&panelId=74&orgId=1&from=1580815452576&to=1583211256414 shows that there is a rapid increase in space usage since 2020-02-17 until we hit the alert threshold. On 2020-03-02 22:00 suddenly the space usage was gone. The postgres log rotate mitigated this but the underlying problem is an error in postgres logs, like:
2020-03-08 03:12:02.042 CET openqa geekotest [12783]ERROR: duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.042 CET openqa geekotest [12783]DETAIL: Key (job_id, name, category, script)=(3967446, pthread_barrier_init_3-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.042 CET openqa geekotest [12783]STATEMENT: INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id
2020-03-08 03:12:02.045 CET openqa geekotest [12783]ERROR: duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.045 CET openqa geekotest [12783]DETAIL: Key (job_id, name, category, script)=(3967446, pthread_barrier_init_4-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.045 CET openqa geekotest [12783]STATEMENT: INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id
2020-03-08 03:12:02.053 CET openqa geekotest [12783]ERROR: duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.053 CET openqa geekotest [12783]DETAIL: Key (job_id, name, category, script)=(3967446, pthread_barrier_wait_1-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.053 CET openqa geekotest [12783]STATEMENT: INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id
2020-03-08 03:12:02.058 CET openqa geekotest [12783]ERROR: duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.058 CET openqa geekotest [12783]DETAIL: Key (job_id, name, category, script)=(3967446, pthread_barrier_wait_2-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.058 CET openqa geekotest [12783]STATEMENT: INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id
which is the second most common entry already mentioned in #60923#note-3
Acceptance criteria¶
- AC1: Above error is not filling up postgres logs rapidly
Suggestions¶
- Look into what https://github.com/os-autoinst/openQA/pull/2617 already tries to do and where it's stuck
- Optimize job module insertion to prevent the problem or report error to the test running scope, e.g. abort test early to inform test writers what they did wrong, or catch and prevent the error in os-autoinst scope
Updated by okurz about 4 years ago
- Copied from action #64096: partition /srv was nearly depleted but now fixed (itself?) added
Updated by mkittler about 4 years ago
- Status changed from Workable to In Progress
- Assignee set to mkittler
- I'd like to check how well the job module statistics stored in the jobs table are tested and eventually improve the tests.
- Then I would get rid of the DBIx hooks for that feature and just update the statistics manually as needed. It means we can not "just" insert job modules whenever we want (statistics in the jobs table would go out-of-sync). However, it looks like job modules are only inserted via the job result functions anyways so I guess that's fine.
- Then we can actually optimize the query using raw SQL. I'd have to look up how it works exactly myself but I suppose we're looking for this
UPSERT
thing.
Updated by mkittler about 4 years ago
Updated by mkittler about 4 years ago
- Status changed from In Progress to Resolved
Seems like it worked. The log after the nightly deployment on o3 doesn't contain the error anymore and is significantly smaller.
Actions