Project

General

Profile

Actions

action #64298

closed

postgres error "duplicate key value violates unique constraint "job_modules_job_id_name_category_script" ... INSERT INTO job_modules" filling up postgres server log files quickly

Added by okurz about 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
2020-03-03
Due date:
% Done:

0%

Estimated time:

Description

Observation

Similar as in #60926 , reported originally in the osd infrastructure ticket #64096 .

https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?fullscreen&edit&tab=alert&panelId=74&orgId=1&from=1580815452576&to=1583211256414 shows that there is a rapid increase in space usage since 2020-02-17 until we hit the alert threshold. On 2020-03-02 22:00 suddenly the space usage was gone. The postgres log rotate mitigated this but the underlying problem is an error in postgres logs, like:

2020-03-08 03:12:02.042 CET openqa geekotest [12783]ERROR:  duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.042 CET openqa geekotest [12783]DETAIL:  Key (job_id, name, category, script)=(3967446, pthread_barrier_init_3-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.042 CET openqa geekotest [12783]STATEMENT:  INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id
2020-03-08 03:12:02.045 CET openqa geekotest [12783]ERROR:  duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.045 CET openqa geekotest [12783]DETAIL:  Key (job_id, name, category, script)=(3967446, pthread_barrier_init_4-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.045 CET openqa geekotest [12783]STATEMENT:  INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id
2020-03-08 03:12:02.053 CET openqa geekotest [12783]ERROR:  duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.053 CET openqa geekotest [12783]DETAIL:  Key (job_id, name, category, script)=(3967446, pthread_barrier_wait_1-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.053 CET openqa geekotest [12783]STATEMENT:  INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id
2020-03-08 03:12:02.058 CET openqa geekotest [12783]ERROR:  duplicate key value violates unique constraint "job_modules_job_id_name_category_script"
2020-03-08 03:12:02.058 CET openqa geekotest [12783]DETAIL:  Key (job_id, name, category, script)=(3967446, pthread_barrier_wait_2-1, kernel, tests/kernel/run_ltp.pm) already exists.
2020-03-08 03:12:02.058 CET openqa geekotest [12783]STATEMENT:  INSERT INTO job_modules ( always_rollback, category, fatal, important, job_id, milestone, name, script, t_created, t_updated) VALUES ( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 ) RETURNING id

which is the second most common entry already mentioned in #60923#note-3

Acceptance criteria

  • AC1: Above error is not filling up postgres logs rapidly

Suggestions

  • Look into what https://github.com/os-autoinst/openQA/pull/2617 already tries to do and where it's stuck
  • Optimize job module insertion to prevent the problem or report error to the test running scope, e.g. abort test early to inform test writers what they did wrong, or catch and prevent the error in os-autoinst scope

Related issues 1 (0 open1 closed)

Copied from openQA Infrastructure - action #64096: partition /srv was nearly depleted but now fixed (itself?)Resolvedokurz2020-03-032020-03-20

Actions
Actions #1

Updated by okurz about 4 years ago

  • Copied from action #64096: partition /srv was nearly depleted but now fixed (itself?) added
Actions #2

Updated by mkittler about 4 years ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
  1. I'd like to check how well the job module statistics stored in the jobs table are tested and eventually improve the tests.
  2. Then I would get rid of the DBIx hooks for that feature and just update the statistics manually as needed. It means we can not "just" insert job modules whenever we want (statistics in the jobs table would go out-of-sync). However, it looks like job modules are only inserted via the job result functions anyways so I guess that's fine.
  3. Then we can actually optimize the query using raw SQL. I'd have to look up how it works exactly myself but I suppose we're looking for this UPSERT thing.
Actions #4

Updated by mkittler about 4 years ago

  • Status changed from In Progress to Resolved

Seems like it worked. The log after the nightly deployment on o3 doesn't contain the error anymore and is significantly smaller.

Actions #5

Updated by mkittler about 4 years ago

  • Target version deleted (Current Sprint)
Actions

Also available in: Atom PDF