action #121777

closed

PostgreSQL update ordering deadlock on jobs table

Added by kraih almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2022-12-09
Due date:
% Done: 0%
Estimated time:

Description

Observation

Seen in the O3 PostgreSQL logs:

2022-12-09 06:21:08.416 UTC openqa geekotest [6931]ERROR:  deadlock detected
2022-12-09 06:21:08.416 UTC openqa geekotest [6931]DETAIL:  Process 6931 waits for ShareLock on transaction 243609936; blocked by process 5779.
        Process 5779 waits for ShareLock on transaction 243609932; blocked by process 6931.
        Process 6931: UPDATE jobs SET result = $1, t_updated = $2 WHERE id = $3
        Process 5779: UPDATE jobs SET result = $1, t_updated = $2 WHERE id = $3
2022-12-09 06:54:32.659 UTC openqa geekotest [28685]ERROR:  deadlock detected
2022-12-09 06:54:32.659 UTC openqa geekotest [28685]DETAIL:  Process 28685 waits for ShareLock on transaction 243819997; blocked by process 30309.
        Process 30309 waits for ShareLock on transaction 243819999; blocked by process 28685.
        Process 28685: UPDATE jobs SET result = $1, t_updated = $2 WHERE id = $3
        Process 30309: UPDATE jobs SET result = $1, t_updated = $2 WHERE id = $3
2022-12-09 10:01:48.090 UTC openqa geekotest [21755]ERROR:  deadlock detected
2022-12-09 10:01:48.090 UTC openqa geekotest [21755]DETAIL:  Process 21755 waits for ShareLock on transaction 244660614; blocked by process 22220.
        Process 22220 waits for ShareLock on transaction 244660621; blocked by process 21755.
        Process 21755: UPDATE jobs SET result = $1, t_updated = $2 WHERE id = $3
        Process 22220: UPDATE jobs SET result = $1, t_updated = $2 WHERE id = $3

This is probably a very similar problem to #120891; I noticed it while checking the logs to verify that the fix for that issue works.
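
For context, this is the classic update-ordering deadlock: two transactions each update more than one row of the jobs table, but touch the rows in a different order, so each ends up waiting on a row lock the other already holds. A minimal hypothetical interleaving (the job ids and result values are made up, not taken from the O3 logs):

    -- session A
    BEGIN;
    UPDATE jobs SET result = 'passed', t_updated = now() WHERE id = 1;      -- A locks row 1

    -- session B, running concurrently
    BEGIN;
    UPDATE jobs SET result = 'incomplete', t_updated = now() WHERE id = 2;  -- B locks row 2
    UPDATE jobs SET result = 'incomplete', t_updated = now() WHERE id = 1;  -- B waits for A

    -- session A again
    UPDATE jobs SET result = 'passed', t_updated = now() WHERE id = 2;      -- A waits for B: deadlock

One common mitigation (a sketch only, not necessarily what openQA ends up doing) is to take the row locks in a consistent order, e.g. by id, before updating:

    -- lock the affected rows in id order first, so every transaction acquires them in the same order
    WITH locked AS (
        SELECT id FROM jobs WHERE id IN (1, 2) ORDER BY id FOR UPDATE
    )
    UPDATE jobs SET result = 'passed', t_updated = now() WHERE id IN (SELECT id FROM locked);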

Acceptance criteria

  • AC1: No more deadlocks on the jobs table

Suggestions


Related issues (0 open, 2 closed)

Related to openQA Project (public) - action #120891: Product not scheduled: DBD::Pg::st execute failed: ERROR: deadlock detected size:M (Resolved, mkittler, 2022-11-23, 2023-01-20)

Related to openQA Project (public) - action #121768: openQA jobs have been restarted but are stuck in "running" size:M (Resolved, mkittler, 2022-12-09, 2023-01-19)
Actions #1

Updated by kraih almost 2 years ago

  • Related to action #120891: Product not scheduled: DBD::Pg::st execute failed: ERROR: deadlock detected size:M added
Actions #2

Updated by kraih almost 2 years ago

This might also be the cause of jobs getting stuck in the "running" state reported in #121768.

Actions #3

Updated by kraih almost 2 years ago

Worth mentioning that of the deadlocks currently in the O3 logs, only the newest three are on the jobs table:

ariel:/var/lib/pgsql/data/log # grep deadlock *.log
postgresql-2022-11-24_000000.log:2022-11-24 09:35:05.878 UTC openqa geekotest [32110]ERROR:  deadlock detected
postgresql-2022-11-24_000000.log:2022-11-24 09:35:06.881 UTC openqa geekotest [32298]ERROR:  deadlock detected
postgresql-2022-11-27_000000.log:2022-11-27 04:31:55.788 UTC openqa geekotest [10404]ERROR:  deadlock detected
postgresql-2022-12-01_000000.log:2022-12-01 03:46:12.925 UTC openqa geekotest [7300]ERROR:  deadlock detected
postgresql-2022-12-01_000000.log:2022-12-01 22:08:49.641 UTC openqa geekotest [26387]ERROR:  deadlock detected
postgresql-2022-12-01_000000.log:2022-12-01 23:57:27.173 UTC openqa geekotest [25245]ERROR:  deadlock detected
postgresql-2022-12-06_000000.log:2022-12-06 01:38:27.615 UTC openqa geekotest [10906]ERROR:  deadlock detected
postgresql-2022-12-09_000000.log:2022-12-09 06:21:08.416 UTC openqa geekotest [6931]ERROR:  deadlock detected
postgresql-2022-12-09_000000.log:2022-12-09 06:54:32.659 UTC openqa geekotest [28685]ERROR:  deadlock detected
postgresql-2022-12-09_000000.log:2022-12-09 10:01:48.090 UTC openqa geekotest [21755]ERROR:  deadlock detected
Actions #4

Updated by mkittler almost 2 years ago

Possibly https://github.com/os-autoinst/openQA/pull/4955 improves some cases. Also see #121768#note-10.

Actions #5

Updated by kraih almost 2 years ago

I can't see any new cases on O3; maybe Marius did resolve it with the PR.

Actions #6

Updated by okurz almost 2 years ago

  • Related to action #121768: openQA jobs have been restarted but are stuck in "running" size:M added
Actions #7

Updated by kraih almost 2 years ago

We have a new one on the assets table after the last round of PRs (#122791):

2022-12-30 10:50:18.466 CET openqa geekotest [16610]DETAIL:  Process 16610 waits for ShareLock on transaction 1570088837; blocked by process 16606.
        Process 16606 waits for ShareLock on transaction 1570088839; blocked by process 16610.
        Process 16610: SELECT me.id, me.type, me.name, me.size, me.checksum, me.last_use_job_id, me.fixed, me.t_created, me.t_updated FROM assets me WHERE ( ( me.name = $1 AND me.type = $2 ) ) FOR UPDATE
        Process 16606: SELECT me.id, me.type, me.name, me.size, me.checksum, me.last_use_job_id, me.fixed, me.t_created, me.t_updated FROM assets me WHERE ( ( me.name = $1 AND me.type = $2 ) ) FOR UPDATE
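
For context on how two single-row SELECT ... FOR UPDATE statements can deadlock: each statement takes a row lock on the matched asset, and if two transactions each lock a different asset first and then request the one the other is holding, the same circular wait arises as on the jobs table. A hypothetical interleaving (the asset names and types are invented for illustration, not taken from the O3 logs):

    -- session A
    BEGIN;
    SELECT * FROM assets WHERE name = 'foo.iso' AND type = 'iso' FOR UPDATE;    -- A locks asset foo.iso

    -- session B, running concurrently
    BEGIN;
    SELECT * FROM assets WHERE name = 'bar.qcow2' AND type = 'hdd' FOR UPDATE;  -- B locks asset bar.qcow2
    SELECT * FROM assets WHERE name = 'foo.iso' AND type = 'iso' FOR UPDATE;    -- B waits for A

    -- session A again
    SELECT * FROM assets WHERE name = 'bar.qcow2' AND type = 'hdd' FOR UPDATE;  -- A waits for B: deadlock

The usual mitigations are the same as for the jobs table: acquire the asset row locks in a deterministic order within a transaction, or keep the transactions short enough that overlapping lock sets become unlikely.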
Actions #8

Updated by livdywan almost 2 years ago

  • Status changed from New to Blocked

Let's consider this blocked by #120891; maybe it'll be resolved after all.

Actions #9

Updated by kraih almost 2 years ago

  • Assignee set to kraih
Actions #10

Updated by kraih over 1 year ago

  • Status changed from Blocked to Resolved

PostgreSQL logs are free of deadlocks.

Actions #11

Updated by okurz over 1 year ago

  • Target version changed from future to Ready