Project

General

Profile

Actions

action #128936

open

API endpoint /api/v1/jobs/:jobid/set_done can be very slow

Added by kraih over 1 year ago. Updated over 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Regressions/Crashes
Target version:
QA (public, currently private due to #173521) - future
Start date:
2023-05-08
Due date:
% Done:

0%

Estimated time:

Description

Observation

This is a followup to #128345. In the O3 Apache logs (/space/logs/apache2/*) it can be observed that while most requests for the API endpoint /api/v1/jobs/:jobid/set_done run pretty fast, every day there is also a few outliers, like these from 2023-05-07:

192.168.112.18 - - [06/May/2023:01:39:00 +0000] "POST /api/v1/jobs/3267772/set_done?worker_id=519 HTTP/1.1" 200 37 "-" "Mojolicious (Perl)" 101578
192.168.112.7 - - [06/May/2023:01:58:27 +0000] "POST /api/v1/jobs/3267447/set_done?worker_id=93 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 47713
192.168.112.18 - - [06/May/2023:02:22:47 +0000] "POST /api/v1/jobs/3267445/set_done?worker_id=505 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 46972
192.168.112.19 - - [06/May/2023:10:37:47 +0000] "POST /api/v1/jobs/3267905/set_done?worker_id=575 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 215399
192.168.112.19 - - [06/May/2023:11:01:37 +0000] "POST /api/v1/jobs/3267907/set_done?worker_id=571 HTTP/1.1" 406 - "-" "Mojolicious (Perl)" 300104
192.168.112.19 - - [06/May/2023:11:14:54 +0000] "POST /api/v1/jobs/3267955/set_done?worker_id=574 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 110528
192.168.112.17 - - [06/May/2023:14:29:27 +0000] "POST /api/v1/jobs/3268327/set_done?worker_id=540 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 84127
192.168.112.19 - - [06/May/2023:18:24:27 +0000] "POST /api/v1/jobs/3268700/set_done?worker_id=573 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 59262
192.168.112.12 - - [06/May/2023:22:02:48 +0000] "POST /api/v1/jobs/3269647/set_done?worker_id=186 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 50023
192.168.112.17 - - [06/May/2023:22:17:35 +0000] "POST /api/v1/jobs/3269577/set_done?worker_id=538 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 119143
192.168.112.17 - - [06/May/2023:22:41:08 +0000] "POST /api/v1/jobs/3269668/set_done?worker_id=537 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 62961
192.168.112.17 - - [06/May/2023:22:41:41 +0000] "POST /api/v1/jobs/3269675/set_done?worker_id=522 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 61466

None of these should have taken more than a few seconds.

Acceptance criteria

  • AC1: It is known if the slow response times were caused by a bug in the bugref takeover code
  • AC2: The logs no longer show any requests for /api/v1/jobs/:jobid/set_done that take longer than a few seconds

Suggestions

  • Move the job done or bugref takeover code into a Minion background job (tricky, because of significant return value and many callers)
  • Fix bugref takeover bugs or improve performance by optimising the code if that turns out to be an option
  • Use a copy of the O3 database to replicate the problem

Related issues 1 (0 open1 closed)

Related to openQA Project (public) - action #128345: [logwarn] Worker 30538 has no heartbeat (400 seconds), restarting size:MResolvedkraih2023-04-272023-05-20

Actions
Actions #1

Updated by kraih over 1 year ago

  • Related to action #128345: [logwarn] Worker 30538 has no heartbeat (400 seconds), restarting size:M added
Actions #2

Updated by kraih over 1 year ago

  • Description updated (diff)
Actions #3

Updated by kraih over 1 year ago

  • Difficulty changed from medium to hard
Actions #4

Updated by kraih over 1 year ago

  • Description updated (diff)
Actions #5

Updated by kraih over 1 year ago

Comment from Marius on GitHub regarding the carry_over_bugrefs code: "Supposedly this should go into the finalize Minion job then. We just need to be careful about interactions of this with the hook script or consumers of events."

Actions

Also available in: Atom PDF