Actions
action #128936
openAPI endpoint /api/v1/jobs/:jobid/set_done can be very slow
Status:
New
Priority:
Normal
Assignee:
-
Category:
Regressions/Crashes
Target version:
QA (public, currently private due to #173521) - future
Start date:
2023-05-08
Due date:
% Done:
0%
Estimated time:
Description
Observation¶
This is a followup to #128345. In the O3 Apache logs (/space/logs/apache2/*
) it can be observed that while most requests for the API endpoint /api/v1/jobs/:jobid/set_done
run pretty fast, every day there is also a few outliers, like these from 2023-05-07:
192.168.112.18 - - [06/May/2023:01:39:00 +0000] "POST /api/v1/jobs/3267772/set_done?worker_id=519 HTTP/1.1" 200 37 "-" "Mojolicious (Perl)" 101578
192.168.112.7 - - [06/May/2023:01:58:27 +0000] "POST /api/v1/jobs/3267447/set_done?worker_id=93 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 47713
192.168.112.18 - - [06/May/2023:02:22:47 +0000] "POST /api/v1/jobs/3267445/set_done?worker_id=505 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 46972
192.168.112.19 - - [06/May/2023:10:37:47 +0000] "POST /api/v1/jobs/3267905/set_done?worker_id=575 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 215399
192.168.112.19 - - [06/May/2023:11:01:37 +0000] "POST /api/v1/jobs/3267907/set_done?worker_id=571 HTTP/1.1" 406 - "-" "Mojolicious (Perl)" 300104
192.168.112.19 - - [06/May/2023:11:14:54 +0000] "POST /api/v1/jobs/3267955/set_done?worker_id=574 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 110528
192.168.112.17 - - [06/May/2023:14:29:27 +0000] "POST /api/v1/jobs/3268327/set_done?worker_id=540 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 84127
192.168.112.19 - - [06/May/2023:18:24:27 +0000] "POST /api/v1/jobs/3268700/set_done?worker_id=573 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 59262
192.168.112.12 - - [06/May/2023:22:02:48 +0000] "POST /api/v1/jobs/3269647/set_done?worker_id=186 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 50023
192.168.112.17 - - [06/May/2023:22:17:35 +0000] "POST /api/v1/jobs/3269577/set_done?worker_id=538 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 119143
192.168.112.17 - - [06/May/2023:22:41:08 +0000] "POST /api/v1/jobs/3269668/set_done?worker_id=537 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 62961
192.168.112.17 - - [06/May/2023:22:41:41 +0000] "POST /api/v1/jobs/3269675/set_done?worker_id=522 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 61466
None of these should have taken more than a few seconds.
Acceptance criteria¶
- AC1: It is known if the slow response times were caused by a bug in the bugref takeover code
- AC2: The logs no longer show any requests for
/api/v1/jobs/:jobid/set_done
that take longer than a few seconds
Suggestions¶
- Move the job done or bugref takeover code into a Minion background job (tricky, because of significant return value and many callers)
- Fix bugref takeover bugs or improve performance by optimising the code if that turns out to be an option
- Use a copy of the O3 database to replicate the problem
Updated by kraih over 1 year ago
- Related to action #128345: [logwarn] Worker 30538 has no heartbeat (400 seconds), restarting size:M added
Updated by kraih over 1 year ago
Comment from Marius on GitHub regarding the carry_over_bugrefs
code: "Supposedly this should go into the finalize Minion job then. We just need to be careful about interactions of this with the hook script or consumers of events."
Actions