Project

General

Profile

Actions

action #128936

open

API endpoint /api/v1/jobs/:jobid/set_done can be very slow

Added by kraih 12 months ago. Updated 12 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Regressions/Crashes
Target version:
Start date:
2023-05-08
Due date:
% Done:

0%

Estimated time:

Description

Observation

This is a followup to #128345. In the O3 Apache logs (/space/logs/apache2/*) it can be observed that while most requests for the API endpoint /api/v1/jobs/:jobid/set_done run pretty fast, every day there is also a few outliers, like these from 2023-05-07:

192.168.112.18 - - [06/May/2023:01:39:00 +0000] "POST /api/v1/jobs/3267772/set_done?worker_id=519 HTTP/1.1" 200 37 "-" "Mojolicious (Perl)" 101578
192.168.112.7 - - [06/May/2023:01:58:27 +0000] "POST /api/v1/jobs/3267447/set_done?worker_id=93 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 47713
192.168.112.18 - - [06/May/2023:02:22:47 +0000] "POST /api/v1/jobs/3267445/set_done?worker_id=505 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 46972
192.168.112.19 - - [06/May/2023:10:37:47 +0000] "POST /api/v1/jobs/3267905/set_done?worker_id=575 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 215399
192.168.112.19 - - [06/May/2023:11:01:37 +0000] "POST /api/v1/jobs/3267907/set_done?worker_id=571 HTTP/1.1" 406 - "-" "Mojolicious (Perl)" 300104
192.168.112.19 - - [06/May/2023:11:14:54 +0000] "POST /api/v1/jobs/3267955/set_done?worker_id=574 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 110528
192.168.112.17 - - [06/May/2023:14:29:27 +0000] "POST /api/v1/jobs/3268327/set_done?worker_id=540 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 84127
192.168.112.19 - - [06/May/2023:18:24:27 +0000] "POST /api/v1/jobs/3268700/set_done?worker_id=573 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 59262
192.168.112.12 - - [06/May/2023:22:02:48 +0000] "POST /api/v1/jobs/3269647/set_done?worker_id=186 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 50023
192.168.112.17 - - [06/May/2023:22:17:35 +0000] "POST /api/v1/jobs/3269577/set_done?worker_id=538 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 119143
192.168.112.17 - - [06/May/2023:22:41:08 +0000] "POST /api/v1/jobs/3269668/set_done?worker_id=537 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 62961
192.168.112.17 - - [06/May/2023:22:41:41 +0000] "POST /api/v1/jobs/3269675/set_done?worker_id=522 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 61466

None of these should have taken more than a few seconds.

Acceptance criteria

  • AC1: It is known if the slow response times were caused by a bug in the bugref takeover code
  • AC2: The logs no longer show any requests for /api/v1/jobs/:jobid/set_done that take longer than a few seconds

Suggestions

  • Move the job done or bugref takeover code into a Minion background job (tricky, because of significant return value and many callers)
  • Fix bugref takeover bugs or improve performance by optimising the code if that turns out to be an option
  • Use a copy of the O3 database to replicate the problem

Related issues 1 (0 open1 closed)

Related to openQA Project - action #128345: [logwarn] Worker 30538 has no heartbeat (400 seconds), restarting size:MResolvedkraih2023-04-272023-05-20

Actions
Actions

Also available in: Atom PDF