Project

General

Profile

action #128936

Updated by kraih over 1 year ago

## Observation 

 This is a followup to #128345. In the O3 Apache logs (`/space/logs/apache2/*`) it can be observed that while most requests for the API endpoint `/api/v1/jobs/:jobid/set_done` run pretty fast, every day there is also a few outliers, like these from 2023-05-07: 
 ``` 
 192.168.112.18 - - [06/May/2023:01:39:00 +0000] "POST /api/v1/jobs/3267772/set_done?worker_id=519 HTTP/1.1" 200 37 "-" "Mojolicious (Perl)" 101578 
 192.168.112.7 - - [06/May/2023:01:58:27 +0000] "POST /api/v1/jobs/3267447/set_done?worker_id=93 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 47713 
 192.168.112.18 - - [06/May/2023:02:22:47 +0000] "POST /api/v1/jobs/3267445/set_done?worker_id=505 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 46972 
 192.168.112.19 - - [06/May/2023:10:37:47 +0000] "POST /api/v1/jobs/3267905/set_done?worker_id=575 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 215399 
 192.168.112.19 - - [06/May/2023:11:01:37 +0000] "POST /api/v1/jobs/3267907/set_done?worker_id=571 HTTP/1.1" 406 - "-" "Mojolicious (Perl)" 300104 
 192.168.112.19 - - [06/May/2023:11:14:54 +0000] "POST /api/v1/jobs/3267955/set_done?worker_id=574 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 110528 
 192.168.112.17 - - [06/May/2023:14:29:27 +0000] "POST /api/v1/jobs/3268327/set_done?worker_id=540 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 84127 
 192.168.112.19 - - [06/May/2023:18:24:27 +0000] "POST /api/v1/jobs/3268700/set_done?worker_id=573 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 59262 
 192.168.112.12 - - [06/May/2023:22:02:48 +0000] "POST /api/v1/jobs/3269647/set_done?worker_id=186 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 50023 
 192.168.112.17 - - [06/May/2023:22:17:35 +0000] "POST /api/v1/jobs/3269577/set_done?worker_id=538 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 119143 
 192.168.112.17 - - [06/May/2023:22:41:08 +0000] "POST /api/v1/jobs/3269668/set_done?worker_id=537 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 62961 
 192.168.112.17 - - [06/May/2023:22:41:41 +0000] "POST /api/v1/jobs/3269675/set_done?worker_id=522 HTTP/1.1" 200 33 "-" "Mojolicious (Perl)" 61466 
 ``` 

 None of these should have taken more than a few seconds. 

 ## Acceptance criteria 
 * **AC1:** It is known if the slow response times were caused by a bug in the bugref takeover code 
 * **AC2:** The logs no longer show any requests for `/api/v1/jobs/:jobid/set_done` that take longer than a few seconds 

 ## Suggestions 
 - Move the job done or bugref takeover code into a Minion background job (tricky, because of significant return value and many callers) 
 - Fix bugref takeover bugs or improve performance by optimising the code if that turns out to be an option 
 - Use a copy of the O3 database to replicate the problem 

Back