action #101283
open [easy][beginner] Retry if webUI times out connection yielding 408 response "Request timeout"
Start date:
2021-10-21
Due date:
% Done:
0%
Estimated time:
Tags:
Description
Observation
In #100859 mkittler triggered a vacuum of the OSD database, which kept OSD busy and (as expected) caused requests to time out. As a result, jobs ended up incomplete and were not restarted.
https://openqa.suse.de/tests/7489935 is one of those cases, with reason "api failure: 408 response: Request Timeout". There is no autoinst-log.txt and no worker-log.txt, but I could find the corresponding section in the system journal on openqaworker-arm-3:
Oct 20 20:34:50 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7489935/status
Oct 20 20:34:50 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] Upload concluded (at boot_ltp)
Oct 20 20:35:00 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7489935/status
Oct 20 20:35:01 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] Upload concluded (at boot_ltp)
Oct 20 20:35:11 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7489935/status
Oct 20 20:41:45 openqaworker-arm-3 worker[5368]: [error] [pid:5368] REST-API error (POST http://openqa.suse.de/api/v1/jobs/7489935/status): 403 response: ti>
Oct 20 20:41:45 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] Stopping job 7489935 from openqa.suse.de: 07489935-sle-15-SP4-Online-aarch64-Build52.1-l>
Oct 20 20:41:45 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7489935/status
Oct 20 20:41:45 openqaworker-arm-3 worker[5368]: [error] [pid:5368] Unable to make final image uploads. Maybe the web UI considers this job already dead.
Oct 20 20:41:45 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] Upload concluded (at boot_ltp)
Oct 20 20:41:55 openqaworker-arm-3 worker[5368]: [info] [pid:5368] Test schedule has changed, reloading test_order.json
Oct 20 20:41:55 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7489935/status
Oct 20 20:43:35 openqaworker-arm-3 worker[5368]: [info] [pid:5368] Trying to stop job gracefully by announcing it to command server via http://localhost:201>
Oct 20 20:43:37 openqaworker-arm-3 worker[5368]: [info] [pid:5368] Isotovideo exit status: 1
Oct 20 20:43:37 openqaworker-arm-3 worker[5368]: [info] [pid:5368] +++ worker notes +++
Oct 20 20:43:37 openqaworker-arm-3 worker[5368]: [info] [pid:5368] End time: 2021-10-20 18:43:37
Oct 20 20:43:37 openqaworker-arm-3 worker[5368]: [info] [pid:5368] Result: api-failure
Oct 20 20:44:05 openqaworker-arm-3 worker[5368]: [error] [pid:5368] REST-API error (POST http://openqa.suse.de/api/v1/jobs/7489935/status): 408 response: Re>
Oct 20 20:45:08 openqaworker-arm-3 worker[5368]: [debug] [pid:13464] Optimizing /var/lib/openqa/pool/13/testresults/boot_ltp-3.png
Oct 20 20:45:08 openqaworker-arm-3 worker[5368]: [debug] [pid:13464] Uploading artefact boot_ltp-3.png as 8b92b9db6e2ba4f1e75127b955e02b5d
Oct 20 20:45:09 openqaworker-arm-3 worker[5368]: [debug] [pid:13464] Optimizing /var/lib/openqa/pool/13/testresults/.thumbs/boot_ltp-3.png
Oct 20 20:45:09 openqaworker-arm-3 worker[5368]: [debug] [pid:13464] Uploading artefact boot_ltp-3.png as 8b92b9db6e2ba4f1e75127b955e02b5d
…
See the two interesting "REST-API error" lines, at 20:41:45 and 20:44:05.
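To pick such lines out of a longer journal excerpt, something like the following could help. This is only a sketch: the function name `rest_api_errors` and the regular expression are assumptions derived from the log format shown above, not part of openQA.

```python
import re

# Pattern inferred from the journal excerpt above (an assumption, not an
# openQA-defined format): "[error] [pid:N] REST-API error (METHOD URL): message"
REST_API_ERROR = re.compile(
    r"\[error\] \[pid:\d+\] REST-API error "
    r"\((?P<method>[A-Z]+) (?P<url>\S+)\): (?P<message>.*)$"
)


def rest_api_errors(journal_lines):
    """Yield (timestamp, status_code, message) for each REST-API error line.

    status_code is None if the message does not start with "NNN response".
    Messages may be truncated by the journal pager and are passed through as-is.
    """
    for line in journal_lines:
        match = REST_API_ERROR.search(line)
        if not match:
            continue
        timestamp = line[:15]  # syslog-style prefix, e.g. "Oct 20 20:41:45"
        message = match.group("message")
        code = re.match(r"(\d{3}) response", message)
        yield timestamp, int(code.group(1)) if code else None, message
```

Feeding it the lines above, for example from a saved `journalctl` output, would leave only the two error entries and make the status code of each directly visible.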
Acceptance criteria
- AC1: Incomplete jobs with "408 Request timeout" are automatically retriggered
- AC2: Jobs running into "408 Request timeout" retry internally for multiple minutes, e.g. long enough to cover a 40-minute downtime of a central webUI instance
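As a rough illustration of what the two criteria could mean in practice, here is a minimal sketch in Python. It is not the openQA worker code; `should_retrigger`, `post_with_retry`, the `send` callable and all default values are hypothetical names and numbers chosen for this example, with the 40-minute budget taken from AC2.

```python
import time

# Assumption: only 408 "Request Timeout" is treated as transient here.
RETRYABLE_STATUS = {408}


def should_retrigger(reason):
    """AC1 sketch: decide from a job's reason whether to retrigger it.

    Matches reasons like "api failure: 408 response: Request Timeout".
    """
    return reason is not None and "408 response" in reason


def post_with_retry(send, max_downtime=40 * 60, initial_delay=5.0,
                    max_delay=60.0, sleep=time.sleep, clock=time.monotonic):
    """AC2 sketch: call send() until it succeeds or the time budget is spent.

    send is a hypothetical callable that performs one request and returns
    the HTTP status code. Returns the last status code seen.
    """
    deadline = clock() + max_downtime
    delay = initial_delay
    while True:
        status = send()
        if status not in RETRYABLE_STATUS:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            return status  # budget exhausted, give up with the 408
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

The delay doubles up to a cap so that a busy webUI is not hit with more requests while it recovers, and the deadline bounds the total wait. `sleep` and `clock` are parameters only so the loop can be exercised without real waiting.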
Further details
entrance level issue
Updated by okurz over 3 years ago
- Related to action #100859: investigate how to optimize /srv data utilization on OSD size:S added
Updated by okurz over 2 years ago
- Tags set to entrance level, easy, beginner
- Subject changed from Retry if webUI times out connection yielding 408 response "Request timeout" to [easy][beginner] Retry if webUI times out connection yielding 408 response "Request timeout"
- Description updated (diff)