action #101283

open

[easy][beginner] Retry if webUI times out connection yielding 408 response "Request timeout"

Added by okurz over 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Low
Assignee: -
Category: Feature requests
Target version:
Start date: 2021-10-21
Due date:
% Done: 0%
Estimated time:

Description

Observation

In #100859 mkittler triggered a vacuum of the OSD database, which made OSD busy and (as expected) caused requests to run into a timeout. As a result, jobs ended up incomplete and were not restarted.

https://openqa.suse.de/tests/7489935 is one of those cases, with reason "api failure: 408 response: Request Timeout". There is no autoinst-log.txt and no worker-log.txt, but I could find the corresponding section in the system journal on openqaworker-arm-3:

Oct 20 20:34:50 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7489935/status
Oct 20 20:34:50 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] Upload concluded (at boot_ltp)
Oct 20 20:35:00 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7489935/status
Oct 20 20:35:01 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] Upload concluded (at boot_ltp)
Oct 20 20:35:11 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7489935/status
Oct 20 20:41:45 openqaworker-arm-3 worker[5368]: [error] [pid:5368] REST-API error (POST http://openqa.suse.de/api/v1/jobs/7489935/status): 403 response: ti>
Oct 20 20:41:45 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] Stopping job 7489935 from openqa.suse.de: 07489935-sle-15-SP4-Online-aarch64-Build52.1-l>
Oct 20 20:41:45 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7489935/status
Oct 20 20:41:45 openqaworker-arm-3 worker[5368]: [error] [pid:5368] Unable to make final image uploads. Maybe the web UI considers this job already dead.
Oct 20 20:41:45 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] Upload concluded (at boot_ltp)
Oct 20 20:41:55 openqaworker-arm-3 worker[5368]: [info] [pid:5368] Test schedule has changed, reloading test_order.json
Oct 20 20:41:55 openqaworker-arm-3 worker[5368]: [debug] [pid:5368] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7489935/status
Oct 20 20:43:35 openqaworker-arm-3 worker[5368]: [info] [pid:5368] Trying to stop job gracefully by announcing it to command server via http://localhost:201>
Oct 20 20:43:37 openqaworker-arm-3 worker[5368]: [info] [pid:5368] Isotovideo exit status: 1
Oct 20 20:43:37 openqaworker-arm-3 worker[5368]: [info] [pid:5368] +++ worker notes +++
Oct 20 20:43:37 openqaworker-arm-3 worker[5368]: [info] [pid:5368] End time: 2021-10-20 18:43:37
Oct 20 20:43:37 openqaworker-arm-3 worker[5368]: [info] [pid:5368] Result: api-failure
Oct 20 20:44:05 openqaworker-arm-3 worker[5368]: [error] [pid:5368] REST-API error (POST http://openqa.suse.de/api/v1/jobs/7489935/status): 408 response: Re>
Oct 20 20:45:08 openqaworker-arm-3 worker[5368]: [debug] [pid:13464] Optimizing /var/lib/openqa/pool/13/testresults/boot_ltp-3.png
Oct 20 20:45:08 openqaworker-arm-3 worker[5368]: [debug] [pid:13464] Uploading artefact boot_ltp-3.png as 8b92b9db6e2ba4f1e75127b955e02b5d
Oct 20 20:45:09 openqaworker-arm-3 worker[5368]: [debug] [pid:13464] Optimizing /var/lib/openqa/pool/13/testresults/.thumbs/boot_ltp-3.png
Oct 20 20:45:09 openqaworker-arm-3 worker[5368]: [debug] [pid:13464] Uploading artefact boot_ltp-3.png as 8b92b9db6e2ba4f1e75127b955e02b5d
…

See the two interesting lines about "REST-API error".

Acceptance criteria

  • AC1: Incomplete jobs with "408 Request timeout" are automatically retriggered
  • AC2: Jobs running into "408 Request timeout" internally retry for multiple minutes, e.g. long enough to survive 40 minutes of downtime of a central webUI instance
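
As a rough illustration of both criteria, here is a minimal sketch in Python. It is not openQA code, and every name in it (`should_retrigger`, `post_with_retry`, `send`, the delays and the reason pattern) is hypothetical. It assumes a callable that performs one status upload and returns the HTTP status code, retries on 408 with exponential backoff until a total time budget is used up, and separately checks whether an incomplete job's reason looks like a 408 timeout.

```python
import re
import time
from typing import Callable

# Hypothetical pattern for AC1: incomplete-job reasons that warrant a retrigger.
RETRIGGER_REASON = re.compile(r"^api failure: 408 response", re.IGNORECASE)


def should_retrigger(reason: str) -> bool:
    """Return True if an incomplete job's reason indicates a 408 timeout."""
    return bool(RETRIGGER_REASON.search(reason))


def post_with_retry(
    send: Callable[[], int],
    max_total_seconds: float = 40 * 60,  # assumed budget, taken from AC2's example
    initial_delay: float = 10.0,         # assumed first wait between attempts
    max_delay: float = 300.0,            # assumed cap on the wait
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call send() until it returns something other than 408 or time runs out.

    send is a hypothetical callable that performs one request and returns the
    HTTP status code. The last status code seen is returned.
    """
    deadline = clock() + max_total_seconds
    delay = initial_delay
    while True:
        status = send()
        if status != 408:
            return status  # success or a different error: stop retrying
        remaining = deadline - clock()
        if remaining <= 0:
            return status  # budget exhausted, give up with the 408
        sleep(min(delay, remaining))       # never sleep past the deadline
        delay = min(delay * 2, max_delay)  # exponential backoff with a cap
```

The `sleep` and `clock` parameters exist only so that the loop can be exercised without real waiting. Whether a fixed interval or a backoff is the better fit for the worker is a design decision this sketch does not settle.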

Further details

entrance level issue


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #100859: investigate how to optimize /srv data utilization on OSD size:S (Resolved, mkittler, 2021-10-12)

Actions #1

Updated by okurz over 2 years ago

  • Related to action #100859: investigate how to optimize /srv data utilization on OSD size:S added
Actions #2

Updated by okurz almost 2 years ago

  • Tags set to entrance level, easy, beginner
  • Subject changed from Retry if webUI times out connection yielding 408 response "Request timeout" to [easy][beginner] Retry if webUI times out connection yielding 408 response "Request timeout"
  • Description updated (diff)