action #71185
coordination #39719: [saga][epic] Detect "known failures" and mark jobs as such to make tests more stable, reviewing test results and tracking known issues easier
coordination #62420: [epic] Distinguish all types of incompletes
job incompletes with auto_review:"setup failure: Cache service status error: Premature connection close":retry and does not retry, should we just automatically retry the connection?
0%
Description
Observation¶
https://openqa.suse.de/tests/4663520 is incomplete, reason is "setup failure: Cache service status error: Premature connection close" , the worker log
https://openqa.suse.de/tests/4663520/file/worker-log.txt gives more details:
[2020-09-09T12:48:28.0234 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status [2020-09-09T12:48:28.0344 CEST] [debug] [pid:5715] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP2-aarch64-Installtest.qcow2" to "/var/lib/openqa/pool/3/SLES-15-SP2-aarch64-Installtest.qcow2" [2020-09-09T12:48:33.0422 CEST] [debug] [pid:5715] Updating status so job 4663520 is not considered dead. [2020-09-09T12:48:33.0423 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status [2020-09-09T12:48:33.0515 CEST] [debug] [pid:5715] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP2-Installer-DVD-aarch64-GM-DVD1.iso" to "/var/lib/openqa/pool/3/SLE-15-SP2-Installer-DVD-aarch64-GM-DVD1.iso" [2020-09-09T12:48:38.0603 CEST] [debug] [pid:5715] Updating status so job 4663520 is not considered dead. [2020-09-09T12:48:38.0604 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status [2020-09-09T12:48:38.0716 CEST] [debug] [pid:5715] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP2-aarch64-Installtest-uefi-vars.qcow2" to "/var/lib/openqa/pool/3/SLES-15-SP2-aarch64-Installtest-uefi-vars.qcow2" [2020-09-09T12:48:43.0759 CEST] [debug] [pid:5715] Updating status so job 4663520 is not considered dead. [2020-09-09T12:48:43.0760 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status [2020-09-09T12:48:48.0809 CEST] [debug] [pid:5715] Updating status so job 4663520 is not considered dead. [2020-09-09T12:48:48.0810 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status [2020-09-09T12:48:48.0844 CEST] [error] [pid:5715] Unable to setup job 4663520: Cache service status error: Premature connection close [2020-09-09T12:48:48.0844 CEST] [debug] [pid:5715] Stopping job 4663520 from openqa.suse.de: 04663520-sle-15-SP2-Server-DVD-Incidents-Install-aarch64-Build:15836:openssl-1_1-qam-incidentinstall@aarch64-virtio - reason: setup failure [2020-09-09T12:48:48.0845 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status [2020-09-09T12:48:48.0917 CEST] [info] [pid:14619] Uploading autoinst-log.txt [2020-09-09T12:48:48.0968 CEST] [info] [pid:14619] Uploading worker-log.txt
but then the job incompletes and also is not automatically retriggered. It is unclear to the user what should be done
Acceptance criteria¶
- AC1: "Cache service status error: Premature connection close" is prevented or handled with retries (either within job or by retriggering the complete job)
Suggestions¶
- Look into the cache service implementation if we can have retries in this situation. If not, maybe mark job as incomplete with proper reason and ensure it is automatically retriggered.
History
#2
Updated by kraih 3 months ago
The HTTP request in question (cache service status update) already has 3 retries with a 5 second sleep time for each retry. If this is a common issue we could first try tweaking the defaults a bit. More retries and/or longer sleep time. If it's rare we can probably just ignore the issue.
#5
Updated by okurz 3 months ago
My first try for that: https://github.com/os-autoinst/openQA/pull/3513