action #71185
closed coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues
coordination #62420: [epic] Distinguish all types of incompletes
job incompletes with auto_review:"setup failure: Cache service status error: Premature connection close":retry and does not retry, should we just automatically retry the connection?
0%
Description
Observation
https://openqa.suse.de/tests/4663520 is incomplete with the reason "setup failure: Cache service status error: Premature connection close". The worker log
https://openqa.suse.de/tests/4663520/file/worker-log.txt gives more details:
[2020-09-09T12:48:28.0234 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status
[2020-09-09T12:48:28.0344 CEST] [debug] [pid:5715] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP2-aarch64-Installtest.qcow2" to "/var/lib/openqa/pool/3/SLES-15-SP2-aarch64-Installtest.qcow2"
[2020-09-09T12:48:33.0422 CEST] [debug] [pid:5715] Updating status so job 4663520 is not considered dead.
[2020-09-09T12:48:33.0423 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status
[2020-09-09T12:48:33.0515 CEST] [debug] [pid:5715] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP2-Installer-DVD-aarch64-GM-DVD1.iso" to "/var/lib/openqa/pool/3/SLE-15-SP2-Installer-DVD-aarch64-GM-DVD1.iso"
[2020-09-09T12:48:38.0603 CEST] [debug] [pid:5715] Updating status so job 4663520 is not considered dead.
[2020-09-09T12:48:38.0604 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status
[2020-09-09T12:48:38.0716 CEST] [debug] [pid:5715] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP2-aarch64-Installtest-uefi-vars.qcow2" to "/var/lib/openqa/pool/3/SLES-15-SP2-aarch64-Installtest-uefi-vars.qcow2"
[2020-09-09T12:48:43.0759 CEST] [debug] [pid:5715] Updating status so job 4663520 is not considered dead.
[2020-09-09T12:48:43.0760 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status
[2020-09-09T12:48:48.0809 CEST] [debug] [pid:5715] Updating status so job 4663520 is not considered dead.
[2020-09-09T12:48:48.0810 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status
[2020-09-09T12:48:48.0844 CEST] [error] [pid:5715] Unable to setup job 4663520: Cache service status error: Premature connection close
[2020-09-09T12:48:48.0844 CEST] [debug] [pid:5715] Stopping job 4663520 from openqa.suse.de: 04663520-sle-15-SP2-Server-DVD-Incidents-Install-aarch64-Build:15836:openssl-1_1-qam-incidentinstall@aarch64-virtio - reason: setup failure
[2020-09-09T12:48:48.0845 CEST] [debug] [pid:5715] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4663520/status
[2020-09-09T12:48:48.0917 CEST] [info] [pid:14619] Uploading autoinst-log.txt
[2020-09-09T12:48:48.0968 CEST] [info] [pid:14619] Uploading worker-log.txt
The job then ends up incomplete and is not automatically retriggered. It is unclear to the user what should be done.
Acceptance criteria
- AC1: "Cache service status error: Premature connection close" is prevented or handled with retries (either within the job or by retriggering the complete job)
Suggestions
- Check in the cache service implementation whether we can retry in this situation. If not, consider marking the job as incomplete with a proper reason and ensure it is automatically retriggered.
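As a rough illustration of the "proper reason plus automatic retrigger" idea, here is a minimal Python sketch. It assumes that the `auto_review:"<regex>":retry` marker in the ticket subject is meant to be matched as a regular expression against the job's incomplete reason, and that the trailing `:retry` asks for a retrigger. The function names `parse_auto_review` and `should_retrigger` are hypothetical and are not taken from openQA.

```python
import re

# Assumed marker format, taken from this ticket's subject:
#   auto_review:"<regex>"          -> label only
#   auto_review:"<regex>":retry    -> label and retrigger
AUTO_REVIEW = re.compile(r'auto_review:"(?P<pattern>.+?)"(?P<retry>:retry)?')


def parse_auto_review(subject):
    """Return (pattern, retry_requested) from a ticket subject, or None."""
    match = AUTO_REVIEW.search(subject)
    if match is None:
        return None
    return match.group("pattern"), match.group("retry") is not None


def should_retrigger(subject, reason):
    """True if the subject asks for a retry and its pattern matches the reason."""
    parsed = parse_auto_review(subject)
    if parsed is None:
        return False
    pattern, retry_requested = parsed
    return retry_requested and re.search(pattern, reason) is not None


subject = (
    'job incompletes with auto_review:"setup failure: Cache service status '
    'error: Premature connection close":retry and does not retry'
)
reason = "setup failure: Cache service status error: Premature connection close"
print(should_retrigger(subject, reason))
```

For a job whose reason matches, a caller could then restart the job instead of leaving it incomplete.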
Updated by okurz about 4 years ago
- Tags set to cache, connection, network, worker, auto_review, osd
- Description updated (diff)
- Status changed from New to Workable
- Target version set to Ready
Updated by kraih almost 4 years ago
The HTTP request in question (cache service status update) already has 3 retries with a 5 second sleep time for each retry. If this is a common issue, we could first try tweaking the defaults a bit: more retries and/or a longer sleep time. If it's rare, we can probably just ignore the issue.
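To make the retry behaviour concrete, here is a minimal Python sketch of a fixed number of retries with a constant sleep between attempts. It is not the openQA implementation: `CacheServiceError` and `call_with_retries` are hypothetical names, and only the defaults of 3 retries and 5 seconds mirror the values mentioned above.

```python
import time


class CacheServiceError(Exception):
    """Hypothetical stand-in for a failed cache service status request."""


def call_with_retries(request, retries=3, sleep_seconds=5.0, sleep=time.sleep):
    """Call request(); on CacheServiceError retry up to `retries` more times.

    `sleep` is injectable so the waiting can be replaced in tests.
    """
    attempt = 0
    while True:
        try:
            return request()
        except CacheServiceError:
            if attempt >= retries:
                # All retries used up: let the caller see the last error.
                raise
            attempt += 1
            sleep(sleep_seconds)
```

Tweaking the defaults as suggested would then amount to passing a larger `retries` or `sleep_seconds` value.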
Updated by okurz almost 4 years ago
My first try for that: https://github.com/os-autoinst/openQA/pull/3513
Updated by kraih almost 4 years ago
- Assignee deleted (kraih)
This was supposed to be the next ticket in my queue.
Updated by kraih almost 4 years ago
- Status changed from Workable to In Progress
- Assignee set to okurz
Updated by okurz almost 4 years ago
- Status changed from In Progress to Feedback
#67855 is impacting me and delaying my work …
Updated by okurz almost 4 years ago
- Status changed from Feedback to Resolved
PR merged. As the problem is not likely to happen that often, I will set the ticket to "Resolved" right away. If we still see the ticket referenced during the review of "auto-review", we should reconsider.