action #103771
Retry on rsync errors like "exit code 5" instead of failing the job (which then retriggers)
Description
Observation
https://openqa.suse.de/tests/7816422 ended up incomplete with "Reason: cache failure: Failed to rsync tests: exit code 5" while I was conducting the OSD Leap 15.2->15.3 upgrade (#99198).
https://openqa.suse.de/tests/7816422/logfile?filename=autoinst-log.txt says
[2021-12-09T14:08:29.710481+01:00] [info] [pid:23787] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #11572 sent to Cache Service
[2021-12-09T14:08:34.962673+01:00] [info] [pid:23787] Output of rsync:
[info] [#11572] Calling: rsync -avHP --timeout 1800 rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
[2021-12-09T14:08:34.962875+01:00] [error] [pid:23787] Failed to rsync tests: exit code 5
https://openqa.suse.de/tests/7816422/logfile?filename=worker-log.txt says
SP2-Installer-DVD-x86_64-GM-DVD1.iso" to "/var/lib/openqa/pool/7/SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso"
[2021-12-09T14:08:29.710796+01:00] [debug] [pid:23787] Updating status so job 7816422 is not considered dead.
[2021-12-09T14:08:29.711357+01:00] [debug] [pid:23787] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7816422/status
[2021-12-09T14:08:34.811331+01:00] [debug] [pid:23787] Updating status so job 7816422 is not considered dead.
[2021-12-09T14:08:34.812154+01:00] [debug] [pid:23787] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7816422/status
[2021-12-09T14:08:34.962989+01:00] [error] [pid:23787] Unable to setup job 7816422: Failed to rsync tests: exit code 5
[2021-12-09T14:08:34.963141+01:00] [debug] [pid:23787] Stopping job 7816422 from openqa.suse.de: 07816422-sle-15-SP2-Server-DVD-Incidents-x86_64-Build:22102:release-notes-sles-mau-filesystem@64bit - reason: setup failure
[2021-12-09T14:08:34.963556+01:00] [debug] [pid:23787] REST-API call: POST http://openqa.suse.de/api/v1/jobs/7816422/status
[2021-12-09T14:08:35.027993+01:00] [info] [pid:23927] Uploading autoinst-log.txt
Acceptance criteria
- AC1: The cache service retries the rsync download for a reasonable time, so that a temporary unavailability of the cache target in similar cases does not fail the job
Suggestions
The rsync man page explains that exit code 5 means "Error starting client-server protocol". We should ~~either instruct rsync to retry on that (does not seem to be a feature of rsync) or~~ put some retry around the rsync call. Maybe use https://metacpan.org/dist/App-rsync-retry/view/script/rsync-retry, which is currently only available in devel:languages:perl:CPAN-A.
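A minimal sketch of what "some retry around the rsync call" could look like, written in Python purely for illustration (it is not the openQA cache service code). The function name `rsync_with_retry`, the set `RETRYABLE_EXIT_CODES` and the default attempt count, delay and backoff are all assumptions made for this example; only exit code 5 comes from this ticket.

```python
import subprocess
import time

# Exit codes treated as transient in this sketch. Only 5 ("Error starting
# client-server protocol") is taken from the ticket; adding others is a
# judgment call that would need separate discussion.
RETRYABLE_EXIT_CODES = {5}


def rsync_with_retry(cmd, attempts=5, delay=30.0, backoff=2.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run cmd and retry while it exits with a retryable code.

    Returns the exit code of the last attempt. `run` and `sleep` are
    injectable so the retry logic can be exercised without calling rsync.
    """
    code = None
    for attempt in range(1, attempts + 1):
        code = run(cmd).returncode
        if code not in RETRYABLE_EXIT_CODES:
            # Success or a non-transient error: hand it back unchanged.
            return code
        if attempt < attempts:
            # Wait before the next attempt, growing the delay each time.
            sleep(delay)
            delay *= backoff
    return code
```

Called with the command from the log above as an argument list, for example `["rsync", "-avHP", "--timeout", "1800", "rsync://openqa.suse.de/tests/", "--delete", "/var/lib/openqa/cache/openqa.suse.de/tests/"]`, the job would only fail with a setup failure once all attempts have returned exit code 5. With the assumed defaults the waits add up to 30 + 60 + 120 + 240 seconds, which may or may not count as "a reasonable time" for AC1.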
Related issues
History
#1
Updated by okurz 5 months ago
Not sure if we will use it, but I already created an SR to add the mentioned Perl helper package to devel:languages:perl: https://build.opensuse.org/request/show/937791
#2
Updated by okurz 5 months ago
- Related to action #99198: Upgrade osd webUI host to openSUSE Leap 15.3 size:M added