action #73396
coordination #39719 (closed): [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues
coordination #62420: [epic] Distinguish all types of incompletes
job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23":retry
Start date: 2020-10-15
Due date:
% Done: 0%
Estimated time:
Description
Observation
The job https://openqa.suse.de/tests/4829255 incompletes. Its worker-log.txt shows:
[2020-10-15T02:45:53.0507 UTC] [debug] [pid:23275] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLE-15-Installer-DVD-aarch64-GM-DVD1.iso" to "/var/lib/openqa/pool/2/SLE-15-Installer-DVD-aarch64-GM-DVD1.iso"
[2020-10-15T02:45:58.0564 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:45:58.0565 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:03.0647 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:46:03.0649 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:03.0799 UTC] [debug] [pid:23275] Linked asset "/var/lib/openqa/cache/openqa.suse.de/openqa_support_server_sles12sp3.aarch64-uefi-vars.qcow2" to "/var/lib/openqa/pool/2/openqa_support_server_sles12sp3.aarch64-uefi-vars.qcow2"
[2020-10-15T02:46:08.0839 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:46:08.0840 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:08.0985 UTC] [error] [pid:23275] Unable to setup job 4829255: Failed to rsync tests: exit code 23
[2020-10-15T02:46:08.0985 UTC] [debug] [pid:23275] Stopping job 4829255 from openqa.suse.de: 04829255-sle-15-Server-DVD-HPC-Incidents-aarch64-Build:16737:php7-hpc_pdsh_genders_supportserver@aarch64 - reason: setup failure
[2020-10-15T02:46:08.0986 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
whereas autoinst-log.txt shows:
[2020-10-15T02:46:03.0809 UTC] [info] [pid:23275] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #3031 sent to Cache Service
[2020-10-15T02:46:08.0984 UTC] [info] [pid:23275] Output of rsync:
[info] [#3031] Calling: rsync -avHP rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
receiving incremental file list
sle/.git/
sle/.git/FETCH_HEAD
0 0% 0.00kB/s 0:00:00
2,517 100% 2.40MB/s 0:00:00 (xfr#1, ir-chk=1013/1051)
sle/products/opensuse/needles/.git/
sle/products/opensuse/needles/.git/FETCH_HEAD
0 100% 0.00kB/s 0:00:00 (xfr#2, ir-chk=1015/23335)
sle/products/sle/needles/.git/
sle/products/sle/needles/.git/FETCH_HEAD
0 0% 0.00kB/s 0:00:00
1,923 100% 1.34kB/s 0:00:01 (xfr#3, ir-chk=1013/50606)
sle/products/sle/needles/.git/FETCH_HEAD
1,923 100% 1.83MB/s 0:00:00 (xfr#4, ir-chk=1013/50606)
vmdp/.git/
vmdp/.git/FETCH_HEAD
0 0% 0.00kB/s 0:00:00
117 100% 114.26kB/s 0:00:00 (xfr#5, ir-chk=1037/58926)
sent 1,962 bytes received 2,388,517 bytes 956,191.60 bytes/sec
total size is 17,655,856,197 speedup is 7,385.91
[2020-10-15T02:46:09.0077 UTC] [info] [pid:23275] +++ worker notes +++
[2020-10-15T02:46:09.0078 UTC] [info] [pid:23275] End time: 2020-10-15 02:46:09
[2020-10-15T02:46:09.0078 UTC] [info] [pid:23275] Result: setup failure
[2020-10-15T02:46:09.0102 UTC] [info] [pid:8944] Uploading autoinst-log.txt
Acceptance criteria
- AC1: The rsync output and the corresponding error log line end up in the same log file instead of being split across two different ones
- AC2: Problems that can be solved automatically are solved automatically, e.g. by retrying on communication problems
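One way to read AC1 is that the command output and the failure message go through the same logger, so they cannot land in different files. The following is a minimal Python sketch of that idea only. openQA itself is written in Perl, and the helper name `run_and_log` and the log message wording are assumptions made for illustration, not the change that was eventually merged.

```python
import logging
import subprocess
from typing import Sequence


def run_and_log(cmd: Sequence[str], logger: logging.Logger) -> int:
    """Run cmd and write its output and any failure to the same logger.

    Hypothetical helper for illustration; not part of openQA.
    """
    logger.info("Calling: %s", " ".join(cmd))
    # Merge stderr into stdout so that error text stays next to the
    # regular output instead of being captured separately.
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    logger.info("Output of command:\n%s", result.stdout)
    if result.returncode != 0:
        # The error line uses the same logger as the output above,
        # so both end up in whatever file that logger writes to.
        logger.error("Command failed: exit code %d", result.returncode)
    return result.returncode
```

With a logger that has a single file handler attached, both the "Output of command" block and the "Command failed" line would appear in that one file.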
Suggestions
- The rsync man page states for exit codes 23 and 24:
23 Partial transfer due to error
24 Partial transfer due to vanished source files
Exit code 23 seems worth retrying a couple of times, or we should find out what the real error is. Exit code 24 seems safe to ignore (if we do not already do that), because source files can easily vanish while we sync.
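The suggested handling of the two exit codes can be sketched as follows. This is a hedged Python illustration, not the openQA implementation (which is Perl) and not necessarily what the merged PR does. The function names `run_rsync` and `sync_with_retry`, the attempt count, and the delay are assumptions. The rsync call is injected as a callable so the retry logic can be exercised without a real rsync server.

```python
import subprocess
import time
from typing import Callable, Sequence

# Exit codes as documented in the rsync man page.
RSYNC_PARTIAL_TRANSFER = 23  # "Partial transfer due to error"
RSYNC_VANISHED_SOURCE = 24   # "Partial transfer due to vanished source files"


def run_rsync(args: Sequence[str]) -> int:
    """Run rsync with the given arguments and return its exit code."""
    return subprocess.run(["rsync", *args]).returncode


def sync_with_retry(
    run: Callable[[], int],
    max_attempts: int = 3,        # assumed default, not taken from openQA
    delay_seconds: float = 5.0,   # assumed default, not taken from openQA
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run() until it succeeds or the attempts are exhausted.

    Returns 0 on success (exit code 24 is treated as success here),
    otherwise the last non-zero exit code.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    code = 0
    for attempt in range(1, max_attempts + 1):
        code = run()
        if code in (0, RSYNC_VANISHED_SOURCE):
            return 0
        if code != RSYNC_PARTIAL_TRANSFER:
            # Other errors are not retried in this sketch.
            return code
        if attempt < max_attempts:
            sleep(delay_seconds)
    return code
```

A call mirroring the rsync invocation from the log above might look like `sync_with_retry(lambda: run_rsync(["-avHP", "rsync://openqa.suse.de/tests/", "--delete", "/var/lib/openqa/cache/openqa.suse.de/tests/"]))`. Retrying only on exit code 23 keeps unrelated failures visible immediately instead of hiding them behind repeated attempts.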
Workaround
Retry
Updated by okurz about 4 years ago
- Tags set to worker, cache, rsync, minion
- Subject changed from job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23" to job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23":retry
- Description updated (diff)
- Category set to Regressions/Crashes
- Status changed from New to Workable
- Priority changed from Normal to High
- Target version set to Ready
Updated by okurz about 4 years ago
- Related to action #69553: job incompletes with "Failed to rsync tests: exit code 10":retry, improve user feedback added
Updated by Xiaojing_liu about 4 years ago
- Status changed from Workable to In Progress
- Assignee set to Xiaojing_liu
Updated by okurz about 4 years ago
- Related to action #73375: Job incompletes with reason auto_review:"(?m)api failure$" (and no further details) added
Updated by Xiaojing_liu about 4 years ago
- Status changed from In Progress to Feedback
The PR has been merged.
Updated by Xiaojing_liu about 4 years ago
- Status changed from Feedback to Resolved
Looks good on production now.