action #73396
coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues
coordination #62420: [epic] Distinguish all types of incompletes
job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23":retry
Start date: 2020-10-15
Due date:
% Done: 0%
Estimated time:
Difficulty:
Description
Observation
The job https://openqa.suse.de/tests/4829255 incompletes, the worker-log.txt shows:
[2020-10-15T02:45:53.0507 UTC] [debug] [pid:23275] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLE-15-Installer-DVD-aarch64-GM-DVD1.iso" to "/var/lib/openqa/pool/2/SLE-15-Installer-DVD-aarch64-GM-DVD1.iso"
[2020-10-15T02:45:58.0564 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:45:58.0565 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:03.0647 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:46:03.0649 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:03.0799 UTC] [debug] [pid:23275] Linked asset "/var/lib/openqa/cache/openqa.suse.de/openqa_support_server_sles12sp3.aarch64-uefi-vars.qcow2" to "/var/lib/openqa/pool/2/openqa_support_server_sles12sp3.aarch64-uefi-vars.qcow2"
[2020-10-15T02:46:08.0839 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:46:08.0840 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:08.0985 UTC] [error] [pid:23275] Unable to setup job 4829255: Failed to rsync tests: exit code 23
[2020-10-15T02:46:08.0985 UTC] [debug] [pid:23275] Stopping job 4829255 from openqa.suse.de: 04829255-sle-15-Server-DVD-HPC-Incidents-aarch64-Build:16737:php7-hpc_pdsh_genders_supportserver@aarch64 - reason: setup failure
[2020-10-15T02:46:08.0986 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
whereas autoinst-log.txt shows:
[2020-10-15T02:46:03.0809 UTC] [info] [pid:23275] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #3031 sent to Cache Service
[2020-10-15T02:46:08.0984 UTC] [info] [pid:23275] Output of rsync:
[info] [#3031] Calling: rsync -avHP rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
receiving incremental file list
sle/.git/
sle/.git/FETCH_HEAD
0 0% 0.00kB/s 0:00:00 2,517 100% 2.40MB/s 0:00:00 (xfr#1, ir-chk=1013/1051)
sle/products/opensuse/needles/.git/
sle/products/opensuse/needles/.git/FETCH_HEAD
0 100% 0.00kB/s 0:00:00 (xfr#2, ir-chk=1015/23335)
sle/products/sle/needles/.git/
sle/products/sle/needles/.git/FETCH_HEAD
0 0% 0.00kB/s 0:00:00 1,923 100% 1.34kB/s 0:00:01 (xfr#3, ir-chk=1013/50606)
sle/products/sle/needles/.git/FETCH_HEAD
1,923 100% 1.83MB/s 0:00:00 (xfr#4, ir-chk=1013/50606)
vmdp/.git/
vmdp/.git/FETCH_HEAD
0 0% 0.00kB/s 0:00:00 117 100% 114.26kB/s 0:00:00 (xfr#5, ir-chk=1037/58926)
sent 1,962 bytes received 2,388,517 bytes 956,191.60 bytes/sec
total size is 17,655,856,197 speedup is 7,385.91
[2020-10-15T02:46:09.0077 UTC] [info] [pid:23275] +++ worker notes +++
[2020-10-15T02:46:09.0078 UTC] [info] [pid:23275] End time: 2020-10-15 02:46:09
[2020-10-15T02:46:09.0078 UTC] [info] [pid:23275] Result: setup failure
[2020-10-15T02:46:09.0102 UTC] [info] [pid:8944] Uploading autoinst-log.txt
Acceptance criteria
- AC1: The rsync output and the corresponding error log line end up in the same log file instead of being split across two different ones
- AC2: Problems that can be solved automatically are solved automatically, e.g. by retrying on communication problems
Suggestions
- the man page of rsync states for exit codes 23 and 24:
23 Partial transfer due to error
24 Partial transfer due to vanished source files
Exit code 23 seems worth retrying a couple of times, or we should find out what the real error is. Exit code 24 seems safe to ignore (if we do not already do that), as source files can easily vanish while we sync.
Workaround
Retry
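The retry workaround and the exit code handling suggested above can be sketched as follows. This is only an illustration in Python, not the code from openQA or from the merged change: the function name `sync_with_retry`, the logger name, the attempt count and the delay are all assumptions. It retries on exit code 23, treats exit code 24 as success, and writes the rsync output and the error line through the same logger so they cannot end up in different files (AC1).

```python
import logging
import subprocess
import time

# Hypothetical logger name; both rsync output and errors go through it.
log = logging.getLogger("rsync_sketch")

RETRYABLE = {23}  # partial transfer due to error
IGNORABLE = {24}  # partial transfer due to vanished source files


def sync_with_retry(cmd, attempts=3, delay=5.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run cmd and return 0 on success, otherwise the last exit code.

    `run` and `sleep` are injectable so the logic can be tested
    without calling rsync or actually waiting.
    """
    code = None
    for attempt in range(1, attempts + 1):
        # Merge stderr into stdout so output and errors stay together.
        result = run(cmd, stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT, text=True)
        code = result.returncode
        log.info("Output of rsync (attempt %d/%d):\n%s",
                 attempt, attempts, result.stdout)
        if code == 0:
            return 0
        if code in IGNORABLE:
            log.warning("rsync exit code %d treated as success", code)
            return 0
        log.error("Failed to rsync tests: exit code %d", code)
        # Give up on non-retryable codes or when attempts are used up.
        if code not in RETRYABLE or attempt == attempts:
            break
        sleep(delay)
    return code
```

With the command from the log above, a call could look like `sync_with_retry(["rsync", "-avHP", "rsync://openqa.suse.de/tests/", "--delete", "/var/lib/openqa/cache/openqa.suse.de/tests/"])`. Other non-zero exit codes, such as the code 10 from the related ticket, are returned immediately without a retry in this sketch.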
Related issues
History
#1
Updated by okurz over 2 years ago
- Tags set to worker, cache, rsync, minion
- Subject changed from job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23" to job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23":retry
- Description updated (diff)
- Category set to Concrete Bugs
- Status changed from New to Workable
- Priority changed from Normal to High
- Target version set to Ready
#2
Updated by okurz over 2 years ago
- Related to action #69553: job incompletes with "Failed to rsync tests: exit code 10":retry, improve user feedback added
#3
Updated by Xiaojing_liu over 2 years ago
- Status changed from Workable to In Progress
- Assignee set to Xiaojing_liu
#4
Updated by okurz over 2 years ago
- Related to action #73375: Job incompletes with reason auto_review:"(?m)api failure$" (and no further details) added
#5
Updated by Xiaojing_liu over 2 years ago
- Status changed from In Progress to Feedback
The PR has been merged.
#6
Updated by okurz over 2 years ago
- Parent task set to #62420
#7
Updated by okurz about 2 years ago
bump
#8
Updated by Xiaojing_liu about 2 years ago
- Status changed from Feedback to Resolved
Looks good on production now.