action #73396

coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues

coordination #62420: [epic] Distinguish all types of incompletes

job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23":retry

Added by Xiaojing_liu 8 months ago. Updated 7 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2020-10-15
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

The job https://openqa.suse.de/tests/4829255 ends up incomplete; worker-log.txt shows:

[2020-10-15T02:45:53.0507 UTC] [debug] [pid:23275] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLE-15-Installer-DVD-aarch64-GM-DVD1.iso" to "/var/lib/openqa/pool/2/SLE-15-Installer-DVD-aarch64-GM-DVD1.iso"
[2020-10-15T02:45:58.0564 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:45:58.0565 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:03.0647 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:46:03.0649 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:03.0799 UTC] [debug] [pid:23275] Linked asset "/var/lib/openqa/cache/openqa.suse.de/openqa_support_server_sles12sp3.aarch64-uefi-vars.qcow2" to "/var/lib/openqa/pool/2/openqa_support_server_sles12sp3.aarch64-uefi-vars.qcow2"
[2020-10-15T02:46:08.0839 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:46:08.0840 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:08.0985 UTC] [error] [pid:23275] Unable to setup job 4829255: Failed to rsync tests: exit code 23
[2020-10-15T02:46:08.0985 UTC] [debug] [pid:23275] Stopping job 4829255 from openqa.suse.de: 04829255-sle-15-Server-DVD-HPC-Incidents-aarch64-Build:16737:php7-hpc_pdsh_genders_supportserver@aarch64 - reason: setup failure
[2020-10-15T02:46:08.0986 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status

whereas autoinst-log.txt shows:

[2020-10-15T02:46:03.0809 UTC] [info] [pid:23275] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #3031 sent to Cache Service
[2020-10-15T02:46:08.0984 UTC] [info] [pid:23275] Output of rsync:
[info] [#3031] Calling: rsync -avHP rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
receiving incremental file list
sle/.git/
sle/.git/FETCH_HEAD

              0   0%    0.00kB/s    0:00:00  
          2,517 100%    2.40MB/s    0:00:00 (xfr#1, ir-chk=1013/1051)
sle/products/opensuse/needles/.git/
sle/products/opensuse/needles/.git/FETCH_HEAD

              0 100%    0.00kB/s    0:00:00 (xfr#2, ir-chk=1015/23335)
sle/products/sle/needles/.git/
sle/products/sle/needles/.git/FETCH_HEAD

              0   0%    0.00kB/s    0:00:00  
          1,923 100%    1.34kB/s    0:00:01 (xfr#3, ir-chk=1013/50606)
sle/products/sle/needles/.git/FETCH_HEAD

          1,923 100%    1.83MB/s    0:00:00 (xfr#4, ir-chk=1013/50606)
vmdp/.git/
vmdp/.git/FETCH_HEAD

              0   0%    0.00kB/s    0:00:00  
            117 100%  114.26kB/s    0:00:00 (xfr#5, ir-chk=1037/58926)

sent 1,962 bytes  received 2,388,517 bytes  956,191.60 bytes/sec
total size is 17,655,856,197  speedup is 7,385.91

[2020-10-15T02:46:09.0077 UTC] [info] [pid:23275] +++ worker notes +++
[2020-10-15T02:46:09.0078 UTC] [info] [pid:23275] End time: 2020-10-15 02:46:09
[2020-10-15T02:46:09.0078 UTC] [info] [pid:23275] Result: setup failure
[2020-10-15T02:46:09.0102 UTC] [info] [pid:8944] Uploading autoinst-log.txt

Acceptance criteria

  • AC1: Make sure rsync output and error log line end up in the same log file, not separated in two different ones
  • AC2: Solve problems automatically which can be solved automatically, e.g. retry on communication problems
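
One possible reading of AC1, as a minimal sketch: whoever reports the setup failure also carries the captured rsync output, so both end up in one log entry. The function name `format_setup_failure` and the message layout are hypothetical and chosen for illustration only; they do not describe how openQA actually implements this.

```python
def format_setup_failure(job_id: int, exit_code: int, rsync_output: str) -> str:
    """Build a single log entry: the error line first, then the rsync output.

    Hypothetical helper, not part of openQA.
    """
    header = f"Unable to setup job {job_id}: Failed to rsync tests: exit code {exit_code}"
    # Indent the captured output so it is visually attached to the error line.
    body = "\n".join("  " + line for line in rsync_output.splitlines())
    return f"{header}\nOutput of rsync:\n{body}"


if __name__ == "__main__":
    print(format_setup_failure(4829255, 23, "receiving incremental file list\nsle/.git/"))
```

With a single entry like this, a reviewer would not need to correlate worker-log.txt and autoinst-log.txt by timestamp.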

Suggestions

  • the man page of rsync states for exit codes 23 and 24:
23     Partial transfer due to error
24     Partial transfer due to vanished source files

Exit code 23 seems worth retrying a couple of times, or worth investigating to find out what the real error is. Exit code 24 seems safe to ignore (if we do not already do that), since source files can easily vanish while we sync.

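The suggestion about exit codes 23 and 24 could be sketched roughly as follows. This is an illustration under assumptions, not the fix that was merged: `sync_with_retry`, its `run_rsync` callable and the default of three attempts are all hypothetical. Only the meaning of the two exit codes comes from the rsync man page quoted above.

```python
from typing import Callable, List, Tuple

# Exit codes as documented in the rsync man page.
RSYNC_OK = 0
RSYNC_PARTIAL_ERROR = 23      # Partial transfer due to error
RSYNC_PARTIAL_VANISHED = 24   # Partial transfer due to vanished source files


def sync_with_retry(run_rsync: Callable[[], int],
                    max_attempts: int = 3) -> Tuple[bool, List[int]]:
    """Call run_rsync until it succeeds or retrying makes no sense.

    run_rsync is a hypothetical callable that performs one sync and
    returns the rsync exit code. Returns (success, exit codes seen).
    """
    seen: List[int] = []
    for _ in range(max_attempts):
        code = run_rsync()
        seen.append(code)
        if code in (RSYNC_OK, RSYNC_PARTIAL_VANISHED):
            # 24 is treated as success: files vanished on the source mid-sync.
            return True, seen
        if code != RSYNC_PARTIAL_ERROR:
            # Any other failure is not retried in this sketch.
            return False, seen
    # Still exit code 23 after max_attempts tries.
    return False, seen


if __name__ == "__main__":
    codes = iter([23, 23, 0])  # simulated: two partial transfers, then success
    print(sync_with_retry(lambda: next(codes)))
```

A real implementation would probably wait between attempts and log each failed try; both are left out here to keep the sketch short.
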
Workaround

Retry


Related issues

Related to openQA Project - action #69553: job incompletes with "Failed to rsync tests: exit code 10":retry, improve user feedback - Resolved - 2020-08-04

Related to openQA Project - action #73375: Job incompletes with reason auto_review:"(?m)api failure$" (and no further details) - Workable - 2020-10-14

History

#1 Updated by okurz 8 months ago

  • Tags set to worker, cache, rsync, minion
  • Subject changed from job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23" to job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23":retry
  • Description updated (diff)
  • Category set to Concrete Bugs
  • Status changed from New to Workable
  • Priority changed from Normal to High
  • Target version set to Ready

#2 Updated by okurz 8 months ago

  • Related to action #69553: job incompletes with "Failed to rsync tests: exit code 10":retry, improve user feedback added

#3 Updated by Xiaojing_liu 8 months ago

  • Status changed from Workable to In Progress
  • Assignee set to Xiaojing_liu

#4 Updated by okurz 8 months ago

  • Related to action #73375: Job incompletes with reason auto_review:"(?m)api failure$" (and no further details) added

#5 Updated by Xiaojing_liu 8 months ago

  • Status changed from In Progress to Feedback

The PR has been merged.

#6 Updated by okurz 8 months ago

  • Parent task set to #62420

#7 Updated by okurz 7 months ago

bump

#8 Updated by Xiaojing_liu 7 months ago

  • Status changed from Feedback to Resolved

Looks good on production now.
