action #73396

closed

coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues

coordination #62420: [epic] Distinguish all types of incompletes

job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23":retry

Added by Xiaojing_liu about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2020-10-15
Due date:
% Done: 0%
Estimated time:
Description

Observation

The job https://openqa.suse.de/tests/4829255 incompletes. Its worker-log.txt shows:

[2020-10-15T02:45:53.0507 UTC] [debug] [pid:23275] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLE-15-Installer-DVD-aarch64-GM-DVD1.iso" to "/var/lib/openqa/pool/2/SLE-15-Installer-DVD-aarch64-GM-DVD1.iso"
[2020-10-15T02:45:58.0564 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:45:58.0565 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:03.0647 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:46:03.0649 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:03.0799 UTC] [debug] [pid:23275] Linked asset "/var/lib/openqa/cache/openqa.suse.de/openqa_support_server_sles12sp3.aarch64-uefi-vars.qcow2" to "/var/lib/openqa/pool/2/openqa_support_server_sles12sp3.aarch64-uefi-vars.qcow2"
[2020-10-15T02:46:08.0839 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead.
[2020-10-15T02:46:08.0840 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status
[2020-10-15T02:46:08.0985 UTC] [error] [pid:23275] Unable to setup job 4829255: Failed to rsync tests: exit code 23
[2020-10-15T02:46:08.0985 UTC] [debug] [pid:23275] Stopping job 4829255 from openqa.suse.de: 04829255-sle-15-Server-DVD-HPC-Incidents-aarch64-Build:16737:php7-hpc_pdsh_genders_supportserver@aarch64 - reason: setup failure
[2020-10-15T02:46:08.0986 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status

whereas autoinst-log.txt shows:

[2020-10-15T02:46:03.0809 UTC] [info] [pid:23275] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #3031 sent to Cache Service
[2020-10-15T02:46:08.0984 UTC] [info] [pid:23275] Output of rsync:
[info] [#3031] Calling: rsync -avHP rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
receiving incremental file list
sle/.git/
sle/.git/FETCH_HEAD

              0   0%    0.00kB/s    0:00:00  
          2,517 100%    2.40MB/s    0:00:00 (xfr#1, ir-chk=1013/1051)
sle/products/opensuse/needles/.git/
sle/products/opensuse/needles/.git/FETCH_HEAD

              0 100%    0.00kB/s    0:00:00 (xfr#2, ir-chk=1015/23335)
sle/products/sle/needles/.git/
sle/products/sle/needles/.git/FETCH_HEAD

              0   0%    0.00kB/s    0:00:00  
          1,923 100%    1.34kB/s    0:00:01 (xfr#3, ir-chk=1013/50606)
sle/products/sle/needles/.git/FETCH_HEAD

          1,923 100%    1.83MB/s    0:00:00 (xfr#4, ir-chk=1013/50606)
vmdp/.git/
vmdp/.git/FETCH_HEAD

              0   0%    0.00kB/s    0:00:00  
            117 100%  114.26kB/s    0:00:00 (xfr#5, ir-chk=1037/58926)

sent 1,962 bytes  received 2,388,517 bytes  956,191.60 bytes/sec
total size is 17,655,856,197  speedup is 7,385.91

[2020-10-15T02:46:09.0077 UTC] [info] [pid:23275] +++ worker notes +++
[2020-10-15T02:46:09.0078 UTC] [info] [pid:23275] End time: 2020-10-15 02:46:09
[2020-10-15T02:46:09.0078 UTC] [info] [pid:23275] Result: setup failure
[2020-10-15T02:46:09.0102 UTC] [info] [pid:8944] Uploading autoinst-log.txt

Acceptance criteria

  • AC1: Make sure rsync output and error log line end up in the same log file, not separated in two different ones
  • AC2: Automatically resolve problems that can be resolved automatically, e.g. retry on communication problems
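
To illustrate AC1, here is a minimal Python sketch that writes a command's output and its failure line to the same log. It is not how the openQA worker or cache service is implemented; `run_and_log`, the `label` parameter, and the exact wording of the log lines are hypothetical names chosen for illustration.

```python
import subprocess
from typing import List, TextIO


def run_and_log(cmd: List[str], log: TextIO, label: str = "rsync") -> int:
    """Run cmd and write its output plus any failure line to one log.

    Hypothetical helper for illustration, not part of openQA.
    """
    # Merge stderr into stdout so nothing is reported out of band.
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    output = proc.stdout
    if output and not output.endswith("\n"):
        output += "\n"
    log.write(f"[info] Output of {label}:\n{output}")
    if proc.returncode != 0:
        # The error line lands directly below the output that explains it.
        log.write(f"[error] Failed to {label}: exit code {proc.returncode}\n")
    return proc.returncode
```

With this shape, a reader of the log sees the command output and the exit code together instead of having to correlate worker-log.txt with autoinst-log.txt.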

Suggestions

  • The man page of rsync states for exit codes 23 and 24:
23     Partial transfer due to error
24     Partial transfer due to vanished source files

Exit code 23 seems worth retrying a couple of times, or we should find out what the real error is. Exit code 24 seems safe to ignore (if we do not already do that), since source files can easily vanish while we sync.

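The suggestion above could be sketched roughly as follows in Python. This is an assumption-laden illustration, not the fix that was merged: `sync_with_retry`, `run_rsync`, `max_attempts`, and `delay_s` are hypothetical names, and treating every code other than 0, 23, and 24 as non-retryable is a simplification. The rsync flags are the ones visible in the log excerpt.

```python
import subprocess
import time
from typing import Callable

# Exit codes as documented in the rsync man page.
RSYNC_PARTIAL_TRANSFER_ERROR = 23
RSYNC_PARTIAL_TRANSFER_VANISHED = 24


def run_rsync(src: str, dst: str) -> int:
    """Run rsync with the flags seen in the log and return its exit code."""
    return subprocess.run(["rsync", "-avHP", src, "--delete", dst]).returncode


def sync_with_retry(
    run: Callable[[], int], max_attempts: int = 3, delay_s: float = 0.0
) -> int:
    """Call run() until it succeeds or the attempts are used up.

    Returns 0 on success, otherwise the exit code of the last attempt.
    """
    code = 0
    for attempt in range(1, max_attempts + 1):
        code = run()
        if code in (0, RSYNC_PARTIAL_TRANSFER_VANISHED):
            # Vanished source files are treated as success in this sketch.
            return 0
        if code != RSYNC_PARTIAL_TRANSFER_ERROR:
            # Other failures are not retried here (a simplification).
            return code
        if attempt < max_attempts:
            time.sleep(delay_s)
    return code
```

A caller could wrap the real command as `sync_with_retry(lambda: run_rsync(src, dst))`. Passing the runner as a callable keeps the retry policy testable without a network or an rsync binary.
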
Workaround

Retry


Related issues 2 (1 open, 1 closed)

Related to openQA Project (public) - action #69553: job incompletes with "Failed to rsync tests: exit code 10":retry, improve user feedback - Resolved - kraih - 2020-08-04

Related to openQA Project (public) - action #73375: Job incompletes with reason auto_review:"(?m)api failure$" (and no further details) - Workable - 2020-10-14

Actions #1

Updated by okurz about 4 years ago

  • Tags set to worker, cache, rsync, minion
  • Subject changed from job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23" to job incompletes with auto_review:"setup failure: Failed to rsync tests: exit code 23":retry
  • Description updated (diff)
  • Category set to Regressions/Crashes
  • Status changed from New to Workable
  • Priority changed from Normal to High
  • Target version set to Ready
Actions #2

Updated by okurz about 4 years ago

  • Related to action #69553: job incompletes with "Failed to rsync tests: exit code 10":retry, improve user feedback added
Actions #3

Updated by Xiaojing_liu about 4 years ago

  • Status changed from Workable to In Progress
  • Assignee set to Xiaojing_liu
Actions #4

Updated by okurz about 4 years ago

  • Related to action #73375: Job incompletes with reason auto_review:"(?m)api failure$" (and no further details) added
Actions #5

Updated by Xiaojing_liu about 4 years ago

  • Status changed from In Progress to Feedback

The PR has been merged.

Actions #6

Updated by okurz about 4 years ago

  • Parent task set to #62420
Actions #7

Updated by okurz about 4 years ago

bump

Actions #8

Updated by Xiaojing_liu about 4 years ago

  • Status changed from Feedback to Resolved

Looks good on production now.
