action #73396

Updated by okurz over 3 years ago

## Observation 

 The job https://openqa.suse.de/tests/4829255 ends up incomplete. The worker-log.txt shows:
 ``` 
 [2020-10-15T02:45:53.0507 UTC] [debug] [pid:23275] Linked asset "/var/lib/openqa/cache/openqa.suse.de/SLE-15-Installer-DVD-aarch64-GM-DVD1.iso" to "/var/lib/openqa/pool/2/SLE-15-Installer-DVD-aarch64-GM-DVD1.iso" 
 [2020-10-15T02:45:58.0564 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead. 
 [2020-10-15T02:45:58.0565 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status 
 [2020-10-15T02:46:03.0647 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead. 
 [2020-10-15T02:46:03.0649 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status 
 [2020-10-15T02:46:03.0799 UTC] [debug] [pid:23275] Linked asset "/var/lib/openqa/cache/openqa.suse.de/openqa_support_server_sles12sp3.aarch64-uefi-vars.qcow2" to "/var/lib/openqa/pool/2/openqa_support_server_sles12sp3.aarch64-uefi-vars.qcow2" 
 [2020-10-15T02:46:08.0839 UTC] [debug] [pid:23275] Updating status so job 4829255 is not considered dead. 
 [2020-10-15T02:46:08.0840 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status 
 [2020-10-15T02:46:08.0985 UTC] [error] [pid:23275] Unable to setup job 4829255: Failed to rsync tests: exit code 23 
 [2020-10-15T02:46:08.0985 UTC] [debug] [pid:23275] Stopping job 4829255 from openqa.suse.de: 04829255-sle-15-Server-DVD-HPC-Incidents-aarch64-Build:16737:php7-hpc_pdsh_genders_supportserver@aarch64 - reason: setup failure 
 [2020-10-15T02:46:08.0986 UTC] [debug] [pid:23275] REST-API call: POST http://openqa.suse.de/api/v1/jobs/4829255/status 
 ``` 

 whereas autoinst-log.txt shows: 

 ``` 
 [2020-10-15T02:46:03.0809 UTC] [info] [pid:23275] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #3031 sent to Cache Service 
 [2020-10-15T02:46:08.0984 UTC] [info] [pid:23275] Output of rsync: 
 [info] [#3031] Calling: rsync -avHP rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/ 
 receiving incremental file list 
 sle/.git/ 
 sle/.git/FETCH_HEAD 

               0     0%      0.00kB/s      0:00:00   
           2,517 100%      2.40MB/s      0:00:00 (xfr#1, ir-chk=1013/1051) 
 sle/products/opensuse/needles/.git/ 
 sle/products/opensuse/needles/.git/FETCH_HEAD 

               0 100%      0.00kB/s      0:00:00 (xfr#2, ir-chk=1015/23335) 
 sle/products/sle/needles/.git/ 
 sle/products/sle/needles/.git/FETCH_HEAD 

               0     0%      0.00kB/s      0:00:00   
           1,923 100%      1.34kB/s      0:00:01 (xfr#3, ir-chk=1013/50606) 
 sle/products/sle/needles/.git/FETCH_HEAD 

           1,923 100%      1.83MB/s      0:00:00 (xfr#4, ir-chk=1013/50606) 
 vmdp/.git/ 
 vmdp/.git/FETCH_HEAD 

               0     0%      0.00kB/s      0:00:00   
             117 100%    114.26kB/s      0:00:00 (xfr#5, ir-chk=1037/58926) 

 sent 1,962 bytes    received 2,388,517 bytes    956,191.60 bytes/sec 
 total size is 17,655,856,197    speedup is 7,385.91 

 [2020-10-15T02:46:09.0077 UTC] [info] [pid:23275] +++ worker notes +++ 
 [2020-10-15T02:46:09.0078 UTC] [info] [pid:23275] End time: 2020-10-15 02:46:09 
 [2020-10-15T02:46:09.0078 UTC] [info] [pid:23275] Result: setup failure 
 [2020-10-15T02:46:09.0102 UTC] [info] [pid:8944] Uploading autoinst-log.txt 
 ``` 

 ## Acceptance criteria 
 * **AC1:** The rsync output and the corresponding error log line end up in the same log file, not split across two different ones
 * **AC2:** Problems that can be solved automatically are solved automatically, e.g. by retrying on communication problems

 ## Suggestions 

 * the man page of rsync states for exit codes 23 and 24: 

 ``` 
 23       Partial transfer due to error 
 24       Partial transfer due to vanished source files 
 ``` 

 Exit code 23 sounds worth retrying a couple of times, or investigating to find out what the real error is. Exit code 24 sounds safe to ignore (if we do not already do that), as source files can easily vanish while we sync.
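
 The retry/ignore idea above, together with AC1, could look roughly like the following. This is only a sketch in Python, not the actual cache service code: `run_command`, `sync_tests`, `max_attempts` and `delay` are hypothetical names chosen for illustration. It logs the exit code and the rsync output in one message, retries on exit code 23 and treats exit code 24 as success.

```python
import subprocess
import time

RSYNC_PARTIAL_TRANSFER = 23  # "Partial transfer due to error"
RSYNC_VANISHED_SOURCE = 24   # "Partial transfer due to vanished source files"


def run_command(cmd):
    """Run cmd and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def sync_tests(cmd, runner=run_command, max_attempts=3, delay=0.0, log=print):
    """Return True if the sync can be considered successful.

    Hypothetical helper: retries on exit code 23, ignores exit code 24,
    and gives up immediately on any other non-zero exit code.
    """
    for attempt in range(1, max_attempts + 1):
        code, output = runner(cmd)
        # AC1: exit code and rsync output go to the same log, in one message.
        log(f"rsync attempt {attempt}/{max_attempts} exited with {code}:\n{output}")
        if code == 0:
            return True
        if code == RSYNC_VANISHED_SOURCE:
            log("Ignoring exit code 24: source files vanished during sync")
            return True
        if code != RSYNC_PARTIAL_TRANSFER:
            return False
        if attempt < max_attempts:
            time.sleep(delay)  # AC2: wait briefly, then retry
    return False
```

 With the command line from the log above, a call might look like `sync_tests(["rsync", "-avHP", "rsync://openqa.suse.de/tests/", "--delete", "/var/lib/openqa/cache/openqa.suse.de/tests/"], delay=5)`. Whether three attempts and a fixed delay are appropriate is an open question.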

 ## Workaround 
 Retry the job. The worker log of the failed run is at https://openqa.suse.de/tests/4829255/file/worker-log.txt
