Project

General

Profile

action #95299

Updated by okurz almost 3 years ago

## Observation 

 I don't know why this ppc64le jobs timed out, cache service ? 
 Should the regex be more specific or can it be generic for any timeout ? 

 https://openqa.suse.de/tests/6393939 
 https://openqa.suse.de/tests/6393941 
 https://openqa.suse.de/tests/6393945 

 ## Steps to reproduce 

 ~~Find jobs referencing this ticket with the help of 
 https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label , 
 call `openqa-query-for-job-label poo#95299`~~ 

 ``` 
 ssh osd "sudo -u geekotest psql --command=\"select jobs.id,result_dir,t_finished,host,instance from jobs join workers on jobs.assigned_worker_id=workers.id where reason ~ 'timeout: setup exceeded' order by t_finished;\" openqa" 
 ``` 

 ## Expected result 
 A log https://openqa.suse.de/tests/6416847/logfile?filename=autoinst-log.txt shows how it should look: 

 ``` 
 [2021-07-12T10:36:56.0008 CEST] [info] [pid:64283] Download of SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso processed: 
 [info] [#106833] 
 Cache size of "/var/lib/openqa/cache" is 44GiB, with limit 50GiB 
 [info] [#106833] 
 Downloading "SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" from "http://openqa.suse.de/tests/6416847/asset/iso/SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" 
 [info] [#106833] 
 Content of "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" has not changed, updating last use 

 [2021-07-12T10:36:56.0100 CEST] [info] [pid:64283] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #106839 sent to Cache Service 
 [2021-07-12T10:37:01.0168 CEST] [info] [pid:64283] Output of rsync: 
 [info] [#106839] Calling: rsync -avHP rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/ 
 receiving incremental file list 

 sent 1,992 bytes    received 2,581,652 bytes    1,722,429.33 bytes/sec 
 total size is 12,978,495,757    speedup is 5,023.33 

 [2021-07-12T10:37:01.0168 CEST] [info] [pid:64283] Finished to rsync tests 
 [2021-07-12T10:37:01.0172 CEST] [debug] [pid:70271] +++ worker notes +++ 
 ``` 

 where one can see the output from the rsync call. 

 ## Problem 
 Problem seems to be specific to ppc64le workers, maybe only "malbec" now. Maybe specific to test syncing. The cacheservice-minion logs do not mention the test rsync request at all. 

 ## Suggestions 
 * Start with calling `git grep` on the source code of openQA for the log messages that we see in the jobs mentioned above (or see comments below), to see which debug messages should be expected before the timeout 
 * If you identify that there could be helpful log messages, e.g. to be able to distinguish if a request was received by the minion service or not, add it

Back