Project

General

Profile

action #95299

Updated by okurz 3 months ago

## Observation

I don't know why this ppc64le jobs timed out, cache service ?
Should the regex be more specific or can it be generic for any timeout ?

https://openqa.suse.de/tests/6393939
https://openqa.suse.de/tests/6393941
https://openqa.suse.de/tests/6393945

## Steps to reproduce

~~Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label ,
call `openqa-query-for-job-label poo#95299`~~

```
ssh osd "sudo -u geekotest psql --command=\"select jobs.id,result_dir,t_finished,host,instance from jobs join workers on jobs.assigned_worker_id=workers.id where reason ~ 'timeout: setup exceeded' order by t_finished;\" openqa"
```

## Expected result
A log https://openqa.suse.de/tests/6416847/logfile?filename=autoinst-log.txt shows how it should look:

```
[2021-07-12T10:36:56.0008 CEST] [info] [pid:64283] Download of SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso processed:
[info] [#106833]
Cache size of "/var/lib/openqa/cache" is 44GiB, with limit 50GiB
[info] [#106833]
Downloading "SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" from "http://openqa.suse.de/tests/6416847/asset/iso/SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso"
[info] [#106833]
Content of "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" has not changed, updating last use

[2021-07-12T10:36:56.0100 CEST] [info] [pid:64283] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #106839 sent to Cache Service
[2021-07-12T10:37:01.0168 CEST] [info] [pid:64283] Output of rsync:
[info] [#106839] Calling: rsync -avHP rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
receiving incremental file list

sent 1,992 bytes received 2,581,652 bytes 1,722,429.33 bytes/sec
total size is 12,978,495,757 speedup is 5,023.33

[2021-07-12T10:37:01.0168 CEST] [info] [pid:64283] Finished to rsync tests
[2021-07-12T10:37:01.0172 CEST] [debug] [pid:70271] +++ worker notes +++
```

where one can see the output from the rsync call.

##
Problem
Problem seems to be specific to ppc64le workers, maybe only "malbec" now. Maybe specific to test syncing. The cacheservice-minion logs do not mention the test rsync request at all.

## Suggestions
* Start with calling `git grep` on the source code of openQA for the log messages that we see in the jobs mentioned above (or see comments below), to see which debug messages should be expected before the timeout
* If you identify that there could be helpful log messages, e.g. to be able to distinguish if a request was received by the minion service or not, add it

Back