Project

General

Profile

Actions

action #96557

closed

jobs run into MAX_SETUP_TIME, one hour between 'Downloading' and 'Download processed' and no useful output in between auto_review:"timeout: setup exceeded MAX_SETUP_TIME":retry

Added by okurz almost 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2021-08-04
Due date:
2021-08-19
% Done:

0%

Estimated time:

Description

Observation

https://openqa.suse.de/tests/6645417 is "timeout_exceeded". log details:

[2021-08-04T10:33:49.0665 CEST] [info] [pid:16838] Downloading SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso, request #5221 sent to Cache Service
[2021-08-04T11:26:20.0708 CEST] [info] [pid:16838] Download of SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso processed:
Asset "SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso" was downloaded by #5196, details are therefore unavailable here
[2021-08-04T11:26:20.0712 CEST] [info] [pid:16838] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #5327 sent to Cache Service
[2021-08-04T11:28:20.0151 CEST] [info] [pid:16838] +++ worker notes +++
[2021-08-04T11:28:20.0151 CEST] [info] [pid:16838] End time: 2021-08-04 09:28:20
[2021-08-04T11:28:20.0151 CEST] [info] [pid:16838] Result: timeout
[2021-08-04T11:28:20.0160 CEST] [info] [pid:23579] Uploading autoinst-log.txt

this was on openqaworker6:15. Related systemd journal logs

  • openqa-worker-cacheservice:
Aug 04 08:31:58 openqaworker6 openqa-workercache-daemon[26298]: [26379] [i] Worker 26379 started
Aug 04 08:31:58 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Creating process id file "/tmp/prefork.pid"
Aug 04 08:31:58 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Worker 26379 stopped
Aug 04 08:33:56 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Stopping worker 26377 gracefully (80 seconds)
Aug 04 08:33:57 openqaworker6 openqa-workercache-daemon[26298]: [26377] [i] Worker 26377 started
Aug 04 08:49:40 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Worker 26377 stopped
Aug 04 08:49:40 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Stopping worker 26376 gracefully (80 seconds)
Aug 04 08:49:42 openqaworker6 openqa-workercache-daemon[26298]: [26376] [i] Worker 26376 started
Aug 04 08:51:05 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Worker 26376 stopped
Aug 04 08:51:05 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Stopping worker 26378 gracefully (80 seconds)
Aug 04 08:51:06 openqaworker6 openqa-workercache-daemon[26298]: [26378] [i] Worker 26378 started
Aug 04 09:55:17 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Worker 26378 stopped
Aug 04 09:55:17 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Stopping worker 1568 gracefully (80 seconds)
Aug 04 09:55:21 openqaworker6 openqa-workercache-daemon[26298]: [1568] [i] Worker 1568 started
Aug 04 09:59:14 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Worker 1568 stopped
Aug 04 09:59:14 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Stopping worker 1389 gracefully (80 seconds)
Aug 04 09:59:15 openqaworker6 openqa-workercache-daemon[26298]: [1389] [i] Worker 1389 started
Aug 04 10:14:16 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Worker 1389 stopped
Aug 04 10:14:16 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Stopping worker 3852 gracefully (80 seconds)
Aug 04 10:14:19 openqaworker6 openqa-workercache-daemon[26298]: [3852] [i] Worker 3852 started
Aug 04 10:19:26 openqaworker6 openqa-workercache-daemon[26298]: [4587] [i] Worker 4587 started
Aug 04 10:19:26 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Worker 3852 stopped
Aug 04 10:19:26 openqaworker6 openqa-workercache-daemon[26298]: [26298] [i] Worker 4587 stopped
  • openqa-worker-cacheservice-minion:
Aug 04 10:29:04 openqaworker6 openqa-worker-cacheservice-minion[26299]: [16882] [i] [#5188] Downloading: "SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1.iso"
Aug 04 10:29:14 openqaworker6 openqa-worker-cacheservice-minion[26299]: [16884] [i] [#5190] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
Aug 04 10:29:14 openqaworker6 openqa-worker-cacheservice-minion[26299]: [16884] [i] [#5190] Calling: rsync -avHP --timeout 1800 rsync://openqa.suse.de/tests/ --delete /var/>
Aug 04 10:29:39 openqaworker6 openqa-worker-cacheservice-minion[26299]: [16894] [i] [#5192] Downloading: "publiccloud_15sp3_Azure_gen2_BYOS_Updates.qcow2"
Aug 04 10:29:44 openqaworker6 openqa-worker-cacheservice-minion[26299]: [16895] [i] [#5193] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
Aug 04 10:29:44 openqaworker6 openqa-worker-cacheservice-minion[26299]: [16895] [i] [#5193] Calling: rsync -avHP --timeout 1800 rsync://openqa.suse.de/tests/ --delete /var/>
Aug 04 10:29:54 openqaworker6 openqa-worker-cacheservice-minion[26299]: [16904] [i] [#5195] Downloading: "SLED-15-SP2-x86_64-Installtest.qcow2"
Aug 04 10:32:54 openqaworker6 openqa-worker-cacheservice-minion[26299]: [16788] [i] [#5170] Cache size of "/var/lib/openqa/cache" is 47GiB, with limit 50GiB
Aug 04 10:32:54 openqaworker6 openqa-worker-cacheservice-minion[26299]: [16788] [i] [#5170] Downloading "SLES-15-SP2-x86_64-mru-install-minimal-with-addons-Build20210804-1->
Aug 04 10:32:59 openqaworker6 openqa-worker-cacheservice-minion[26299]: [17198] [i] [#5196] Downloading: "SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso"
Aug 04 10:33:09 openqaworker6 openqa-worker-cacheservice-minion[26299]: [16904] [i] [#5195] Cache size of "/var/lib/openqa/cache" is 49GiB, with limit 50GiB
Aug 04 10:33:09 openqaworker6 openqa-worker-cacheservice-minion[26299]: [16904] [i] [#5195] Downloading "SLED-15-SP2-x86_64-Installtest.qcow2" from "http://openqa.suse.de/t>
Aug 04 10:33:14 openqaworker6 openqa-worker-cacheservice-minion[26299]: [17212] [i] [#5197] Downloading: "SLE-15-SP1-Installer-DVD-x86_64-GM-DVD1.iso"
Aug 04 10:33:19 openqaworker6 openqa-worker-cacheservice-minion[26299]: [17215] [i] [#5198] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
…
Aug 04 10:34:49 openqaworker6 openqa-worker-cacheservice-minion[26299]: [17802] [i] [#5213] Downloading: "sle-15-SP1-x86_64-Build:19472:aws-cli-with-hpc.qcow2
…
Aug 04 11:26:18 openqaworker6 openqa-worker-cacheservice-minion[26299]: [17198] [i] [#5196] Size of "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP2-Installer-DVD-x86_64-GM>
Aug 04 11:26:18 openqaworker6 openqa-worker-cacheservice-minion[26299]: [17198] [i] [#5196] Download error 598, waiting 5 seconds for next try (4 remaining)
Aug 04 11:26:18 openqaworker6 openqa-worker-cacheservice-minion[26299]: [17198] [i] [#5196] Downloading "SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso" from "http://openqa.su>
Aug 04 11:26:20 openqaworker6 openqa-worker-cacheservice-minion[26299]: [23427] [i] [#5306] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
Aug 04 11:26:20 openqaworker6 openqa-worker-cacheservice-minion[26299]: [23427] [i] [#5306] Calling: rsync -avHP --timeout 1800 rsync://openqa.suse.de/tests/ --delete /var/>
…

to me it looks like the same asset is downloaded multiple times, which can happen if cache size is exceeded, and somehow the openQA job sees a response 53 minutes later and there is "waiting 5 seconds for next try (4 remaining)" but then nothing related anymore

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label ,
call openqa-query-for-job-label poo#96557


Related issues 6 (0 open6 closed)

Related to openQA Infrastructure - action #96554: Mitigate on-going disk I/O alerts size:MResolvedmkittler2021-08-04

Actions
Related to openQA Project - action #96623: Let workers declare themselves as broken if asset downloads are piling up size:MResolveddheidler2021-08-06

Actions
Related to openQA Project - action #96684: Abort asset download via the cache service when related job runs into a timeout (or is otherwise cancelled) size:MRejectedmkittler2021-08-09

Actions
Related to openQA Infrastructure - coordination #96447: [epic] Failed systemd services and job age alertsResolvedokurz2021-08-04

Actions
Related to openQA Infrastructure - action #98307: Many jobs in o3 fail with timeout_exceeded on openqaworker1 auto_review:"timeout: setup exceeded MAX_SETUP_TIME":retry size:MResolvedmkittler2021-09-08

Actions
Related to openQA Project - coordination #99579: [epic][retro] Follow-up to "Published QCOW images appear to be uncompressed"Resolvedokurz2021-10-01

Actions
Actions

Also available in: Atom PDF