action #16380

closed

workers die like flies

Added by coolo about 7 years ago. Updated almost 7 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2017-02-01
Due date:
% Done: 80%
Estimated time:

Description

Most workers on openqa.opensuse.org are dead:

Jan 31 12:46:47 openqaworker4 worker[13393]: can't open /var/lib/openqa/pool/15/testresults/test_order.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 735.
Jan 31 12:46:52 openqaworker4 worker[13393]: [DEBUG] Either there is no job running or we were asked to stop: (1|Reason: no tests scheduled)
Jan 31 12:46:52 openqaworker4 worker[13393]: [INFO] cleaning up 00343867-opensuse-42.3-DVD-x86_64-Build0055-rescue_system@uefi
Jan 31 12:46:52 openqaworker4 worker[13393]: [INFO] got job 343869: 00343869-opensuse-42.3-DVD-x86_64-Build0055-textmode-image@64bit
Jan 31 12:46:52 openqaworker4 worker[13393]: [INFO] 6853: WORKING 343869
Jan 31 12:49:35 openqaworker4 worker[13393]: [ERROR] 404 response: Not Found (remaining tries: 0)
Jan 31 12:49:35 openqaworker4 worker[13393]: [ERROR] Job aborted because web UI doesn't accept updates anymore (likely considers this job dead)
Jan 31 12:49:36 openqaworker4 worker[13393]: Mojo::Reactor::Poll: Timer failed: No worker id or webui host set! at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 181.
Jan 31 12:49:45 openqaworker4 worker[13393]: WebUI Mojo::IOLoop=HASH(0x37719a0) is unknown! - Should not happen but happened, exiting! at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 404.
Jan 31 12:49:45 openqaworker4 worker[13393]: [INFO] registering worker with openQA Mojo::IOLoop=HASH(0x37719a0)...
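
For triaging an excerpt like the one above, a small script can separate the entries worth reading from the routine ones. The following is a minimal sketch in Python and not part of openQA: the names `LINE_RE`, `parse_line` and `suspicious` are hypothetical, and the pattern assumes the syslog-style layout seen in this excerpt (timestamp, host, `worker[pid]:`, optional `[LEVEL]` tag). Untagged lines are treated as suspicious on the assumption that they come from Perl warn/die output rather than from the worker's own logger.

```python
import re

# Assumed layout, taken from the excerpt in this ticket:
#   "<Mon> <day> <HH:MM:SS> <host> <proc>[<pid>]: [<LEVEL>] <message>"
# The "[<LEVEL>] " tag is optional.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: "
    r"(?:\[(?P<level>[A-Z]+)\] )?(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Lines without a [LEVEL] tag are marked so they can be filtered later.
    rec["level"] = rec["level"] or "UNTAGGED"
    return rec


def suspicious(records):
    """Keep [ERROR] entries plus untagged ones."""
    return [r for r in records if r["level"] in ("ERROR", "UNTAGGED")]


# Three lines copied from the excerpt above.
SAMPLE = """\
Jan 31 12:46:52 openqaworker4 worker[13393]: [INFO] 6853: WORKING 343869
Jan 31 12:49:35 openqaworker4 worker[13393]: [ERROR] 404 response: Not Found (remaining tries: 0)
Jan 31 12:49:36 openqaworker4 worker[13393]: Mojo::Reactor::Poll: Timer failed: No worker id or webui host set! at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 181.
"""

if __name__ == "__main__":
    records = [r for r in map(parse_line, SAMPLE.splitlines()) if r]
    for rec in suspicious(records):
        print(rec["ts"], rec["level"], rec["msg"])
```

On a real worker host the same functions could be fed the saved journal output for the worker unit instead of `SAMPLE`.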


Files

wtf.txt (424 KB): Kernel log from worker host affected by this (jobs 79745 and 79754) around the time workers stopped checking in. AdamWill, 2017-03-14 20:12