action #96007

closed

OpenQA jobs randomly time out during setup phase

Added by MDoucha about 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2021-07-26
Due date:
% Done: 0%

Description

openQA jobs have ended up incomplete more often than usual over the past few weeks. All the incompletes I've seen today show the following sequence of messages in worker.log:

[2021-07-24T16:28:45.444 CEST] [debug] started mgmt loop with pid 59928
[2021-07-24T16:28:45.510 CEST] [debug] qemu version detected: 4.2.1
[2021-07-24T16:28:45.512 CEST] [debug] running /usr/bin/chattr -f +C /var/lib/openqa/pool/9/raid
[2021-07-24T18:28:42.557 CEST] [debug] isotovideo received signal TERM
[2021-07-24T18:28:42.558 CEST] [debug] backend got TERM

https://openqa.suse.de/tests/6552459
https://openqa.suse.de/tests/6555414
https://openqa.suse.de/tests/6543695

I'll update this ticket if I find any similar jobs where the last operation before the timeout isn't chattr -f +C.
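Every incomplete listed above shows the same pattern: the chattr call is the last worker.log message, and the next line is the TERM signal about two hours later. Below is a minimal sketch of how such jobs might be spotted automatically. It assumes worker.log lines have exactly the format in the excerpt above. The helper name `find_stall_before_term` and the one-hour threshold are hypothetical choices, not part of openQA.

```python
import re
from datetime import datetime

# Assumed line format, taken from the worker.log excerpt in this ticket:
# [2021-07-24T16:28:45.512 CEST] [debug] message
LINE_RE = re.compile(r"^\[(\S+) \S+\] \[(\w+)\] (.*)$")

# The five lines quoted in the description, used as sample input.
SAMPLE_LOG = """\
[2021-07-24T16:28:45.444 CEST] [debug] started mgmt loop with pid 59928
[2021-07-24T16:28:45.510 CEST] [debug] qemu version detected: 4.2.1
[2021-07-24T16:28:45.512 CEST] [debug] running /usr/bin/chattr -f +C /var/lib/openqa/pool/9/raid
[2021-07-24T18:28:42.557 CEST] [debug] isotovideo received signal TERM
[2021-07-24T18:28:42.558 CEST] [debug] backend got TERM
""".splitlines()


def find_stall_before_term(lines, min_gap_seconds=3600):
    """Hypothetical helper: if the message right before 'isotovideo received
    signal TERM' is the chattr call and at least min_gap_seconds passed between
    the two, return (chattr message, gap in seconds); otherwise return None.
    The timezone suffix (e.g. CEST) is ignored, so the gap is only correct
    when both lines share the same zone."""
    previous = None  # (timestamp, message) of the last parsed line
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed format
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        message = match.group(3)
        if "isotovideo received signal TERM" in message and previous:
            gap = (stamp - previous[0]).total_seconds()
            if "chattr -f +C" in previous[1] and gap >= min_gap_seconds:
                return previous[1], gap
        previous = (stamp, message)
    return None


if __name__ == "__main__":
    print(find_stall_before_term(SAMPLE_LOG))
```

For the sample, the gap between the chattr line and the TERM line is just under two hours, which is consistent with a job running into its timeout. A real check would also need to look at jobs where a different operation precedes the TERM, as mentioned above.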


Related issues: 1 (0 open, 1 closed)

Related to openQA Project - action #81828: Jobs run into timeout_exceeded after the 'chattr' call, no output until timeout, auto_review:"(?s)Refusing to save an empty state file to avoid overwriting a useful one.*Result: timeout":retry (Resolved, okurz, 2021-01-06)

#1

Updated by okurz about 3 years ago

  • Related to action #81828: Jobs run into timeout_exceeded after the 'chattr' call, no output until timeout, auto_review:"(?s)Refusing to save an empty state file to avoid overwriting a useful one.*Result: timeout":retry added
#2

Updated by okurz about 3 years ago

  • Target version set to Ready

If isotovideo has started, it's not actually the "setup phase"; this looks more like https://progress.opensuse.org/issues/81828

#3

Updated by okurz about 3 years ago

  • Status changed from New to Blocked
  • Assignee set to okurz

So far I assume this is just the same as #81828, but if you find other cases, please bring them up here and we will see what we can do about them.

#4

Updated by okurz over 2 years ago

  • Status changed from Blocked to Resolved

Within #81828 an issue was identified and fixed with https://github.com/os-autoinst/os-autoinst/pull/1855. My assumption is that the error reported here is fixed by the same change, or that we are effectively talking about the same issue, so I'm calling it resolved. Please report if you still see the error. From the description, I couldn't tell how I would easily find similar job failures automatically myself.
