action #96007

OpenQA jobs randomly time out during setup phase

Added by MDoucha over 1 year ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2021-07-26
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

OpenQA jobs have been ending up incomplete more often than usual over the past few weeks. All the incompletes I've seen today show the following sequence of messages in worker.log:

[2021-07-24T16:28:45.444 CEST] [debug] started mgmt loop with pid 59928
[2021-07-24T16:28:45.510 CEST] [debug] qemu version detected: 4.2.1
[2021-07-24T16:28:45.512 CEST] [debug] running /usr/bin/chattr -f +C /var/lib/openqa/pool/9/raid
[2021-07-24T18:28:42.557 CEST] [debug] isotovideo received signal TERM
[2021-07-24T18:28:42.558 CEST] [debug] backend got TERM

https://openqa.suse.de/tests/6552459
https://openqa.suse.de/tests/6555414
https://openqa.suse.de/tests/6543695

I'll update this ticket if I find any similar jobs where the last operation before timeout isn't chattr -f +C.
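
To make similar jobs easier to spot, the two-hour silence between the chattr line and the TERM signal in the excerpt above could be detected with a small script. The following is only a rough sketch and not part of openQA or os-autoinst: the names `LINE_RE` and `find_stalls` and the one-hour threshold are assumptions made for illustration, and the time zone abbreviation is ignored, so it assumes all lines of one log share the same zone.

```python
import re
from datetime import datetime, timedelta

# Matches the line prefix seen in the excerpt above, e.g.
# "[2021-07-24T16:28:45.444 CEST] [debug] message"
# (hypothetical helper, not an openQA API)
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) [A-Z]+\] \[(\w+)\] (.*)$"
)


def find_stalls(lines, min_gap=timedelta(hours=1)):
    """Return (gap, message_before, message_after) for each silence >= min_gap."""
    stalls = []
    prev_time = prev_msg = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines without a recognizable timestamp
        # The time zone abbreviation is not parsed; only the local time is used.
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        msg = match.group(3)
        if prev_time is not None and stamp - prev_time >= min_gap:
            stalls.append((stamp - prev_time, prev_msg, msg))
        prev_time, prev_msg = stamp, msg
    return stalls


# The log lines quoted in this ticket
SAMPLE = """\
[2021-07-24T16:28:45.444 CEST] [debug] started mgmt loop with pid 59928
[2021-07-24T16:28:45.510 CEST] [debug] qemu version detected: 4.2.1
[2021-07-24T16:28:45.512 CEST] [debug] running /usr/bin/chattr -f +C /var/lib/openqa/pool/9/raid
[2021-07-24T18:28:42.557 CEST] [debug] isotovideo received signal TERM
[2021-07-24T18:28:42.558 CEST] [debug] backend got TERM
"""

for gap, before, after in find_stalls(SAMPLE.splitlines()):
    print(f"{gap} of silence after: {before!r}, next message: {after!r}")
```

Run against a downloaded log, this would report any job where the last message before a long silence is something other than the chattr call, which is the case this ticket is meant to collect.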


Related issues

Related to openQA Project - action #81828: Jobs run into timeout_exceeded after the 'chattr' call, no output until timeout, auto_review:"(?s)Refusing to save an empty state file to avoid overwriting a useful one.*Result: timeout":retry (Resolved, 2021-01-06)

History

#1 Updated by okurz over 1 year ago

  • Related to action #81828: Jobs run into timeout_exceeded after the 'chattr' call, no output until timeout, auto_review:"(?s)Refusing to save an empty state file to avoid overwriting a useful one.*Result: timeout":retry added

#2 Updated by okurz over 1 year ago

  • Target version set to Ready

If isotovideo starts, it's actually not the "setup phase"; this looks more like https://progress.opensuse.org/issues/81828

#3 Updated by okurz over 1 year ago

  • Status changed from New to Blocked
  • Assignee set to okurz

So far I assume this is actually just the same as #81828, but if you find other cases, please bring them up here and we will see what we can do about them.

#4 Updated by okurz 11 months ago

  • Status changed from Blocked to Resolved

Within #81828 an issue was identified that was fixed with https://github.com/os-autoinst/os-autoinst/pull/1855 . My assumption is that the error reported here is actually fixed by the same fix or that we are effectively talking about the same issue. So calling it resolved. Please report if you still see the error. I couldn't read from the description how I would be able to easily find similar job failures automatically myself.
