action #98541

[qe-core][kernel] Steps in case of s390 failures

Added by pcervinka about 1 month ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee: -
Category: Bugs in existing tests
Target version: -
Start date: 2021-09-13
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

From time to time there are failures on s390 workers that need action to fix. They are usually handled in the eng-testing channel. Sometimes they are fixed quickly, but sometimes it takes longer (for example, waiting for a more experienced person to return from vacation). It is also sometimes unclear who is responsible for the health of the s390 workers in the openQA pool.

What should the process be, and who is primarily responsible for the s390 workers?


Related issues

Related to openQA Tests - action #97532: [qe-core][sporadic] s390x jobs are failing to boot auto_review:"error: Cannot set interface flags on 'macvtap.*': Address already in use":retry (Resolved)

History

#1 Updated by MDoucha about 1 month ago

The most common s390x worker error is failure to execute define_and_start() in bootloader_zkvm. But this failure has multiple different causes:

Some happen randomly due to worker overload, others are the result of manual misconfiguration and persist on one or more worker slots until manually fixed.

#2 Updated by pcervinka about 1 month ago

  • Project changed from QE Kernel to qam-qasle-collaboration

#3 Updated by szarate about 1 month ago

  • Related to action #97532: [qe-core][sporadic] s390x jobs are failing to boot auto_review:"error: Cannot set interface flags on 'macvtap.*': Address already in use":retry added

#4 Updated by szarate about 1 month ago

Hi Petr, in any case, if you're struggling to figure out the root cause of these problems, you can ping me directly or mention the issue in the qe-core/eng-testing channels, as I mentioned during the call.

I suspect that the memory failure (let me know if it happens again) could be related to too many jobs running on the same machine.

#5 Updated by okurz about 1 month ago

  • Project changed from qam-qasle-collaboration to openQA Tests
  • Subject changed from Steps in case of s390 failures to [qe-core][kernel] Steps in case of s390 failures
  • Category set to Bugs in existing tests

Discussed in the weekly QE sync on 2021-09-15. szarate already linked the important related ticket #97532. The test modules mentioned above list mgriessmeier as maintainer, so I added him as a watcher to the ticket; he might be able to help. If not, I see the QE Core team as responsible for these s390x particularities. For issues that do not look specific to the test code of os-autoinst-distri-opensuse, the tools team is responsible. All tools team members are expected to be responsive in chat (https://progress.opensuse.org/projects/qa/wiki#Common-tasks-for-team-members), e.g. in #eng-testing of the internal chat, so questions can be raised there. With this, I think we can move the ticket out of "qam-qasle-collaboration" into the "openQA Tests" project with the corresponding keywords.
