action #98541
closed
[qe-core][kernel] Steps in case of s390 failures
Added by pcervinka over 3 years ago.
Updated about 2 years ago.
Category:
Bugs in existing tests
Description
From time to time there are failures on s390 workers that require action to fix. They are usually handled in the eng-testing channel. Sometimes they are fixed quickly, but sometimes it takes longer (for example, waiting for a more experienced person to return from vacation). It is also sometimes unclear who is responsible for the health of the s390 workers in the openQA pool.
What should the process be, and who is primarily responsible for the s390 workers?
The most common s390x worker error is a failure to execute define_and_start() in bootloader_zkvm. But this failure has multiple different causes:
Some happen randomly due to worker overload, others are the result of manual misconfiguration and persist on one or more worker slots until manually fixed.
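To make the distinction above concrete, here is a minimal sketch of how recent job results could be grouped per worker slot to tell a one-off failure from one that persists on a slot. Everything in it is an assumption for illustration: the slot names, the input shape (a list of (slot, passed) pairs, oldest first), the classify_slots helper and the streak threshold of 3 are hypothetical and not part of openQA.

```python
from collections import defaultdict


def classify_slots(results, min_streak=3):
    """Classify worker slots from job results.

    results: iterable of (worker_slot, passed) tuples, oldest first.
    min_streak: hypothetical threshold; this many failures in a row at
    the end of a slot's history is treated as a persistent problem.
    """
    history = defaultdict(list)
    for slot, passed in results:
        history[slot].append(passed)

    verdicts = {}
    for slot, outcomes in history.items():
        # Count consecutive failures at the end of the slot's history.
        streak = 0
        for passed in reversed(outcomes):
            if passed:
                break
            streak += 1

        if streak >= min_streak:
            verdicts[slot] = "persistent"  # likely needs a manual fix
        elif not all(outcomes):
            verdicts[slot] = "sporadic"    # failed, but recovered or isolated
        else:
            verdicts[slot] = "healthy"
    return verdicts


# Hypothetical slot names and results, for illustration only.
sample = [
    ("s390-worker:1", False), ("s390-worker:1", False), ("s390-worker:1", False),
    ("s390-worker:2", True), ("s390-worker:2", False), ("s390-worker:2", True),
    ("s390-worker:3", True),
]
verdicts = classify_slots(sample)
```

A slot flagged as persistent would be a candidate for checking the worker configuration by hand, while sporadic failures are more likely candidates for a retry.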
- Project changed from 178 to 175
- Related to action #97532: [qe-core][sporadic] s390x jobs are failing to boot auto_review:"error: Cannot set interface flags on 'macvtap.*': Address already in use":retry added
Hi Petr, in any case, if you're struggling to figure out the root cause of those problems, you can ping me directly or mention the issue in the qe-core/eng-testing channels, as I mentioned during the call.
I suspect that the memory one (if it happens again lmk) could be related to too many jobs running on the same machine.
- Project changed from 175 to openQA Tests (public)
- Subject changed from Steps in case of s390 failures to [qe-core][kernel] Steps in case of s390 failures
- Category set to Bugs in existing tests
Discussed in the weekly QE sync on 2021-09-15. @szarate already linked the important related ticket #97532. The test modules mentioned above list mgriessmeier as maintainer, hence I added him as a watcher on the ticket. He might be able to help. If not, I see the responsibility for these s390x particularities with the QE Core team. For issues that do not look specific to the test code of os-autoinst-distri-opensuse, the tools team is responsible. All tools team members are expected to be responsive in chat (https://progress.opensuse.org/projects/qa/wiki#Common-tasks-for-team-members), e.g. #eng-testing in the internal chat, so questions can be raised there. With this I think we can move the ticket out of "qam-qasle-collaboration" into the "openQA Tests" project with the corresponding keywords.
- Related to action #105049: [qe-core] System cannot boot after installation in s390x in multiple test suites added
This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
- Tags changed from s390, openQA, infrastructure to s390, openQA, infra
- Status changed from New to Resolved