action #98541

closed

[qe-core][kernel] Steps in case of s390 failures

Added by pcervinka over 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Bugs in existing tests
Target version:
-
Start date:
2021-09-13
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

From time to time there are failures on s390 workers that need action to fix. They are usually handled in the eng-testing channel. Sometimes they are fixed quickly, but sometimes it takes longer (for example, while waiting for a more experienced person to return from vacation). It is also sometimes unclear who is responsible for the health of the s390 workers in the openQA pool.

What should the process be, and who is primarily responsible for the s390 workers?


Related issues 2 (1 open, 1 closed)

Related to openQA Tests - action #97532: [qe-core][sporadic] s390x jobs are failing to boot auto_review:"error: Cannot set interface flags on 'macvtap.*': Address already in use":retry (Resolved, szarate)

Related to openQA Tests - action #105049: [qe-core] System cannot boot after installation in s390x in multiple test suites (Workable, szarate)

Actions #1

Updated by MDoucha over 2 years ago

The most common s390x worker error is failure to execute define_and_start() in bootloader_zkvm. But this failure has multiple different causes:

Some happen randomly due to worker overload, others are the result of manual misconfiguration and persist on one or more worker slots until manually fixed.

Actions #2

Updated by pcervinka over 2 years ago

  • Project changed from 178 to 175
Actions #3

Updated by szarate over 2 years ago

  • Related to action #97532: [qe-core][sporadic] s390x jobs are failing to boot auto_review:"error: Cannot set interface flags on 'macvtap.*': Address already in use":retry added
Actions #4

Updated by szarate over 2 years ago

Hi Petr, in any case, if you're struggling to figure out the root cause of those problems, you can ping me directly or mention the issue in the qe-core/eng-testing channels, as I mentioned during the call.

I suspect that the memory one (if it happens again lmk) could be related to too many jobs running on the same machine.

Actions #5

Updated by okurz over 2 years ago

  • Project changed from 175 to openQA Tests
  • Subject changed from Steps in case of s390 failures to [qe-core][kernel] Steps in case of s390 failures
  • Category set to Bugs in existing tests

Discussed in the weekly QE sync on 2021-09-15. @szarate already linked the important related ticket #97532. The test modules mentioned above list mgriessmeier as maintainer, so I added him as a watcher to the ticket. He might be able to help. If not, I see the QE Core team as responsible for these s390x particularities. For issues that do not look specific to the test code of os-autoinst-distri-opensuse, the tools team is responsible. All tools team members are expected to be responsive in chat (https://progress.opensuse.org/projects/qa/wiki#Common-tasks-for-team-members), e.g. in #eng-testing of the internal chat, so questions can be raised there. With this, I think we can move the ticket out of "qam-qasle-collaboration" into the "openQA Tests" project with the appropriate keywords.

Actions #6

Updated by tjyrinki_suse about 2 years ago

  • Related to action #105049: [qe-core] System cannot boot after installation in s390x in multiple test suites added
Actions #7

Updated by slo-gin over 1 year ago

This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #8

Updated by okurz over 1 year ago

  • Tags changed from s390, openQA, infrastructure to s390, openQA, infra
Actions #9

Updated by pcervinka over 1 year ago

  • Status changed from New to Resolved