action #115793
closed
[qe-core] test fails in update_install on PowerPC size:M
Added by mgrifalconi over 2 years ago.
Updated about 1 year ago.
Category: Bugs in existing tests
Description
Observation
I often see this test fail at the login screen, when the test is supposed to type 'root' and then find the password prompt. It's sporadic, but it's painful because it sometimes needs several restarts to get a green result.
openQA test in scenario sle-15-SP2-Server-DVD-Incidents-Install-ppc64le-qam-incidentinstall@ppc64le fails in update_install
Test suite description
Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. Incident Installation TEST
MAX_JOB_TIME=9000 due to long texlive update
Reproducible
Fails since (at least) Build :24713:libtirpc (current job)
Expected result
Last good: :25560:python-Flask-Security-Too (or more recent)
Further details
Always latest result in this scenario: latest
Acceptance criteria
- AC1: The test does not sporadically fail anymore or the problem has been forwarded to the product maintainers
Suggestions
- Report a product bug for ppc64le 15-SP2
When I opened a VNC console on this failing test, the VNC display was not updating: whatever I typed, I didn't see the text. I had to close the session and open a new one to see it, and again, newly typed text only became visible after reopening the session.
It looks like a VNC bug on ppc64le 15-SP2 that started to happen about one month ago.
- Subject changed from [qe-core] test fails in update_install to [tools] test fails in update_install
Thanks Jozef for the investigation! Moving to the tools team then!
- Has duplicate action #115820: issue with openQA typing characters on ppc64le workers added
- Target version set to Ready
- Subject changed from [tools] test fails in update_install to [tools] test fails in update_install size:M
- Description updated (diff)
- Status changed from New to Workable
If even a manual VNC connection is unreliable, then I'm not sure what we can do. The VNC server is provided by QEMU, so maybe upgrading or downgrading the QEMU package on the worker would help (although I find that unlikely).
Is this actually specific to PowerPC?
- Due date set to 2022-10-21
- Status changed from Workable to Feedback
- Assignee set to okurz
- Priority changed from High to Normal
Apparently the issue is not that big a problem, considering that nobody (else) answered the questions above, so I assume the impact is limited. Lowering the priority and picking up the ticket while waiting for feedback.
- Subject changed from [tools] test fails in update_install size:M to [tools] test fails in update_install on PowerPC size:M
- Due date deleted (2022-10-21)
- Status changed from Feedback to Workable
- Assignee deleted (okurz)
Adding my findings:
The system seems to hang after booting up, and then the strings we type do not show up.
I checked several failed jobs and found the following "sysrq" messages:
https://openqa.suse.de/tests/9611313/logfile?filename=serial0.txt
susetest login: [ 280.453786] sysrq: Show State
[ 280.855400] sysrq: Show Blocked State
Is it a product bug?
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: qam-incidentinstall
https://openqa.suse.de/tests/9832009#step/update_install/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Expect the next reminder at the earliest in 40 days if nothing changes in this ticket.
- Priority changed from Normal to High
rfan1 wrote:
Is it a product bug?
The "sysrq" messages appear because the post_fail_hook of the tests triggers a "system request" (SysRq) to find out whether background tasks are stuck and blocking the system. So these messages are expected after the initial failure has happened.
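When reviewing logs, it can help to recognize these lines as post-failure diagnostics rather than as the cause. The Python sketch below maps the SysRq messages in a serial log back to the command keys that produce them. It assumes the hook triggers them through the standard Linux interface, i.e. writing a command character to `/proc/sysrq-trigger` as root; the key table only covers the two messages seen in the serial log quoted above, and the function name `sysrq_commands` is hypothetical.

```python
import re

# Kernel SysRq messages and the command keys that produce them.
# Only the two commands seen in the serial log are listed; this is an
# illustrative subset, not a complete table.
SYSRQ_MESSAGES = {
    "Show State": "t",          # dump the state of all tasks
    "Show Blocked State": "w",  # dump tasks in uninterruptible (blocked) state
}

# Matches e.g. "[  280.453786] sysrq: Show State", even when other console
# output such as "susetest login: " precedes it on the same line.
SYSRQ_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+sysrq:\s+(.+?)\s*$")


def sysrq_commands(serial_log: str) -> list[tuple[float, str]]:
    """Return (kernel timestamp, SysRq key) pairs for known SysRq messages."""
    found = []
    for line in serial_log.splitlines():
        match = SYSRQ_LINE.search(line)
        if match and match.group(2) in SYSRQ_MESSAGES:
            found.append((float(match.group(1)), SYSRQ_MESSAGES[match.group(2)]))
    return found


if __name__ == "__main__":
    log = (
        "susetest login: [ 280.453786] sysrq: Show State\n"
        "[ 280.855400] sysrq: Show Blocked State\n"
    )
    # Each key corresponds to something like `echo t > /proc/sysrq-trigger`
    # having been run as root, e.g. by a post-fail hook.
    for timestamp, key in sysrq_commands(log):
        print(f"{timestamp:.6f}s: sysrq key '{key}'")
```

Because both messages appear only after the login attempt has already stalled, a job whose serial log shows them right after the `login:` prompt fits the pattern described here: the diagnostics follow the failure instead of explaining it.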
- Subject changed from [tools] test fails in update_install on PowerPC size:M to [qe-core] test fails in update_install on PowerPC size:M
- Target version deleted (
Ready)
We discussed this topic in the SUSE QE Tools team daily as the team missed the SLO for high-priority tickets. I am sorry that we failed to look into this topic for a whole month. https://openqa.suse.de/tests/9912497#step/update_install/2616 looks like another occurrence, albeit also already one month old. Judging from this job, we conclude that the hundreds of test steps run in a serial terminal likely keep the SUT quite busy in the background. When the test then returns to the VNC tty and tries to log in, the system is not immediately responsive enough and does not accept the typed characters. I suggest two points:
- Try to narrow down how the same issue can be identified in jobs
- Improve the test code to wait sufficiently for the system to become responsive again after switching to the root tty
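The second point could be sketched roughly as follows, in Python for illustration only (the real test code in the test distribution is Perl). Instead of typing `root` and immediately expecting the password prompt, the sketch waits until the busy SUT has echoed the typed user name before pressing Enter, then allows a generous timeout for the password prompt. `type_text` and `screen_shows` are hypothetical stand-ins for typing into the VNC console and checking the screen content; the 120-second timeout and 5-second poll interval are assumptions, not tuned values.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float,
    poll_interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


def login_as_root(
    type_text: Callable[[str], None],     # hypothetical: type into the VNC tty
    screen_shows: Callable[[str], bool],  # hypothetical: is this text on screen?
    timeout: float = 120.0,
    **poll_options,
) -> None:
    """Log in on the tty, giving a busy SUT time to process keystrokes."""
    # Type the user name without Enter and wait for the SUT to echo it,
    # instead of assuming the keystrokes were accepted immediately.
    type_text("root")
    if not wait_until(lambda: screen_shows("login: root"), timeout, **poll_options):
        raise TimeoutError("typed user name was not echoed")
    type_text("\n")
    # Allow the same generous timeout for the password prompt.
    if not wait_until(lambda: screen_shows("Password:"), timeout, **poll_options):
        raise TimeoutError("password prompt did not appear")
```

Waiting for the echo rather than retyping avoids sending `root` a second time as the password when the first keystrokes were merely delayed rather than lost.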
As this is a matter for the test distribution and nothing that openQA or os-autoinst can easily do better by itself, I am assigning it back to the QE Core team.
I don't know what happened; probably a package update on the worker fixed the issue.
I could not find this failure on osd anymore.
- Copied to action #123451: [retro] Open questions on how a ticket about update_install on PowerPC was handled size:M added
This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
- Status changed from Workable to Rejected
I would call it rejected now; we have other issues at the moment.
I don't understand why you "reject" the issue when it's still a valid issue. I understand if your team doesn't plan to look into it. But then what subject line keyword do we use for those tickets? Maybe we need a new keyword, [volunteer]?