action #115793

closed

[qe-core] test fails in update_install on PowerPC size:M

Added by mgrifalconi over 1 year ago. Updated 4 months ago.

Status: Rejected
Priority: High
Assignee: -
Category: Bugs in existing tests
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

On this test I very often see a failure at the login screen, when the test is supposed to type 'root' and find the password prompt. It's sporadic, but it's painful because it sometimes needs several restarts to get green.

openQA test in scenario sle-15-SP2-Server-DVD-Incidents-Install-ppc64le-qam-incidentinstall@ppc64le fails in update_install

Test suite description

Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. Incident Installation TEST
MAX_JOB_TIME=9000 due to long texlive update

Reproducible

Fails since (at least) Build :24713:libtirpc (current job)

Expected result

Last good: :25560:python-Flask-Security-Too (or more recent)

Further details

Always latest result in this scenario: latest

Acceptance criteria

  • AC1: The test does not sporadically fail anymore or the problem has been forwarded to the product maintainers

Suggestions

  • Report a product bug for ppc64le 15-SP2

Related issues 2 (0 open, 2 closed)

Has duplicate openQA Infrastructure - action #115820: issue with openQA typing characters on ppc64le workers - Rejected - 2022-08-26

Copied to openQA Project - action #123451: [retro] Open questions on how a ticket about update_install on PowerPC was handled size:M - Resolved - livdywan - 2023-01-20 - 2023-02-07

Actions #1

Updated by dzedro over 1 year ago

When I opened the VNC console on this failing test, the VNC display was not updating: whatever I typed, I didn't see the text. I had to close the session and open a new one to see the text, but again further typing only became visible after reopening the session.
Looks like some VNC bug on ppc64le 15-SP2 that started to happen about one month ago.

Actions #2

Updated by mgrifalconi over 1 year ago

  • Subject changed from [qe-core] test fails in update_install to [tools] test fails in update_install

Thanks Jozef for the investigation! Moving to the tools team then!

Actions #3

Updated by livdywan over 1 year ago

  • Has duplicate action #115820: issue with openQA typing characters on ppc64le workers added
Actions #4

Updated by livdywan over 1 year ago

  • Target version set to Ready
Actions #5

Updated by livdywan over 1 year ago

  • Subject changed from [tools] test fails in update_install to [tools] test fails in update_install size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #6

Updated by mkittler over 1 year ago

dzedro wrote:

When I opened the VNC console on this failing test, the VNC display was not updating: whatever I typed, I didn't see the text. I had to close the session and open a new one to see the text, but again further typing only became visible after reopening the session.
Looks like some VNC bug on ppc64le 15-SP2 that started to happen about one month ago.

If a manual VNC connection is also unreliable, then I'm not sure what we can do. The VNC server is provided by QEMU, so maybe upgrading or downgrading the QEMU package on the worker would help (although I find it unlikely).

Is this actually specific to PowerPC?

Actions #7

Updated by okurz over 1 year ago

  • Due date set to 2022-10-21
  • Status changed from Workable to Feedback
  • Assignee set to okurz
  • Priority changed from High to Normal

Apparently the issue is not that big of a problem considering that nobody (else) answered the above questions so I assume the impact is limited. Lowering priority and picking up the ticket waiting for feedback.

Actions #8

Updated by mgrifalconi over 1 year ago

Yes, I think I only saw this on PowerPC. Two more examples: https://openqa.suse.de/tests/9610410 https://openqa.suse.de/tests/9611313

Actions #9

Updated by okurz over 1 year ago

  • Subject changed from [tools] test fails in update_install size:M to [tools] test fails in update_install on PowerPC size:M
  • Due date deleted (2022-10-21)
  • Status changed from Feedback to Workable
  • Assignee deleted (okurz)
Actions #10

Updated by rfan1 over 1 year ago

Adding my findings:

The system seems to hang after booting up, and then the strings we type don't show up.

I checked several failed jobs and found the "sysrq" messages below:
https://openqa.suse.de/tests/9611313/logfile?filename=serial0.txt


susetest login: [  280.453786] sysrq: Show State
[  280.855400] sysrq: Show Blocked State

Is it a product bug?

Actions #11

Updated by openqa_review over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: qam-incidentinstall
https://openqa.suse.de/tests/9832009#step/update_install/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 40 days if nothing changes in this ticket.

Actions #12

Updated by okurz over 1 year ago

  • Priority changed from Normal to High
Actions #13

Updated by okurz about 1 year ago

rfan1 wrote:

Adding my findings:

The system seems to hang after booting up, and then the strings we type don't show up.

I checked several failed jobs and found the "sysrq" messages below:
https://openqa.suse.de/tests/9611313/logfile?filename=serial0.txt


susetest login: [  280.453786] sysrq: Show State
[  280.855400] sysrq: Show Blocked State

Is it a product bug?

The "sysrq" messages appear because a "system request" is triggered by the post_fail_hook of tests to find out whether background tasks are stuck and blocking the system. So these messages are expected after the initial failure has happened.
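
To illustrate the point above, here is a minimal sketch of how one might locate the first "sysrq" line in a serial log, so that everything from that line on can be read as post-failure output rather than as the cause. The function name `first_sysrq` and the regular expression are assumptions made for this example; they are not part of openQA or os-autoinst, and the sample text is only the two lines quoted above.

```python
import re

# Kernel log lines look like "[  280.453786] sysrq: Show State"; the line may
# be prefixed by other console output such as "susetest login: ".
SYSRQ_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+sysrq:\s+(.*)")


def first_sysrq(serial_log: str):
    """Return (line_index, timestamp, message) of the first sysrq line, or None."""
    for index, line in enumerate(serial_log.splitlines()):
        match = SYSRQ_RE.search(line)
        if match:
            return index, float(match.group(1)), match.group(2).strip()
    return None


# The two lines quoted from serial0.txt in this ticket.
sample = (
    "susetest login: [  280.453786] sysrq: Show State\n"
    "[  280.855400] sysrq: Show Blocked State\n"
)
print(first_sysrq(sample))
```

Output before the returned line index would be the place to look for the original failure; the sysrq lines themselves only show that the post_fail_hook ran.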

Actions #14

Updated by okurz about 1 year ago

  • Subject changed from [tools] test fails in update_install on PowerPC size:M to [qe-core] test fails in update_install on PowerPC size:M
  • Target version deleted (Ready)

We discussed this topic in the SUSE QE Tools topic daily as the team missed the SLO for High priority tickets. I am sorry that we failed to look into this topic for a complete month. https://openqa.suse.de/tests/9912497#step/update_install/2616 looks like another occurrence, albeit also already one month old. Judging from this job, we conclude that with more work going on in those hundreds of test steps in a serial terminal, the SUT is likely kept quite busy in the background. When the test then returns to the VNC tty and tries to log in, the system is not immediately responsive enough and so won't accept characters. I suggest two points:

  1. Try to narrow down how the same issue can be identified in jobs
  2. Improve the test code to wait sufficiently for the system to become responsive again after switching to the root tty

As this concerns the test distribution and is nothing that openQA or os-autoinst can easily do better by itself, assigning to tools team.
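
Suggestion 2 could look roughly like the following sketch: poll a responsiveness probe with a timeout before typing the username. This is a generic illustration in Python, not the actual test code; `wait_until_responsive`, the `is_responsive` probe, and the default timeout and interval values are all hypothetical, and a real test would have to supply its own probe (for example, checking that a typed character is echoed on the console).

```python
import time
from typing import Callable


def wait_until_responsive(
    is_responsive: Callable[[], bool],
    timeout: float = 120.0,   # assumed value, not taken from the ticket
    interval: float = 5.0,    # assumed value, not taken from the ticket
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_responsive() until it returns True or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        if is_responsive():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)  # let the busy system settle before probing again


# Simulated system that becomes responsive on the third probe.
probes = iter([False, False, True])
print(wait_until_responsive(lambda: next(probes), sleep=lambda _: None))
```

Only once the probe succeeds would the test type 'root' and look for the password prompt. Injecting `clock` and `sleep` keeps the helper testable without real delays.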

Actions #15

Updated by dzedro about 1 year ago

I don't know what happened; probably a package update on the worker fixed the issue.
I could not find this failure on osd anymore.

Actions #16

Updated by livdywan about 1 year ago

  • Copied to action #123451: [retro] Open questions on how a ticket about update_install on PowerPC was handled size:M added
Actions #17

Updated by slo-gin about 1 year ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #18

Updated by szarate 5 months ago

  • Status changed from Workable to Rejected

I would call it rejected now, we have other issues atm.

Actions #19

Updated by okurz 4 months ago

I don't understand why you "reject" the issue when it's still a valid issue. I understand if you don't plan to look into this in your team. Then what subject line keyword do we use for those tickets? Maybe we need a new keyword [volunteer]?
