action #47321

[functional][u][opensuse][sporadic] openSUSE Leap 15.1 fails updates_packagekit_gpk test

Added by msmeissn over 2 years ago. Updated about 1 year ago.

Status: Rejected
Priority: High
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 30
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario opensuse-15.0-DVD-Updates-x86_64-gnome@64bit-2G fails in
updates_packagekit_gpk

On Feb 7, updates_packagekit_gpk started to fail in the Leap 15.0 tests.

It seems a GNOME screen is shown, but it is no longer detected correctly.

Actually, updates_packagekit_gpk had already failed earlier, e.g. on 2019-01-26, 20 days ago:
https://openqa.opensuse.org/tests/838957#step/updates_packagekit_gpk/3

Even older failures can be found in the same module, but those may be in different steps.

Reproducible

Often but not always

Expected result

Last good: 20190207-3; however, many tests before that build failed in a prerequisite module

Last good: 20190124-1

Suggestions

  • Change the assert_screen to increase the timeout to 60 seconds for the second loop, and add a soft failure already there
  • [easy] Apply a solution similar to: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6825
  • Only focus on failures in the initial steps of updates_packagekit_gpk
  • Report product regression bug or fix test regression
  • Bisect test changes as well as product changes with statistical investigation
  • Optional: Increase stability of test module or scenario, e.g. by reordering test module schedule

Further details

Always latest result in this scenario: latest

Workaround

Retrigger failing jobs


Related issues

Blocked by openQA Tests - action #48110: [functional][u][sporadic] test failed in different modules that switch from textmode terminal to graphical terminal - unable to login into the gnome session again but we should not even need to login when selecting the correct tty (Resolved, 2019-01-04)

History

#1 Updated by okurz over 2 years ago

  • Subject changed from openSUSE Leap 15.0 started failing updates_packagekit_gpk test to [functional][u] openSUSE Leap 15.0 started failing updates_packagekit_gpk test
  • Category set to Bugs in existing tests
  • Target version set to Milestone 22

msmeissn: for easier ticket creation you can use the openQA built-in reporting feature; see https://wiki.microfocus.net/index.php?title=RD-OPS_QA/openQA_review#Workflow_for_SLES.2C_SLED_and_HA_as_an_example for an example with a screenshot.

#2 Updated by okurz over 2 years ago

  • Subject changed from [functional][u] openSUSE Leap 15.0 started failing updates_packagekit_gpk test to [functional][u][sporadic] openSUSE Leap 15.0 started failing updates_packagekit_gpk test
  • Description updated (diff)
  • Status changed from New to Workable

#3 Updated by jorauch over 2 years ago

To me this looks like a problem with GNOME and not with the test module.

#4 Updated by jorauch over 2 years ago

  • Priority changed from Urgent to High

After looking further, I am pretty sure the problem is GNOME taking too long to show up:

https://openqa.opensuse.org/tests/848696#step/updates_packagekit_gpk/3
https://openqa.opensuse.org/tests/849010#step/updates_packagekit_gpk/3

In one case the screen is completely black; in the other we have at least the bar at the top.
I am not totally sure whether this is a test performance issue or a product issue.

#5 Updated by okurz over 2 years ago

Yes, I agree with you, but I don't see how we have removed the urgency. Or have you?

#6 Updated by okurz over 2 years ago

  • Priority changed from High to Urgent

Setting back to "Urgent" as you haven't actually picked it up.

#7 Updated by szarate over 2 years ago

  • Assignee set to szarate

I'll take a look at it during the day.

#8 Updated by szarate over 2 years ago

Let's see how often it happens: one worker, with the patch test hacked to die afterwards (quicker results :D): http://phobos.suse.de/tests/overview?version=15.0&build=20190214-2&distri=opensuse

#9 Updated by szarate over 2 years ago

  • Status changed from Workable to Feedback
  • Priority changed from Urgent to Normal

It was consistently failing on Feb 7, but apparently not afterwards?

I fail to see the urgency of this ticket at this point: as far as I could check, the tests simply pass on my worker. (The first ones weren't matching the needle; the rest are chopping through the work without problems and do not show the symptoms described in the ticket. After a needle update, life finds its way again...)

Anywho, changing the priority, as the last builds on o3 have been passing without problems. Waiting for the rest of the jobs on my instance to finish...

#10 Updated by szarate over 2 years ago

  • Assignee changed from szarate to okurz

So, the latest job is passing: https://openqa.opensuse.org/tests/latest?version=15.0&arch=x86_64&test=gnome&machine=64bit-2G&distri=opensuse&flavor=DVD-Updates

The rest of the jobs on my instance started to fail due to missing repos. In any case, after looking at https://openqa.opensuse.org/tests/854596/file/autoinst-log.txt, it looks like GNOME is just slow to show up (performance problems? The worker also seems busy, with 4+ seconds)...

Suggestions would be:

  • Change the assert_screen to increase the timeout to 60 seconds for the second loop, and add a soft failure already there
  • Actually wait, since this wait_still_screen call looks like a noop to me, sleeping for 5 seconds.

Regardless of any of the suggestions: log the loop counter. It eases bug investigation.
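A minimal Python model of the first suggestion combined with the counter logging, only to make the control flow concrete. The real test module is Perl using the openQA testapi; `wait_for_desktop`, the injected `match_screen` callable, the assumed `generic-desktop` tag and the 30 s first timeout are hypothetical stand-ins, not os-autoinst API.

```python
import logging
from typing import Callable, List

log = logging.getLogger("updates_packagekit_gpk")


def wait_for_desktop(
    match_screen: Callable[[str, float], bool],  # hypothetical: True if tag matched within timeout
    soft_failures: List[str],
    tag: str = "generic-desktop",  # assumed needle tag
    first_timeout: float = 30.0,  # assumed timeout for the first check
    retry_timeout: float = 60.0,  # longer timeout for the second loop, as suggested
    max_attempts: int = 2,
) -> int:
    """Return the 1-based attempt on which `tag` matched; raise TimeoutError otherwise."""
    for attempt in range(1, max_attempts + 1):
        timeout = first_timeout if attempt == 1 else retry_timeout
        # Log the loop counter so a failed job shows which attempt it reached.
        log.info("attempt %d/%d: waiting up to %.0fs for %r", attempt, max_attempts, timeout, tag)
        if attempt == 2:
            # Reaching the second loop already means the desktop was slow: record it.
            soft_failures.append(f"{tag} not shown within {first_timeout:.0f}s, retrying")
        if match_screen(tag, timeout):
            return attempt
    raise TimeoutError(f"{tag} not shown after {max_attempts} attempts")
```

With a fake `match_screen` that fails once and then matches, the call returns 2, the second check uses the 60 s timeout, and exactly one soft failure is recorded, so a sporadic slow GNOME start stays visible without failing the job.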

#11 Updated by okurz over 2 years ago

  • Description updated (diff)
  • Status changed from Feedback to Workable
  • Assignee deleted (okurz)
  • Priority changed from Normal to High
  • Target version changed from Milestone 22 to Milestone 23

szarate wrote:

It was consistently failing on Feb 7, but apparently not afterwards?

It still fails sometimes, as we can see now, e.g. https://openqa.opensuse.org/tests/854596#step/updates_packagekit_gpk/3 from 13h ago, so it is, as the subject says, "sporadic" as in "not reproducible all the time".

I fail to see the urgency of this ticket at this point
[…]
Anywho, changing the priority, as the last builds on o3 have been passing without problems. Waiting for the rest of the jobs on my instance to finish...

Thank you for looking into this. And I agree that you removed the urgency by confirming that the jobs do not fail in 100% of the cases and that we have a valid workaround (retrigger failed tests). I have noted that in the description now.

In any case, it looks like GNOME is just slow to show up (performance problems? The worker also seems busy, with 4+ seconds)

And this is why I bumped the prio up a little bit again to "High" as it could be either a product regression or a recent test regression and bisecting is easier when both product and test code changes are more recent. I have put suggestions in the ticket description to bisect both test and/or product.

Suggestions would be:

  • Change the assert_screen to increase the timeout to 60 seconds for the second loop, and add a soft failure already there
  • Actually wait, since this wait_still_screen call looks like a noop to me, sleeping for 5 seconds.

Regardless of any of the suggestions: log the loop counter. It eases bug investigation.

Yes, that can help however I see other possibilities as well which I have suggested in the updated description.

So, as you asked: updated and set back to "Workable" with the urgency removed, hence "Urgent" -> "High".

#12 Updated by szarate over 2 years ago

  • Description updated (diff)

#13 Updated by szarate over 2 years ago

Added https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6807 as a follow-up to my comment in #note-10 regarding logging the counter.

#14 Updated by okurz over 2 years ago

  • Actually wait, since this wait_still_screen call looks like a noop to me, sleeping for 5 seconds.

It's not a "noop", as the actual sleep is at least 2s but not necessarily much more. On top of that, it describes the intention more clearly than a simple sleep.

#15 Updated by szarate over 2 years ago

okurz wrote:

  • Actually wait, since this wait_still_screen call looks like a noop to me, sleeping for 5 seconds.

It's not a "noop", as the actual sleep is at least 2s but not necessarily much more. On top of that, it describes the intention more clearly than a simple sleep.

Agreed, sounds better than a sleep, so we can bump that to 5 :)
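To illustrate why a still-screen wait is not a noop, here is a simplified Python model, not openQA's implementation: the real wait_still_screen compares screenshots against a similarity threshold, whereas this sketch compares sampled frames for exact equality, and the function name and sampling scheme are assumptions. The wait lasts at least `stilltime`, but keeps going while the screen is still changing, which a fixed sleep cannot do.

```python
from typing import Sequence, Tuple


def simulated_wait_still_screen(
    frames: Sequence[str], interval: float, stilltime: float, timeout: float
) -> Tuple[bool, float]:
    """Model: frames[i] is the screen content sampled at i * interval seconds.

    Returns (became_still, elapsed_seconds). The wait ends as soon as the
    screen has been unchanged for `stilltime` seconds, or gives up at `timeout`.
    """
    if not frames:
        raise ValueError("need at least one frame")
    previous = frames[0]
    last_change = 0.0
    for i in range(1, len(frames)):
        now = i * interval
        if now > timeout:
            return False, timeout
        if frames[i] != previous:
            # Any change restarts the "still" period.
            previous = frames[i]
            last_change = now
        elif now - last_change >= stilltime:
            return True, now
    # Ran out of samples before the screen became still.
    return False, min((len(frames) - 1) * interval, timeout)
```

For an already static screen, a stilltime of 2 returns after 2 s and a stilltime of 5 after 5 s, while a screen that keeps redrawing (e.g. GNOME still starting up) extends the wait until it settles or the timeout is hit.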

#16 Updated by okurz over 2 years ago

  • Status changed from Workable to Feedback
  • Assignee set to szarate
  • Priority changed from High to Normal
  • Target version changed from Milestone 23 to Milestone 22

I don't think wait_still_screen(5) is a stable approach either. Hence my suggestion "Increase stability of test module or scenario, e.g. by reordering test module schedule"

#17 Updated by okurz over 2 years ago

  • Target version changed from Milestone 22 to Milestone 23

#18 Updated by szarate over 2 years ago

  • Description updated (diff)
  • Assignee changed from szarate to okurz

Updated my PR to better mark the calls that ensure an unlocked desktop. That's all from me for the time being.

Updated the description with suggestions; okurz, set it back to Workable if you agree.

#19 Updated by okurz over 2 years ago

  • Blocked by action #48110: [functional][u][sporadic] test failed in different modules that switch from textmode terminal to graphical terminal - unable to login into the gnome session again but we should not even need to login when selecting the correct tty added

#20 Updated by okurz over 2 years ago

  • Status changed from Feedback to Blocked
  • Target version changed from Milestone 23 to Milestone 25

Unfortunately I doubt any approach similar to https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6825 would help here as we start from a graphical session and that is the problem. I blocked this ticket by #48110 now as I think that one has priority which could involve many more unstable test modules. Only then we should revisit here.

#21 Updated by pcervinka over 2 years ago

Looking at recent Leap 42.3 results, updates_packagekit_gpk has sporadic failures too. Do you think it has a similar cause? https://openqa.opensuse.org/tests/873333

#22 Updated by okurz over 2 years ago

  • Assignee changed from okurz to mgriessmeier

Moving to the new QSF-u PO after I moved to the "tools" team. I mainly checked the subject line, so in individual cases you might not agree to take it over completely into QSF-u. Feel free to discuss with me or to reassign to me or someone else in that case. Thanks.

#23 Updated by mgriessmeier over 2 years ago

  • Target version changed from Milestone 25 to Milestone 26

#24 Updated by mgriessmeier about 2 years ago

  • Subject changed from [functional][u][sporadic] openSUSE Leap 15.0 started failing updates_packagekit_gpk test to [functional][u][opensuse][sporadic] openSUSE Leap 15.0 started failing updates_packagekit_gpk test
  • Status changed from Blocked to New
  • Assignee deleted (mgriessmeier)
  • Priority changed from Normal to High
  • Target version changed from Milestone 26 to Milestone 27

to be groomed - still happening

#25 Updated by SLindoMansilla about 2 years ago

  • Subject changed from [functional][u][opensuse][sporadic] openSUSE Leap 15.0 started failing updates_packagekit_gpk test to [qam][opensuse][sporadic] openSUSE Leap 15.0 started failing updates_packagekit_gpk test

The reported problem doesn't happen anymore. The new failure is related to the serial device: https://openqa.opensuse.org/tests/1014115#step/updates_packagekit_gpk/48

#26 Updated by mgriessmeier about 2 years ago

  • Target version changed from Milestone 27 to Milestone 28

#27 Updated by mgriessmeier almost 2 years ago

  • Target version changed from Milestone 28 to Milestone 31

#28 Updated by tjyrinki_suse over 1 year ago

  • Subject changed from [qam][opensuse][sporadic] openSUSE Leap 15.0 started failing updates_packagekit_gpk test to [functional][u][opensuse][sporadic] openSUSE Leap 15.1 fails updates_packagekit_gpk test
  • Start date deleted (2019-02-09)

Leap 15.0 is now EOL, but this still happens sporadically for 15.1 updates_packagekit_gpk, and it looks like the same kind of problem as originally reported, not anything different: https://openqa.opensuse.org/tests/1283905

This used to be in the QSF-u backlog for a long time and the milestones are still being updated accordingly. Should it still be there, as the problem remains as originally described?

#29 Updated by SLindoMansilla about 1 year ago

  • Status changed from New to Rejected
  • Assignee set to SLindoMansilla
  • Target version changed from Milestone 31 to Milestone 30

Not reproducible as described in this ticket.

The last occurrence of a failing updates_packagekit_gpk is caused by missing key presses (note the missing closing single quote) and clicks (the needle expects the xterm to be closed): https://openqa.opensuse.org/tests/1014115#step/updates_packagekit_gpk/51
