action #106850

closed

Improve error reporting in perform_installation

Added by JERiveraMoya almost 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
High
Assignee:
Target version:
Start date:
2022-02-15
Due date:
% Done:

0%

Estimated time:

Description

Motivation

After migrating to the libyui REST API with the new test module installation/performing_installation/perform_installation, we sometimes cannot see the real error without checking the video.
Example: https://openqa.suse.de/tests/8162213#step/perform_installation/1
Sometimes there is no video (or even logs): https://openqa.suse.de/tests/8162170#step/perform_installation/1

Acceptance criteria

AC1: The reported error is clearer

Suggestions

  • Ensure a screenshot is taken after the timeout is reached.
  • We might need to check the progress bar at regular intervals, keeping in mind that the UI dialog showing the error is not always the same.
  • Check that the logs are retrieved properly on failure.
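
The suggestions above could be combined into a single polling loop. The following is only a minimal sketch in Python, not the actual test module code: wait_for_installation, InstallationError and the four callbacks (is_finished, find_error_popup, take_screenshot, collect_logs) are hypothetical names, and the timeout and interval defaults are placeholders.

```python
import time
from typing import Callable, Optional


class InstallationError(Exception):
    """Raised when the installation fails or does not finish in time."""


def wait_for_installation(
    is_finished: Callable[[], bool],
    find_error_popup: Callable[[], Optional[str]],
    take_screenshot: Callable[[], None],
    collect_logs: Callable[[], None],
    timeout: float = 2400.0,   # placeholder value, not taken from the ticket
    interval: float = 30.0,    # placeholder value, not taken from the ticket
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll the installer until it finishes, shows an error, or times out.

    All callbacks are hypothetical hooks that the caller has to provide.
    """

    def fail(reason: str) -> None:
        # Record evidence first, so the failure is visible without the video.
        take_screenshot()
        collect_logs()
        raise InstallationError(reason)

    deadline = clock() + timeout
    while True:
        # The error dialog is not always the same, so the detection logic
        # lives in a callback that returns the message text, or None.
        message = find_error_popup()
        if message is not None:
            fail(f"error dialog during installation: {message}")
        if is_finished():
            return
        if clock() >= deadline:
            fail(f"installation not finished after {timeout} seconds")
        sleep(interval)
```

Checking for the error dialog on every iteration, rather than only at the end, means a failure is reported as soon as it appears instead of after the full timeout.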
Actions #1

Updated by JERiveraMoya almost 3 years ago

  • Target version set to Current
Actions #2

Updated by JERiveraMoya almost 3 years ago

  • Description updated (diff)
Actions #3

Updated by JERiveraMoya almost 3 years ago

  • Description updated (diff)
Actions #4

Updated by JERiveraMoya almost 3 years ago

  • Subject changed from Improve test module error reporting in perform_installation to Improve error reporting in perform_installation
Actions #5

Updated by JERiveraMoya almost 3 years ago

  • Priority changed from Normal to High
Actions #6

Updated by JERiveraMoya almost 3 years ago

  • Tags deleted (qe-yast-refinement)
  • Status changed from New to Workable
Actions #7

Updated by JERiveraMoya almost 3 years ago

  • Status changed from Workable to In Progress
Actions #8

Updated by JERiveraMoya almost 3 years ago

  • Assignee set to JERiveraMoya
Actions #11

Updated by JRivrain almost 3 years ago

JERiveraMoya wrote:

Jobs found with issues of this kind:

https://openqa.suse.de/tests/8189910 -> failed is https://openqa.suse.de/tests/8189910. Screenshot shows grub failing with "resource temporarily unavailable". Which resource? Probably storage, but not sure.
https://openqa.suse.de/tests/8189943 -> failed is https://openqa.suse.de/tests/8179636. Screenshot shows grub failing with "discarding improperly nested partition". Probable storage issue.

These are good examples of insufficient logs. However, we at least have screenshots showing errors that point to potential storage issues. I relaunched both jobs 5 times to see whether it would happen again; it did not.

https://openqa.suse.de/tests/8189890
https://openqa.suse.de/tests/8189944

ppc64le, impacted by https://progress.opensuse.org/issues/106257. Core dumps are dumped to serial. To extract them: download serial0.txt, then in vi delete everything before "begin 644 sh.core.pid_..." and after "end", then run uudecode serial0.txt. This produces a file called sh.core.pid_somepid_etc... Running "tail sh.core.pid_somepid_etc" should show the last command that caused the stall. In both of these cases it was filesystem creation (btrfs in one, ext4 in the other).
Since there is currently a known storage issue on these workers, we cannot draw any conclusion until that ticket is resolved. Then we could simply re-trigger the jobs multiple times.
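
The manual vi and uudecode steps described above could also be scripted. This is a rough sketch only, assuming the serial log contains a single well-formed block between a "begin <mode> <name>" line and an "end" line, both starting at the beginning of a line. extract_uuencoded is a hypothetical helper; binascii.a2b_uu is from the Python standard library.

```python
import binascii
from typing import Iterable, Tuple


def extract_uuencoded(lines: Iterable[str]) -> Tuple[str, bytes]:
    """Return (file name, decoded bytes) of the first uuencoded block.

    Everything before the "begin" line and after the "end" line is
    ignored, which replaces the manual trimming in vi.
    """
    name = None
    chunks = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if name is None:
            # Header looks like: begin 644 <file name>
            parts = line.split(" ", 2)
            if len(parts) == 3 and parts[0] == "begin" and parts[1].isdigit():
                name = parts[2]
            continue
        if line == "end":
            return name, b"".join(chunks)
        if line:
            # Each body line decodes independently of the others.
            chunks.append(binascii.a2b_uu(line))
    raise ValueError("no complete uuencoded block found")
```

The returned bytes can be written to a file under the returned name, after which the tail step works as before. Lines from real serial output may carry extra prefixes or interleaved kernel messages, which this sketch does not handle.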

https://openqa.suse.de/tests/8189885

affected by https://bugzilla.suse.com/show_bug.cgi?id=1196190

https://openqa.suse.de/tests/8189945
https://openqa.suse.de/tests/8189838

Happened only once in each of the last two, which are also likely affected by https://bugzilla.suse.com/show_bug.cgi?id=1196190.

Some of these seem to indicate I/O problems with storage; others (the last 3, on x86_64) give no information about what happened, so I had to catch it with VNC. It does not look like the same thing is occurring on x86_64, aarch64 and ppc64le.
In short, we really need to improve this module. Not only do we fail to catch that pop-up, we also cannot pause on a failed needle match, and we do not have logs either.

Actions #13

Updated by JERiveraMoya almost 3 years ago

  • Status changed from In Progress to Feedback
Actions #14

Updated by JERiveraMoya almost 3 years ago

  • Status changed from Feedback to Closed