action #106850
closed
Improve error reporting in perform_installation
0%
Description
Motivation
After migrating to the libyui REST API with the new test module installation/performing_installation/perform_installation,
we sometimes cannot see the real error without checking the video.
Example: https://openqa.suse.de/tests/8162213#step/perform_installation/1
Sometimes there is no video (or even logs): https://openqa.suse.de/tests/8162170#step/perform_installation/1
Acceptance criteria
AC1: The reported error is clearer
Suggestions
- Ensure that a screenshot is taken after the timeout is reached.
- We might need to check the progress bar at regular intervals, keeping in mind that the UI dialog showing the error is not always the same.
- Check that the logs are retrieved properly on failure.
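The first two suggestions could be combined roughly as in the sketch below. It is only an illustration in Python, not the actual test module: `get_progress`, `find_error_popup` and `take_screenshot` are hypothetical callbacks standing in for whatever the REST client and openQA provide, and the timeout and interval values are arbitrary.

```python
import time
from typing import Callable, Optional


def wait_for_installation(
    get_progress: Callable[[], int],
    find_error_popup: Callable[[], Optional[str]],
    take_screenshot: Callable[[], None],
    timeout: float = 3600.0,
    interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until progress reaches 100, failing early on an error pop-up.

    All callbacks are hypothetical placeholders for the real UI queries.
    """
    deadline = clock() + timeout
    while True:
        # Check for an error dialog first so that its text ends up in the report.
        error = find_error_popup()
        if error is not None:
            take_screenshot()
            raise RuntimeError(f"installation failed: {error}")
        if get_progress() >= 100:
            return
        if clock() >= deadline:
            # Always record the screen state before reporting the timeout.
            take_screenshot()
            raise TimeoutError(f"installation not finished after {timeout} s")
        sleep(interval)
```

Since the error dialog is not always the same, `find_error_popup` would have to try several known dialog variants and return the first message it finds.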
Updated by JERiveraMoya almost 3 years ago
- Subject changed from Improve test module error reporting in perform_installation to Improve error reporting in perform_installation
Updated by JERiveraMoya almost 3 years ago
- Tags deleted (qe-yast-refinement)
- Status changed from New to Workable
Updated by JERiveraMoya almost 3 years ago
- Status changed from Workable to In Progress
Updated by JERiveraMoya almost 3 years ago
Updated by JRivrain almost 3 years ago
JERiveraMoya wrote:
Jobs found with issues of this kind:
https://openqa.suse.de/tests/8189910 -> the failed job is https://openqa.suse.de/tests/8189910. The screenshot shows grub failing with "resource temporarily unavailable". Which resource? Probably storage, but I am not sure.
https://openqa.suse.de/tests/8189943 -> the failed job is https://openqa.suse.de/tests/8179636. The screenshot shows grub failing with "discarding improperly nested partition". Probably a storage issue.
These are good examples of insufficient logs. However, we at least have screenshots showing errors that point to potential storage issues. I relaunched both jobs 5 times to see whether the failure happens again; it did not.
https://openqa.suse.de/tests/8189890
https://openqa.suse.de/tests/8189944
These two are ppc64le jobs, impacted by https://progress.opensuse.org/issues/106257. Core dumps are written to the serial console. To extract them: download serial0.txt; in vi, delete everything before "begin 644 sh.core.pid_..." and after "end"; then run "uudecode serial0.txt". This produces a file called sh.core.pid_somepid_etc... Running "tail sh.core.pid_somepid_etc" should show the last command that caused the stall. In both of these cases it was filesystem creation (btrfs in one, ext4 in the other).
Since there is currently a known storage issue on these workers, we cannot draw any conclusion until that ticket is resolved. Then we could simply re-trigger the jobs multiple times.
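The manual vi and uudecode steps could also be scripted. The following is a minimal Python sketch, assuming the dump appears in serial0.txt as a single clean uuencoded block between the "begin 644 ..." and "end" lines; the function name is made up for illustration, and lines corrupted by interleaved console output would make `binascii.a2b_uu` raise an error.

```python
import binascii
from typing import Iterable, Tuple


def extract_uuencoded(lines: Iterable[str]) -> Tuple[str, bytes]:
    """Return (filename, payload) of the first uuencoded block in lines."""
    name = None
    payload = bytearray()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if name is None:
            # Skip everything until a header such as "begin 644 sh.core.pid_..."
            parts = line.split(" ", 2)
            if len(parts) == 3 and parts[0] == "begin":
                name = parts[2]
            continue
        if line == "end":
            return name, bytes(payload)
        if line:
            # Each body line decodes to at most 45 bytes.
            payload += binascii.a2b_uu(line)
    raise ValueError("no complete uuencoded block found")
```

Typical use would be to open serial0.txt in text mode (for example with `errors="replace"`), pass the file object to `extract_uuencoded`, write the payload to a file under the returned name, and then inspect its tail as described above.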
affected by https://bugzilla.suse.com/show_bug.cgi?id=1196190
https://openqa.suse.de/tests/8189945
https://openqa.suse.de/tests/8189838
The failure happened only once in these last two jobs, which are also likely affected by https://bugzilla.suse.com/show_bug.cgi?id=1196190.
Some jobs seem to indicate I/O problems with storage; others (the last 3 on x86_64) give no information about what happened, and I had to catch it with VNC. It does not look like the same thing is occurring on x86_64, aarch64 and ppc64le.
In short, we really need to improve this module. Not only do we fail to catch that pop-up, we also cannot pause on a failed needle match, and we do not have logs either.
Updated by JERiveraMoya almost 3 years ago
- Status changed from In Progress to Feedback
Updated by JERiveraMoya almost 3 years ago
- Status changed from Feedback to Closed