JERiveraMoya wrote:
Jobs found with issues of this kind:
https://openqa.suse.de/tests/8189910 -> the failed job is https://openqa.suse.de/tests/8189910. The screenshot shows grub failing with "resource temporarily unavailable". Which resource? Probably storage, but I am not sure.
https://openqa.suse.de/tests/8189943 -> the failed job is https://openqa.suse.de/tests/8179636. The screenshot shows grub failing with "discarding improperly nested partition". Probably a storage issue.
These are good examples of insufficient logging. However, we at least have screenshots showing errors that point to potential storage issues. I relaunched both jobs 5 times to see whether the failure recurs; it did not.
https://openqa.suse.de/tests/8189890
https://openqa.suse.de/tests/8189944
ppc64le, impacted by https://progress.opensuse.org/issues/106257. Core dumps are written to the serial console. To extract them: download serial0.txt, open it in vi and delete everything before the "begin 644 sh.core.pid_..." line and after the "end" line, then run uudecode serial0.txt. This produces a file named sh.core.pid_somepid_etc...; running "tail sh.core.pid_somepid_etc" should show the last command, i.e. the one that caused the stall. In both cases it was filesystem creation (btrfs in one job, ext4 in the other).
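The manual vi + uudecode steps above could be scripted. Below is a minimal sketch, assuming serial0.txt contains one standard "begin <mode> <name>" ... "end" uuencoded block with no console output interleaved inside it. The script name extract_core.py and the helper names are hypothetical. Instead of `tail` on the raw core file, it prints the last printable ASCII strings (roughly `strings | tail`), which is a swapped-in heuristic, not the original method.

```python
import binascii
import os
import re
import sys


def extract_uu_core(serial_text: str) -> tuple[str, bytes]:
    """Return (filename, decoded bytes) of the first uuencoded block in serial_text.

    Mirrors the manual steps: ignore everything before the "begin 644 ..." line
    and after the closing "end" line, then uudecode what lies in between.
    """
    name = None
    chunks = []
    for raw in serial_text.splitlines():
        line = raw.rstrip("\r")
        if name is None:
            m = re.match(r"begin [0-7]{3} (\S+)", line)
            if m:
                # basename() so a crafted name cannot write outside the cwd
                name = os.path.basename(m.group(1))
            continue
        if line.strip() == "end":
            return name, b"".join(chunks)
        if line.strip() in ("", "`"):
            # zero-length terminator line just before "end"; skip it because
            # a2b_uu("") would decode to 32 NUL bytes
            continue
        chunks.append(binascii.a2b_uu(line))
    raise ValueError("no complete 'begin ... end' uuencoded block found")


def last_strings(data: bytes, count: int = 10, min_len: int = 4) -> list[str]:
    """Rough stand-in for `strings core | tail`: the last printable ASCII runs."""
    runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    return [r.decode("ascii") for r in runs[-count:]]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 extract_core.py serial0.txt
    with open(sys.argv[1], encoding="latin-1") as fh:
        core_name, core = extract_uu_core(fh.read())
    with open(core_name, "wb") as out:
        out.write(core)
    print(f"wrote {len(core)} bytes to {core_name}")
    print("\n".join(last_strings(core)))
```

Reading the file as latin-1 avoids decode errors from stray non-UTF-8 bytes elsewhere in the serial log; the uuencoded lines themselves are plain ASCII.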
Since there is currently a known storage issue on these workers, we cannot draw any conclusions until that ticket is solved. After that, we could simply re-trigger the jobs multiple times.
https://openqa.suse.de/tests/8189885
Affected by https://bugzilla.suse.com/show_bug.cgi?id=1196190.
https://openqa.suse.de/tests/8189945
https://openqa.suse.de/tests/8189838
The failure happened only once for each of the last two; they are likely also affected by https://bugzilla.suse.com/show_bug.cgi?id=1196190.
Some of these failures point to storage I/O problems, while others (the last 3, on x86_64) give no information about what happened; I had to catch those via VNC. It does not look like the same issue is occurring on x86_64, aarch64 and ppc64le.
Indeed, we really need to improve this module. Not only do we fail to catch that pop-up, we also cannot pause on a failed needle match, and we have no logs.