[kernel][tools][aarch64][ppc64le][s390x] Tests sometimes fail to boot
This seems to be the same problem as action #27570. It has been reproducible in basically every kernel run for at least the last month; just look at the tests that fail in boot_ltp while looking for the grub2 needle.
[Santiago knows about this; I'm opening this issue so that we can track the progress here.]
Update: the same happens on other non-Intel architectures (possibly for a different reason on each architecture).
#1 Updated by szarate over 3 years ago
- Project changed from openQA Tests to openQA Project
- Category set to 132
- Assignee deleted (
There's a PR created by Thomas a while ago that needs some love; also https://github.com/os-autoinst/os-autoinst/pull/754
With a quick look, I saw that you have two problems:
One is the system just not booting (or taking too long?)... Could the purple cursor be needled and, if it is found, the wait timeout doubled (and marked as a workaround)? See https://openqa.suse.de/tests/1449548
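To make the suggestion concrete, here is a rough sketch of the "double the timeout when the purple cursor is visible" idea. It is written in Python for illustration only and is not the openQA test API: `check_screen` is a hypothetical callable taking a needle tag and a timeout in seconds, and the tag names `grub2` and `purple-cursor` as well as the 90 second default are assumptions.

```python
def wait_for_bootloader(check_screen, base_timeout=90):
    """Wait for the GRUB menu, retrying with a doubled timeout if the
    purple cursor screen is still showing.

    check_screen(tag, timeout) is a hypothetical callable that returns
    True when a needle with the given tag matches within `timeout` seconds.
    """
    # Normal case: the bootloader shows up within the usual timeout.
    if check_screen("grub2", base_timeout):
        return True
    # Workaround: the purple cursor means the machine is still booting
    # slowly rather than being dead, so wait again with twice the timeout.
    if check_screen("purple-cursor", 0):
        return check_screen("grub2", base_timeout * 2)
    # Neither the bootloader nor the purple cursor: treat as a real failure.
    return False
```

A real implementation would also record a soft failure at the workaround branch so the slow boots stay visible in the results.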
#4 Updated by coolo over 3 years ago
- Project changed from openQA Project to openQA Tests
- Category deleted (
- Assignee set to metan
The test never boots - and to avoid timeouts, disable the GRUB timeout during install. The ThunderX machines are just too slow to match 15 needles in 6 seconds - and test writers are just incapable of making the grub needles more specific than that :(
#6 Updated by rpalethorpe over 3 years ago
- Status changed from New to In Progress
There is still some issue with script_output which I wrote about in Oliver's ticket, but it looks quite rare and there is probably some simple fix for it.
As for the other boot failures, they look a bit more serious. When it boots successfully, the purple cursor only shows for a few frames in the video, but when it fails it is obviously stuck for much longer. Also, if you look in serial0.txt, there is a message from either the kernel, GRUB or the firmware saying:
Synchronous Exception at 0x00000000785111C4
It looks like it is the firmware saying this and it is possibly caused by Grub or the firmware.
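For anyone who wants to check many jobs for this message, a small sketch like the following could pull the exception addresses out of a serial log. The function name and the regular expression are my own assumptions based on the single line quoted above; other firmware versions may word the message differently.

```python
import re

# Pattern based on the one message quoted in this ticket (assumption).
SYNC_EXCEPTION = re.compile(r"Synchronous Exception at (0x[0-9A-Fa-f]+)")


def find_sync_exceptions(serial_log):
    """Return the addresses (as integers) of all 'Synchronous Exception'
    messages found in the given serial log text."""
    return [int(m.group(1), 16) for m in SYNC_EXCEPTION.finditer(serial_log)]
```

It would be called on the contents of a job's serial0.txt, e.g. `find_sync_exceptions(open("serial0.txt").read())`; an empty list means the message did not appear.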
#11 Updated by rpalethorpe over 3 years ago
Can someone please check that the workers are up to date and the correct firmware is being used? We should be getting more information than what is in the serial file: https://bugzilla.opensuse.org/show_bug.cgi?id=1085061#c11
#16 Updated by rpalethorpe over 3 years ago
- Status changed from Blocked to Workable
Why the name change? We have identified that this is ARM specific. So you should probably create some other tickets.
szarate, maybe I can do it at the same time as testing the QEMU refactor. Or you could just delete whatever you put on the workers after SLE15 is released. Whatever comes first.