action #115190
closed coordination #130072: [epic] Handle openQA adaptions in Yam squad - SLE 15 SP6
[Research 24h] boot process stuck sporadically after install in mru-install-multipath-remote
0%
Description
Observation
The system fails to boot after installation. It was hard to say whether this was an infrastructure issue, so we tried a different environment to see if it would happen again. It did, then passed on the next run.
But then it failed again. In one of the runs, the boot was blocked at "a start job is starting for dev/mapp..." (then it's truncated) and I couldn't ssh into the machine. See attachment.
We need to figure out which update triggers it. Several updates are on my list of prime suspects (in order of suspicion):
libmount and others : https://download.suse.de/ibs/SUSE:/Maintenance:/25242/SUSE_Updates_SLE-SERVER_12-SP3-LTSS-TERADATA_x86_64/x86_64/
systemd: https://download.suse.de/ibs/SUSE:/Maintenance:/24472/SUSE_Updates_SLE-SERVER_12-SP3-LTSS-TERADATA_x86_64/
kernel-default: https://download.suse.de/ibs/SUSE:/Maintenance:/25294/SUSE_Updates_SLE-SERVER_12-SP3-LTSS-TERADATA_x86_64/x86_64/
openiscsi: https://download.suse.de/ibs/SUSE:/Maintenance:/25336/SUSE_Updates_SLE-SERVER_12-SP3-LTSS-TERADATA_x86_64/x86_64/
The other updates are very unlikely to provoke this kind of problem.
Because the failure is sporadic, we need to run the jobs several times with each of those updates excluded separately, then report a bug once the faulty update is identified.
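The exclusion procedure described above can be sketched roughly as follows. This is only a planning aid under stated assumptions, not something taken from the test code: the function names `runs_needed`, `plan_runs` and `likely_culprits` are hypothetical, the script does not talk to openQA, and the failure rate passed to `runs_needed` is a guess that would have to be estimated from the job history.

```python
import math

# Suspect updates as named in this ticket, in order of suspicion.
SUSPECTS = ["libmount", "systemd", "kernel-default", "openiscsi"]


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Consecutive passes needed before a clean streak is unlikely to be luck.

    With a per-run failure probability p, n passes in a row happen by chance
    with probability (1 - p) ** n. Return the smallest n for which that
    probability is at most 1 - confidence.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


def plan_runs(suspects, repeats):
    """Build one batch per suspect: that update excluded, all others installed."""
    return [
        {
            "excluded": suspect,
            "installed": [other for other in suspects if other != suspect],
            "run": index + 1,
        }
        for suspect in suspects
        for index in range(repeats)
    ]


def likely_culprits(results):
    """Return the updates whose exclusion made every recorded run pass.

    `results` maps an excluded update to a list of booleans (True = job passed).
    """
    return [name for name, outcomes in results.items() if outcomes and all(outcomes)]


if __name__ == "__main__":
    # Assumed failure rate of 50%; replace with the rate seen in the job history.
    repeats = runs_needed(0.5)
    for job in plan_runs(SUSPECTS, repeats):
        print(f"run {job['run']}: exclude {job['excluded']}, install {job['installed']}")
```

If only one update ends up in the output of `likely_culprits`, that is the one to report the bug against. If several do, the run count was probably too low for the real failure rate.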
openQA test in scenario sle-12-SP3-Server-DVD-Updates-x86_64-mru-install-multipath-remote@64bit fails in
boot_to_desktop
Test suite description
Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. NETWORK_INIT_PARAM=ifcfg=eth0=dhcp4 to skip the dhcp-question, which is problematic on 12sp1
Reproducible
Fails since (at least) Build 20220809-1
Expected result
Last good: 20220808-1 (or more recent)
Further details
Always latest result in this scenario: latest
Files
Updated by JRivrain over 2 years ago
- Project changed from openQA Tests (public) to qe-yam
- Category deleted (Bugs in existing tests)
- Priority changed from Normal to Urgent
Updated by JERiveraMoya over 2 years ago
- Tags set to YaST
- Subject changed from [QAM-Yast] boot process stuck sporadically after install in mru-install-multipath-remote to boot process stuck sporadically after install in mru-install-multipath-remote
- Priority changed from Urgent to Low
- Target version set to Current
All fine, thanks for the investigation. I think this test is not suitable for running in maintenance updates. The more this sporadic issue bothers us, the more reason there is to move it to the development group and investigate why the test suite was created, which feature required it, etc.
Updated by amanzini over 2 years ago
Seems to impact only 12SP3. Moved the test to the MU development group in order to investigate.
Updated by coolgw over 2 years ago
For 12sp4 a new failure seems to happen:
https://openqa.suse.de/tests/9353114#step/boot_to_desktop/11
Updated by JERiveraMoya over 2 years ago
- Target version changed from Current to future
Updated by JERiveraMoya over 1 year ago
- Tags deleted (YaST)
- Subject changed from boot process stuck sporadically after install in mru-install-multipath-remote to [Research 24h] boot process stuck sporadically after install in mru-install-multipath-remote
- Status changed from New to Workable
- Priority changed from Low to Normal
We could revisit this and investigate it, but this scenario is not suitable for running daily in maintenance, so it was deactivated a long time ago. Let's refresh what we know about the failure.
https://openqa.suse.de/tests/overview?arch=&flavor=&machine=&test=mru-install-multipath-remote_dev&modules=&module_re=&distri=sle&version=&build=20230720-1&groupid=446#
Updated by JERiveraMoya over 1 year ago
- Target version set to Current
- Parent task set to #130072
Updated by syrianidou_sofia about 1 year ago
- Status changed from Workable to In Progress
- Assignee set to syrianidou_sofia
Updated by syrianidou_sofia about 1 year ago · Edited
- Status changed from In Progress to Resolved
Both SLES versions mentioned, where the issue was found, are currently out of support. I checked the tests for SLES 12 SP5, but that particular test is in the development YaST maintenance updates job group and does not show any relevant issue. The last builds failed due to an SCC timeout. Previous builds were passing green: latest build