action #178972
closed[s390x][s390zl13][tools] nfs mount to openqa.suse.de is missing size:S
Description
Observation
s390zl13:~ # cat /etc/fstab
/dev/disk/by-path/ccw-0.0.6000-part2 / btrfs defaults 0 0
/dev/disk/by-path/ccw-0.0.6000-part2 /var btrfs subvol=/@/var 0 0
/dev/disk/by-path/ccw-0.0.6000-part2 /usr/local btrfs subvol=/@/usr/local 0 0
/dev/disk/by-path/ccw-0.0.6000-part2 /tmp btrfs subvol=/@/tmp 0 0
/dev/disk/by-path/ccw-0.0.6000-part2 /srv btrfs subvol=/@/srv 0 0
/dev/disk/by-path/ccw-0.0.6000-part2 /root btrfs subvol=/@/root 0 0
/dev/disk/by-path/ccw-0.0.6000-part2 /opt btrfs subvol=/@/opt 0 0
/dev/disk/by-path/ccw-0.0.6000-part2 /home btrfs subvol=/@/home 0 0
/dev/disk/by-path/ccw-0.0.6000-part2 /boot/grub2/s390x-emu btrfs subvol=/@/boot/grub2/s390x-emu 0 0
/dev/disk/by-path/ccw-0.0.6000-part2 /.snapshots btrfs subvol=/@/.snapshots 0 0
/dev/disk/by-path/ccw-0.0.6000-part1 /boot/zipl ext2 defaults 0 2
/dev/disk/by-path/ccw-0.0.6000-part3 swap swap sw,pri=2 0 0
/dev/disk/by-path/ccw-0.0.6002-part1 swap swap sw,pri=2 0 0
/dev/disk/by-path/ccw-0.0.6003-part1 swap swap sw,pri=2 0 0
/dev/disk/by-path/ccw-0.0.6004-part1 swap swap sw,pri=2 0 0
/dev/disk/by-path/ccw-0.0.6005-part1 swap swap sw,pri=2 0 0
/dev/disk/by-path/ccw-0.0.6006-part1 swap swap sw,pri=2 0 0
/dev/disk/by-path/ccw-0.0.6007-part1 swap swap sw,pri=2 0 0
/dev/disk/by-path/ccw-0.0.6008-part1 swap swap sw,pri=2 0 0
/dev/disk/by-path/ccw-0.0.6009-part1 swap swap sw,pri=2 0 0
/dev/disk/by-path/ccw-0.0.600a-part1 swap swap sw,pri=2 0 0
/dev/disk/by-path/ccw-0.0.600b-part1 swap swap sw,pri=2 0 0
#ZFCP Disk for libvirt images
/dev/mapper/3600507638081855cd80000000000004c-part1 /var/lib/libvirt/images ext4 rw,nobarrier,data=writeback 0 0
#openQA NFS share
openqa.oqa.prg2.suse.org:/var/lib/openqa/share/factory /var/lib/openqa/share/factory nfs ro,nolock 0 0
s390zl13:~ # df |grep /var/lib/openqa/share/factory
s390zl13:~ #
openQA test in scenario sle-15-SP6-Server-DVD-Updates-s390x-mau-bind@s390x-kvm fails in bootloader_zkvm
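The mismatch shown above (an NFS entry in /etc/fstab with no matching line in df) can also be checked mechanically. The following is a minimal sketch, assuming the usual whitespace-separated fstab and /proc/mounts layouts; the function names are hypothetical and not part of any openQA tooling:

```python
def nfs_entries(fstab_text):
    """Return (source, mountpoint) pairs for NFS entries in fstab-formatted text."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "#openQA NFS share"
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            entries.append((fields[0], fields[1]))
    return entries


def missing_nfs_mounts(fstab_text, mounts_text):
    """Return NFS mount points listed in fstab but absent from the mount table."""
    mounted = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounted.add(fields[1])  # second field is the mount point
    return [mountpoint for _, mountpoint in nfs_entries(fstab_text) if mountpoint not in mounted]
```

Fed with the contents of /etc/fstab and /proc/mounts from s390zl13 in the state shown above, this would be expected to report /var/lib/openqa/share/factory.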
Test suite description
Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml.
Reproducible
Fails since (at least) Build 20250316-1 (current job)
Expected result
Last good: 20250314-1 (or more recent)
Further details
Always latest result in this scenario: latest
Suggestions
- Use existing automatic recovery for nfs
- I think the machine does not have the worker-grain, right? Because the symptom looks pretty much the same as I found yesterday on netboot: https://progress.opensuse.org/issues/179032
- Yes, those machines do not have the worker role.
Updated by rfan1 about 1 month ago
- Subject changed from [s390x][s390zl13] nfs mount to openqa.suse.de is missing to [s390x][s390zl13][tools] nfs mount to openqa.suse.de is missing
Updated by okurz about 1 month ago
- Tags set to infra, reactive work, s390x, osd
- Project changed from openQA Tests (public) to openQA Infrastructure (public)
- Category changed from Bugs in existing tests to Regressions/Crashes
- Priority changed from Normal to High
- Target version set to Ready
Updated by mkittler about 1 month ago
- Status changed from New to Feedback
- Assignee set to mkittler
This should work again after running "sudo systemctl restart var-lib-openqa-share-factory.mount" on s390zl13.oqa.prg2.suse.org. I also checked s390zl12.oqa.prg2.suse.org but it wasn't affected.
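For reference, the unit name in that command follows from the mount path: systemd generates mount units from /etc/fstab and names them after the escaped path. Below is a simplified sketch of that escaping, assuming only the rules documented for unit name path escaping; the function name is hypothetical, and "systemd-escape --path --suffix=mount" remains the authoritative tool:

```python
def mount_unit_name(path):
    """Approximate the systemd mount unit name for an absolute mount path."""
    # Leading, trailing and duplicate slashes are dropped before escaping.
    parts = [part for part in path.split("/") if part]
    if not parts:
        return "-.mount"  # the root directory is encoded as a single dash
    trimmed = "/".join(parts)
    escaped = []
    for index, byte in enumerate(trimmed.encode("utf-8")):
        char = chr(byte)
        if char == "/":
            escaped.append("-")
        elif (char.isascii() and char.isalnum()) or char in ":_" or (char == "." and index > 0):
            escaped.append(char)
        else:
            # Everything else, including a literal "-", becomes a \xXX escape.
            escaped.append(f"\\x{byte:02x}")
    return "".join(escaped) + ".mount"
```

For /var/lib/openqa/share/factory this yields var-lib-openqa-share-factory.mount, the unit restarted above. A path containing a literal hyphen would get a \x2d escape instead, which is easy to miss when typing unit names by hand.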
Updated by mkittler about 1 month ago
Those were the logs (including the restart I did on Mar 17 11:17:31):
martchus@s390zl13:~> sudo journalctl -fu var-lib-openqa-share-factory.mount
Mar 16 03:34:25 s390zl13 systemd[1]: Mounting /var/lib/openqa/share/factory...
Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Mounting timed out. Terminating.
Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Mount process exited, code=killed, status=15/TERM
Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Failed with result 'timeout'.
Mar 16 03:35:39 s390zl13 systemd[1]: Failed to mount /var/lib/openqa/share/factory.
Mar 17 11:17:31 s390zl13 systemd[1]: Mounting /var/lib/openqa/share/factory...
Mar 17 11:17:31 s390zl13 systemd[1]: Mounted /var/lib/openqa/share/factory.
I guess we can easily improve this by using our automatic recovery on those hosts as well: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1406
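To illustrate how the failed state between Mar 16 and Mar 17 could be detected from the journal, here is a minimal sketch. It assumes only the systemd message formats visible in the excerpt above, uses hypothetical names, and is not a description of what the linked merge request implements:

```python
import re

# Patterns for the three systemd messages that appear in the excerpt above.
STARTED = re.compile(r"systemd\[\d+\]: Mounting (?P<path>\S+?)\.\.\.$")
MOUNTED = re.compile(r"systemd\[\d+\]: Mounted (?P<path>\S+?)\.$")
FAILED = re.compile(r"systemd\[\d+\]: Failed to mount (?P<path>\S+?)\.$")


def last_mount_state(journal_lines):
    """Map each mount path to the last state seen: 'mounting', 'mounted' or 'failed'."""
    state = {}
    for line in journal_lines:
        line = line.rstrip()
        for pattern, name in ((STARTED, "mounting"), (MOUNTED, "mounted"), (FAILED, "failed")):
            match = pattern.search(line)
            if match:
                state[match.group("path")] = name
    return state
```

A mount point whose last state is "failed" is one that nobody retried, which is the situation s390zl13 was in until the manual restart.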
Updated by nicksinger about 1 month ago
- Related to action #179032: machine netboot.qe.prg2.suse.org can randomly fail "srv-tftpboot-mnt-openqa.mount"-unit added
Updated by nicksinger about 1 month ago
mkittler wrote in #note-4:
Those were the logs (including the restart I did on Mar 17 11:17:31):
martchus@s390zl13:~> sudo journalctl -fu var-lib-openqa-share-factory.mount
Mar 16 03:34:25 s390zl13 systemd[1]: Mounting /var/lib/openqa/share/factory...
Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Mounting timed out. Terminating.
Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Mount process exited, code=killed, status=15/TERM
Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Failed with result 'timeout'.
Mar 16 03:35:39 s390zl13 systemd[1]: Failed to mount /var/lib/openqa/share/factory.
Mar 17 11:17:31 s390zl13 systemd[1]: Mounting /var/lib/openqa/share/factory...
Mar 17 11:17:31 s390zl13 systemd[1]: Mounted /var/lib/openqa/share/factory.
I guess we can easily improve this by using our automatic recovery on those hosts as well: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1406
I think the machine does not have the worker-grain, right? Because the symptom looks pretty much the same as I found yesterday on netboot: https://progress.opensuse.org/issues/179032
Updated by okurz about 1 month ago
- Subject changed from [s390x][s390zl13][tools] nfs mount to openqa.suse.de is missing to [s390x][s390zl13][tools] nfs mount to openqa.suse.de is missing size:S
- Description updated (diff)
Updated by okurz about 1 month ago
- Status changed from Feedback to Workable
- Priority changed from High to Urgent
There were multiple alerts yesterday and today. Please consider mitigations such as silencing alerts.
Updated by mkittler about 1 month ago
- Status changed from Workable to Resolved
- Priority changed from Urgent to High
Can you be more specific? There were many alerts but I don't think any of them has anything to do with this ticket. The only currently firing alert is "backup-vm: partitions usage (%) alert", which is definitely unrelated. Considering the problem was fixed and https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1406 was also merged as an additional improvement, I'm actually considering this ticket resolved.
Updated by okurz 30 days ago
- Status changed from Resolved to Workable
The salt CI pipelines never succeeded in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1406 and now in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/4001161 I see
unreal6.qe.nue2.suse.org:
----------
          ID: wakeup_automount
    Function: cmd.run
        Name: ls -d /var/lib/openqa/share/.
      Result: True
     Comment: Command "ls -d /var/lib/openqa/share/." run
     Started: 20:34:42.032213
    Duration: 319.81 ms
     Changes:
              ----------
              pid:
                  27389
              retcode:
                  0
              stderr:
              stdout:
                  /var/lib/openqa/share/.
----------
          ID: /var/lib/openqa/share
    Function: mount.mounted
      Result: False
     Comment: mount.nfs: Connection refused for openqa.suse.de:/var/lib/openqa/share on /var/lib/openqa/share
     Started: 20:34:42.357876
    Duration: 1806440.625 ms
     Changes:
Other hosts are probably affected as well.
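Note that the failing mount.mounted state ran for 1806440.625 ms, i.e. roughly 30 minutes, before giving up. To spot such states in a long pipeline log, something like the following sketch could help. It assumes the highstate text layout shown above and uses a hypothetical function name; it is not part of the salt repositories:

```python
def failed_states(salt_output):
    """Return (id, comment, duration_seconds) for every state with Result: False."""
    states = []
    current = None
    for raw in salt_output.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # lines without a colon carry no key/value pair
        value = value.strip()
        if key == "ID":
            current = {"ID": value}  # every state block starts with its ID
            states.append(current)
        elif current is not None and key in ("Result", "Comment", "Duration"):
            current[key] = value
    failed = []
    for state in states:
        if state.get("Result") == "False":
            millis = float(state.get("Duration", "0 ms").split()[0])
            failed.append((state["ID"], state.get("Comment", ""), millis / 1000.0))
    return failed
```

Applied to the output above, this returns a single entry for /var/lib/openqa/share with a duration of about 1806 seconds.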
Updated by nicksinger 30 days ago
- Assignee changed from mkittler to nicksinger
Taking over to see what needs to be changed here.
Updated by nicksinger 30 days ago
- Status changed from In Progress to Feedback
The selected libvirt-role covered too many hosts:
openqa:~ # salt -C 'G@roles:libvirt' test.ping
s390zl12.oqa.prg2.suse.org:
    True
s390zl13.oqa.prg2.suse.org:
    True
ada.qe.prg2.suse.org:
    True
unreal6.qe.nue2.suse.org:
    True
worker39.oqa.prg2.suse.org:
    True
external_openqa_hypervisor is what we actually want:
openqa:~ # salt -C 'G@roles:external_openqa_hypervisor' test.ping
s390zl12.oqa.prg2.suse.org:
    True
s390zl13.oqa.prg2.suse.org:
    True
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1408 to fix that
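The difference between the two targets can be modelled as plain set membership. This is only an illustrative sketch: the role sets are the ones implied by the two test.ping outputs above, the function names are hypothetical, and Salt's real grain matching is not reproduced here:

```python
# Roles per minion as implied by the two test.ping outputs above. Any other
# roles these hosts may carry are unknown and deliberately left out.
ROLES = {
    "s390zl12.oqa.prg2.suse.org": {"libvirt", "external_openqa_hypervisor"},
    "s390zl13.oqa.prg2.suse.org": {"libvirt", "external_openqa_hypervisor"},
    "ada.qe.prg2.suse.org": {"libvirt"},
    "unreal6.qe.nue2.suse.org": {"libvirt"},
    "worker39.oqa.prg2.suse.org": {"libvirt"},
}


def minions_with_role(role, roles=ROLES):
    """Return the sorted minion IDs whose role set contains the given role."""
    return sorted(minion for minion, assigned in roles.items() if role in assigned)


def unintended_targets(broad_role, narrow_role, roles=ROLES):
    """Return minions matched by broad_role but not by narrow_role."""
    broad = set(minions_with_role(broad_role, roles))
    narrow = set(minions_with_role(narrow_role, roles))
    return sorted(broad - narrow)
```

Here unintended_targets("libvirt", "external_openqa_hypervisor") lists ada, unreal6 and worker39, the three hosts that received the mount state by accident.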
Updated by nicksinger 30 days ago
- Status changed from Feedback to Workable
- Assignee changed from nicksinger to mkittler
MR merged and effective despite the pipeline still failing (for something else). I manually adjusted /etc/fstab on zl13 and removed the old mount from there.