action #93964
closed
salt-states CI pipeline deploy step fails on some workers with "Unable to unmount /var/lib/openqa/share: umount.nfs: /var/lib/openqa/share: device is busy."
0%
Description
Observation
E.g. see
openqaworker-arm-3.suse.de:
----------
ID: /var/lib/openqa/share
Function: mount.mounted
Result: False
Comment: Unable to unmount /var/lib/openqa/share: umount.nfs: /var/lib/openqa/share: device is busy.
Started: 17:15:35.643876
Duration: 271.875 ms
Changes:
----------
umount:
Forced unmount and mount because options (noauto) changed
----------
Maybe we need to manually ensure that all workers have the updated options and are rebooted? I am not even sure what makes the share /var/lib/openqa/share busy, as normally we should only use caching over rsync and HTTP from osd, not NFS.
Expected result
- AC1: Stable deployments also after reboot of multiple worker machines
Updated by livdywan over 3 years ago
A suggestion with a pipeline link or salt call might have been nice.
Could've suggested retrying at least 3 times.
We're estimating it to be S (planning poker).
Updated by okurz over 3 years ago
- Status changed from Workable to New
Moving all tickets without size confirmation by the team back to "New". The team should move the tickets back after estimating and agreeing on a consistent size.
Updated by okurz over 3 years ago
- Related to action #94949: Failed systemd services alert for openqaworker3 var-lib-openqa-share.automount added
Updated by okurz over 3 years ago
- Status changed from New to In Progress
- Assignee set to okurz
Just observed this again in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/492075#L1872
Updated by okurz over 3 years ago
- Due date set to 2021-07-27
- Status changed from In Progress to Feedback
Seems like salt detects changes on every run because it compares the content of /etc/fstab with the output of the command "mount". This is explained in https://github.com/saltstack/salt/issues/18630#issuecomment-342486325 for different mount parameters which trigger the same behavior. The option "x-systemd.mount-timeout=30m" is correctly included in /etc/fstab, but the mount point entry in the output of "mount" does not have it, as these special parameters are only read by systemd.
So I found a better approach now with https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/526
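To illustrate the mismatch described above, here is a minimal Python sketch of an option comparison. It is not Salt's implementation and not necessarily what the merge request changes. The function names and the example option strings are hypothetical; only "noauto" and "x-systemd.mount-timeout=30m" come from this ticket.

```python
# Illustrative sketch only: mimics comparing /etc/fstab options with the
# options reported by "mount". Not Salt's actual code.

# Assumption: options that exist only in /etc/fstab and are never reported
# by "mount" ("noauto" and "x-systemd.*" are the ones seen in this ticket).
FSTAB_ONLY_OPTIONS = {"noauto"}
FSTAB_ONLY_PREFIXES = ("x-systemd.",)


def split_opts(opts: str) -> set:
    """Split a comma-separated option string into a set of options."""
    return {o.strip() for o in opts.split(",") if o.strip()}


def naive_missing(fstab_opts: str, mounted_opts: str) -> set:
    """Options configured in fstab that the live mount does not report."""
    return split_opts(fstab_opts) - split_opts(mounted_opts)


def filtered_missing(fstab_opts: str, mounted_opts: str) -> set:
    """Same comparison, but ignoring options that can never be reported."""
    return {
        o
        for o in naive_missing(fstab_opts, mounted_opts)
        if o not in FSTAB_ONLY_OPTIONS and not o.startswith(FSTAB_ONLY_PREFIXES)
    }


# Hypothetical option strings for the NFS share.
fstab = "ro,noauto,x-systemd.mount-timeout=30m"
mounted = "ro,relatime,vers=3"

# The naive comparison reports a difference on every run, which would
# trigger a forced unmount and mount each time.
print(naive_missing(fstab, mounted))
# The filtered comparison reports no difference, so nothing is remounted.
print(filtered_missing(fstab, mounted))
```

With the naive comparison every run sees "changed" options and attempts the forced unmount, which then fails whenever the share is busy. Ignoring fstab-only options removes that trigger.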
Updated by okurz over 3 years ago
- Status changed from Feedback to Resolved
MR merged. Now checking other pipelines.
Found other failures fixed in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/527
https://gitlab.suse.de/okurz/salt-states-openqa/-/pipelines/169398 passed, https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/493407 passed
Updated by okurz 7 months ago
- Related to action #163097: Share mount not working on openqaworker-arm-1 and other workers size:M added