action #93964

closed

salt-states CI pipeline deploy step fails on some workers with "Unable to unmount /var/lib/openqa/share: umount.nfs: /var/lib/openqa/share: device is busy."

Added by okurz over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Start date:
2021-06-14
Due date:
2021-07-27
% Done:

0%

Estimated time:

Description

Observation

E.g. see

openqaworker-arm-3.suse.de:
----------
          ID: /var/lib/openqa/share
    Function: mount.mounted
      Result: False
     Comment: Unable to unmount /var/lib/openqa/share: umount.nfs: /var/lib/openqa/share: device is busy.
     Started: 17:15:35.643876
    Duration: 271.875 ms
     Changes:   
              ----------
              umount:
                  Forced unmount and mount because options (noauto) changed
----------

Maybe we need to manually ensure that all workers have the updated options and are rebooted? It is not even clear what keeps the share /var/lib/openqa/share busy, as normally we should only use caching over rsync and HTTP from osd, not NFS.
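To narrow down what keeps the share busy, one option is to look for processes that hold a working directory, root directory or open file below the mount point. The following is a minimal sketch of that idea and not something taken from the pipeline or from salt; the function names are hypothetical, it only reads /proc symlinks (so it needs sufficient privileges to see other users' processes), and it misses memory-mapped files and kernel-side users. Tools such as fuser or lsof remain the usual choice on a worker.

```python
import os


def is_under(path, mount_point):
    """Return True if path is the mount point itself or lies below it."""
    base = mount_point.rstrip("/")
    return path == base or path.startswith(base + "/")


def find_busy_processes(mount_point, proc_root="/proc"):
    """Map PID -> list of paths below mount_point that the process holds.

    Hypothetical helper: inspects the cwd, root and fd symlinks of every
    numeric entry in proc_root. Entries that cannot be read are skipped.
    """
    busy = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        proc_dir = os.path.join(proc_root, entry)
        links = [os.path.join(proc_dir, "cwd"), os.path.join(proc_dir, "root")]
        fd_dir = os.path.join(proc_dir, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process vanished or permission denied
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if is_under(target, mount_point):
                busy.setdefault(int(entry), []).append(target)
    return busy


if __name__ == "__main__":
    for pid, paths in sorted(find_busy_processes("/var/lib/openqa/share").items()):
        print(pid, ", ".join(paths))
```

Run as root on an affected worker, this prints one line per process that still references the share, which would show whether something other than the cache service is using NFS.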

Expected result

  • AC1: Stable deployments also after reboot of multiple worker machines

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure (public) - action #94949: Failed systemd services alert for openqaworker3 var-lib-openqa-share.automount (Resolved, okurz, 2021-06-30)
Related to openQA Infrastructure (public) - action #163097: Share mount not working on openqaworker-arm-1 and other workers size:M (Resolved, mkittler)

Actions #1

Updated by okurz over 3 years ago

  • Priority changed from High to Normal
Actions #2

Updated by livdywan over 3 years ago

A pipeline link or the salt call would have been nice.
The ticket could also have suggested retrying at least 3 times.

We're estimating it to be S (planning poker).

Actions #3

Updated by okurz over 3 years ago

  • Status changed from Workable to New

Moving all tickets without size confirmation by the team back to "New". The team should move the tickets back after estimating and agreeing on a consistent size.

Actions #4

Updated by okurz over 3 years ago

  • Related to action #94949: Failed systemd services alert for openqaworker3 var-lib-openqa-share.automount added
Actions #5

Updated by okurz over 3 years ago

  • Status changed from New to In Progress
  • Assignee set to okurz
Actions #6

Updated by okurz over 3 years ago

  • Due date set to 2021-07-27
  • Status changed from In Progress to Feedback

It seems that salt detects changes on every run because it compares the content of /etc/fstab with the output of the command "mount". This is explained in https://github.com/saltstack/salt/issues/18630#issuecomment-342486325 for different mount parameters which trigger the same behavior. The option "x-systemd.mount-timeout=30m" is correctly included in /etc/fstab, but the mount point entry in the output of "mount" does not have it, as these special parameters are only read by systemd.

So I found a better approach now with https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/526
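
The mismatch described above can be illustrated with a small sketch. This is neither salt's actual implementation nor necessarily what the merge request does; the function names and the example option strings (apart from "noauto" and "x-systemd.mount-timeout=30m") are assumptions made for illustration. It shows how a plain comparison reports a change on every run, while a comparison that skips options never visible in the "mount" output does not.

```python
def parse_opts(opts):
    """Split a comma-separated mount option string into a set."""
    return {opt for opt in opts.split(",") if opt}


def is_invisible(opt):
    """Options that are read from /etc/fstab but never appear in `mount` output.

    Illustrative list only: x-systemd.* options are consumed by systemd,
    and noauto only controls whether the entry is mounted automatically.
    """
    return opt.startswith("x-systemd.") or opt == "noauto"


def naive_changed(fstab_opts, active_opts):
    """Options in fstab that are missing from the active mount (plain diff)."""
    return sorted(parse_opts(fstab_opts) - parse_opts(active_opts))


def filtered_changed(fstab_opts, active_opts):
    """Same diff, but ignoring options that can never show up as active."""
    missing = parse_opts(fstab_opts) - parse_opts(active_opts)
    return sorted(opt for opt in missing if not is_invisible(opt))


# Hypothetical option strings for the share on a worker
fstab = "ro,noauto,x-systemd.mount-timeout=30m"
active = "ro,relatime,vers=4.2"

print(naive_changed(fstab, active))     # ['noauto', 'x-systemd.mount-timeout=30m']
print(filtered_changed(fstab, active))  # []
```

With the plain diff, every run sees "changed" options and triggers the forced unmount and mount, which then fails whenever the share is busy.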

Actions #7

Updated by okurz over 3 years ago

  • Status changed from Feedback to Resolved
Actions #8

Updated by okurz 7 months ago

  • Related to action #163097: Share mount not working on openqaworker-arm-1 and other workers size:M added