action #93964

salt-states CI pipeline deploy step fails on some workers with "Unable to unmount /var/lib/openqa/share: umount.nfs: /var/lib/openqa/share: device is busy."

Added by okurz about 1 month ago. Updated 10 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
Start date:
2021-06-14
Due date:
2021-07-27
% Done:

0%

Estimated time:

Description

Observation

E.g. see

openqaworker-arm-3.suse.de:
----------
          ID: /var/lib/openqa/share
    Function: mount.mounted
      Result: False
     Comment: Unable to unmount /var/lib/openqa/share: umount.nfs: /var/lib/openqa/share: device is busy.
     Started: 17:15:35.643876
    Duration: 271.875 ms
     Changes:   
              ----------
              umount:
                  Forced unmount and mount because options (noauto) changed
----------

Maybe we need to manually ensure that all workers have the updated mount options and are rebooted? It is not even clear what makes the share /var/lib/openqa/share busy, as normally we should only use caching over rsync and HTTP from OSD, not NFS.
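To find out what keeps the share busy, one could walk /proc directly instead of relying on lsof or fuser being installed on the worker; a minimal sketch (the helper name busy_pids is hypothetical, the default path is the one from the ticket):

```shell
#!/bin/sh
# List PIDs that hold an open file descriptor under the given directory.
# Pure /proc walk, so it works without lsof or fuser installed.
busy_pids() {
  dir=$1
  for fd in /proc/[0-9]*/fd/*; do
    # Skip fds we are not allowed to inspect (other users' processes).
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
      "$dir"/*|"$dir")
        # /proc/<pid>/fd/<n> -> print <pid>
        echo "${fd#/proc/}" | cut -d/ -f1
        ;;
    esac
  done | sort -un
}

busy_pids "${1:-/var/lib/openqa/share}"
```

Running this as root on an affected worker before the deploy would show which processes block the unmount, e.g. a stray worker or developer shell with its cwd inside the share.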

Expected result

  • AC1: Stable deployments also after reboot of multiple worker machines

Related issues

Related to openQA Infrastructure - action #94949: Failed systemd services alert for openqaworker3 var-lib-openqa-share.automount (Resolved, 2021-06-30)

History

#1 Updated by okurz about 1 month ago

  • Priority changed from High to Normal

#2 Updated by cdywan about 1 month ago

A suggestion with a pipeline link or a salt call would have been nice.
At the least we could have suggested retrying, e.g. 3 times.

We're estimating it to be S (planning poker).
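Such a retry could be configured directly in the pipeline; a hedged sketch for GitLab CI (the job name is an assumption, and GitLab caps `retry` at 2, i.e. three attempts in total):

```yaml
# Hypothetical .gitlab-ci.yml fragment: re-run the deploy job up to
# two extra times (three attempts total) when the script fails.
deploy:
  retry:
    max: 2
    when: script_failure
```

This would paper over transient "device is busy" failures, but not the underlying cause.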

#3 Updated by okurz 19 days ago

  • Status changed from Workable to New

Moving all tickets without a size confirmation by the team back to "New". The team should move the tickets back after estimating and agreeing on a consistent size.

#4 Updated by okurz 18 days ago

  • Related to action #94949: Failed systemd services alert for openqaworker3 var-lib-openqa-share.automount added

#5 Updated by okurz 11 days ago

  • Status changed from New to In Progress
  • Assignee set to okurz

#6 Updated by okurz 11 days ago

  • Due date set to 2021-07-27
  • Status changed from In Progress to Feedback

Seems like salt detects changes every time because it compares the content of /etc/fstab with the output of the command "mount". This is explained in https://github.com/saltstack/salt/issues/18630#issuecomment-342486325 for different mount parameters that trigger the same behaviour. The option "x-systemd.mount-timeout=30m" is correctly present in /etc/fstab, but the mount point entry reported by "mount" does not have it, because these special parameters are only read by systemd.
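The mismatch can be illustrated with an fstab entry (the NFS server and the exact option set here are assumptions for illustration, not copied from the workers):

```
# /etc/fstab (hypothetical entry for illustration)
openqa.suse.de:/var/lib/openqa/share  /var/lib/openqa/share  nfs4  ro,x-systemd.mount-timeout=30m  0  0
```

The x-systemd.* options are consumed by systemd when it generates mount units; the kernel never sees them, so the live mount (and hence salt's fstab-vs-mount comparison) shows the mount point without them, and mount.mounted concludes every run that the options changed.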

So I found a better approach now with https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/526

#7 Updated by okurz 10 days ago

  • Status changed from Feedback to Resolved
