action #94949

Failed systemd services alert for openqaworker3 var-lib-openqa-share.automount

Added by cdywan 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2021-06-30
Due date:
% Done: 0%
Estimated time:

Description

2021-06-30 14:48:00 openqaworker3   var-lib-openqa-share.automount  1

Logging into the machine shows this:

> systemctl --failed
  UNIT                           LOAD   ACTIVE SUB    DESCRIPTION                   
● var-lib-openqa-share.automount loaded failed failed var-lib-openqa-share.automount
> systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
   Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
   Active: failed (Result: resources) since Tue 2021-06-29 14:12:38 CEST; 24h ago
    Where: /var/lib/openqa/share
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)
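For reference, the unit name comes from the mount point path. systemd drops the leading slash, turns each "/" into "-", and escapes a literal "-" as \x2d. The real tool for this is systemd-escape --path. Below is only a simplified bash sketch that assumes plain ASCII paths. The function name unit_name_for_path is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified sketch of how systemd turns a mount point path into a unit name
# (the real tool is "systemd-escape --path"). Assumption: the path contains only
# plain ASCII letters, digits, '_', '.', '-' and '/'; other characters and a
# leading '.' would need \xNN escaping, which this sketch does not do.
unit_name_for_path() {
  local path=$1 suffix=$2
  local esc='\x2d'
  # collapse repeated slashes, then drop leading and trailing slashes
  path=$(printf '%s' "$path" | tr -s '/')
  path=${path#/}
  path=${path%/}
  # the root directory is represented as "-"
  if [ -z "$path" ]; then
    printf '%s.%s\n' '-' "$suffix"
    return
  fi
  # escape literal '-' first, then turn each '/' into '-'
  path=${path//-/"$esc"}
  path=${path//\//-}
  printf '%s.%s\n' "$path" "$suffix"
}

unit_name_for_path /var/lib/openqa/share automount   # var-lib-openqa-share.automount
unit_name_for_path /var/lib/openqa/share mount       # var-lib-openqa-share.mount
```

This explains why one fstab entry shows up as both var-lib-openqa-share.mount and var-lib-openqa-share.automount. Both units share the same escaped name and differ only in the suffix.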

Might be related to #94919 in some way? Unfortunately I can't deduce what caused it from the status.


Related issues

Related to openQA Infrastructure - action #94919: All arm workers down 2021-06-30 , NUE SRV2 Rack A8 was switched off by EngInfra size:S (Resolved, start date 2021-06-30)

Related to openQA Infrastructure - action #93964: salt-states CI pipeline deploy step fails on some workers with "Unable to unmount /var/lib/openqa/share: umount.nfs: /var/lib/openqa/share: device is busy." (Resolved, start date 2021-06-14, due date 2021-07-27)

History

#1 Updated by okurz 3 months ago

  • Related to action #94919: All arm workers down 2021-06-30 , NUE SRV2 Rack A8 was switched off by EngInfra size:S added

#2 Updated by okurz 3 months ago

  • Status changed from New to Blocked
  • Assignee set to okurz
  • Target version set to Ready

Well, the mount point is for NFS, which depends on the network, so I would say it can certainly be related :)

#3 Updated by mkittler 3 months ago

Ah, now you've assigned yourself to it. I was also looking at the problem. It seems that the normal mount and the automount are interfering with each other:

martchus@openqaworker3:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
   Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
   Active: failed (Result: resources) since Tue 2021-06-29 14:12:38 CEST; 1 day 1h ago
    Where: /var/lib/openqa/share
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Jun 28 04:04:34 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 32652 (/usr/bin/isotov)
Jun 28 04:04:35 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 32581 (/usr/bin/isotov)
Jun 28 04:04:36 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 32581 (/usr/bin/isotov)
Jun 28 12:04:30 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 31589 (/usr/bin/isotov)
Jun 29 05:04:47 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3055 (/usr/bin/isotov)
Jun 29 05:04:47 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3118 (/usr/bin/isotov)
Jun 29 14:12:38 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got invalid poll event 16 on pipe (fd=152)
Jun 29 14:12:38 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Unit entered failed state.
Jun 30 14:59:15 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Path /var/lib/openqa/share is already a mount point, refusing start.
Jun 30 14:59:15 openqaworker3 systemd[1]: Failed to set up automount var-lib-openqa-share.automount.
martchus@openqaworker3:~> sudo systemctl status var-lib-openqa-share.mount
● var-lib-openqa-share.mount - /var/lib/openqa/share
   Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
   Active: active (mounted) since Wed 2021-06-30 15:31:23 CEST; 6min ago
    Where: /var/lib/openqa/share
     What: openqa.suse.de:/var/lib/openqa/share
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)
  Process: 5015 ExecMount=/usr/bin/mount openqa.suse.de:/var/lib/openqa/share /var/lib/openqa/share -t nfs -o retry=30,ro,x-systemd.automount,x-systemd.mount-timeout=30m (code=exited, status=0/SUCCESS)
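The "Got invalid poll event 16 on pipe" line can be read as a bitmask. This assumes the standard Linux values, where the epoll flags used by systemd's event loop share the same low bits as poll(2). Under that assumption, 16 is POLLHUP. It means the other end of the pipe that systemd hands to the kernel's autofs mount hung up instead of delivering a normal request. The hypothetical helper below only decodes the number. It does not explain why the hang-up happened.

```shell
#!/usr/bin/env bash
# Decode a poll/epoll event bitmask such as the "poll event 16" logged above.
# Assumption: standard Linux values; the EPOLL* flags for IN, PRI, OUT, ERR
# and HUP use the same bits as the POLL* flags from poll(2).
decode_poll_events() {
  local mask=$1
  local -a names=()
  (( mask & 0x01 )) && names+=(POLLIN)
  (( mask & 0x02 )) && names+=(POLLPRI)
  (( mask & 0x04 )) && names+=(POLLOUT)
  (( mask & 0x08 )) && names+=(POLLERR)
  (( mask & 0x10 )) && names+=(POLLHUP)
  (( mask & 0x20 )) && names+=(POLLNVAL)
  printf '%s\n' "${names[*]}"
}

decode_poll_events 16   # POLLHUP
```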

Maybe the problem was introduced by commit baf1e6dd1f5efb7ce9d4064d9ef841a18fa56064.

Is it really blocked by #94919? This is on openqaworker3 (not an ARM worker), and it doesn't seem to be purely a networking issue, considering that the normal mount unit works and the NFS share can be accessed.

#4 Updated by okurz 3 months ago

  • Status changed from Blocked to New
  • Assignee changed from okurz to mkittler

mkittler, in that case you can take over, of course. I would merely have waited for the network problems to be resolved before checking again.

#5 Updated by mkittler 3 months ago

The problem is not reproducible on other workers, e.g.:

martchus@openqaworker2:~>  sudo systemctl status var-lib-openqa-share.mount
● var-lib-openqa-share.mount - /var/lib/openqa/share
   Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
   Active: active (mounted) since Thu 2021-07-01 09:43:55 CEST; 58min ago
    Where: /var/lib/openqa/share
     What: openqa.suse.de:/var/lib/openqa/share
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)
  Process: 23969 ExecMount=/usr/bin/mount openqa.suse.de:/var/lib/openqa/share /var/lib/openqa/share -t nfs -o retry=30,ro,x-systemd.automount,x-systemd.mount-timeout=30m (code=exited, status=0/SUCCESS)
    Tasks: 0
   CGroup: /system.slice/var-lib-openqa-share.mount
martchus@openqaworker2:~> 
martchus@openqaworker2:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
   Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
   Active: active (running) since Sun 2021-06-20 03:36:04 CEST; 1 weeks 4 days ago
    Where: /var/lib/openqa/share
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Jun 24 05:51:42 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 29547 (/usr/bin/isotov)
Jun 24 05:51:43 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 29547 (/usr/bin/isotov)
Jun 24 09:51:40 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 20437 (/usr/bin/isotov)
Jun 24 09:51:40 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 20437 (/usr/bin/isotov)
Jun 24 11:51:33 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 19864 (/usr/bin/isotov)
Jun 24 12:51:34 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3388 (/usr/bin/isotov)
Jun 24 16:27:02 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 21540 (/usr/bin/isotov)
Jun 25 03:41:15 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 27631 (/usr/bin/isotov)
Jun 25 16:41:20 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3927 (/usr/bin/isotov)
Jun 29 14:11:00 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 23038 (worker)
martchus@openqaworker2:~> sudo systemctl restart var-lib-openqa-share.automount
martchus@openqaworker2:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
   Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
   Active: active (waiting) since Thu 2021-07-01 12:01:17 CEST; 4s ago
    Where: /var/lib/openqa/share
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Jul 01 12:01:17 openqaworker2 systemd[1]: Unset automount var-lib-openqa-share.automount.
Jul 01 12:01:17 openqaworker2 systemd[1]: Stopping var-lib-openqa-share.automount.
Jul 01 12:01:17 openqaworker2 systemd[1]: Set up automount var-lib-openqa-share.automount.
martchus@openqaworker2:~> sudo systemctl stop var-lib-openqa-share.automount
martchus@openqaworker2:~> sudo systemctl start var-lib-openqa-share.automount
martchus@openqaworker2:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
   Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
   Active: active (waiting) since Thu 2021-07-01 12:01:36 CEST; 2s ago
    Where: /var/lib/openqa/share
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Jul 01 12:01:36 openqaworker2 systemd[1]: Set up automount var-lib-openqa-share.automount.
martchus@openqaworker2:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
   Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
   Active: active (waiting) since Thu 2021-07-01 12:01:36 CEST; 30s ago
    Where: /var/lib/openqa/share
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Jul 01 12:01:36 openqaworker2 systemd[1]: Set up automount var-lib-openqa-share.automount.
martchus@openqaworker2:~> ls -l /var/lib/openqa/share/factory
total 3608
drwxr-xr-x    3 rbrown root   393216 Jul  1 11:54 hdd
drwxr-xr-x    3 rbrown root   114688 Jul  1 02:19 iso
drwxr-xr-x    2 rbrown root 44171264 Jul  1 12:00 other
drwxr-xr-x 2666 rbrown root   380928 Jul  1 12:00 repo
drwxrwxrwt    4 root   root     4096 Jul  1 12:02 tmp
martchus@openqaworker2:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
   Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
   Active: active (running) since Thu 2021-07-01 12:01:36 CEST; 58s ago
    Where: /var/lib/openqa/share
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Jul 01 12:01:36 openqaworker2 systemd[1]: Set up automount var-lib-openqa-share.automount.
Jul 01 12:02:29 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 16351 (bash)
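Journal lines like the ones above can be summarized per command, which shows at a glance which processes triggered the automount. This is a minimal sketch that assumes the message format shown here, which may differ between systemd versions. count_automount_triggers is a hypothetical name.

```shell
#!/usr/bin/env bash
# Summarize which commands triggered an automount, from journal text on stdin.
# Assumption: messages look like the ones above, i.e.
#   "...: Got automount request for <path>, triggered by <pid> (<command>)"
# Example (not run here):
#   journalctl -u var-lib-openqa-share.automount --no-pager | count_automount_triggers
count_automount_triggers() {
  # keep only the command name in parentheses at the end of each request line
  sed -n 's/.*Got automount request for .*, triggered by [0-9]* (\(.*\))$/\1/p' \
    | sort | uniq -c | sort -rn
}
```

On openqaworker2, for example, this would show that almost all requests came from /usr/bin/isotov, with a single one from worker.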

But on openqaworker3 we get:

martchus@openqaworker3:~> sudo systemctl stop var-lib-openqa-share.automount
martchus@openqaworker3:~> sudo systemctl start var-lib-openqa-share.automount
Job for var-lib-openqa-share.automount failed.
See "systemctl  status var-lib-openqa-share.automount" and "journalctl  -xe" for details.
martchus@openqaworker3:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
   Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
   Active: inactive (dead) since Tue 2021-06-29 14:12:38 CEST; 1 day 21h ago
    Where: /var/lib/openqa/share
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Jun 29 05:04:47 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3055 (/usr/bin/isotov)
Jun 29 05:04:47 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3118 (/usr/bin/isotov)
Jun 29 14:12:38 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got invalid poll event 16 on pipe (fd=152)
Jun 29 14:12:38 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Unit entered failed state.
Jun 30 14:59:15 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Path /var/lib/openqa/share is already a mount point, refusing start.
Jun 30 14:59:15 openqaworker3 systemd[1]: Failed to set up automount var-lib-openqa-share.automount.
Jul 01 11:11:13 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Path /var/lib/openqa/share is already a mount point, refusing start.
Jul 01 11:11:13 openqaworker3 systemd[1]: Failed to set up automount var-lib-openqa-share.automount.
Jul 01 12:04:26 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Path /var/lib/openqa/share is already a mount point, refusing start.
Jul 01 12:04:26 openqaworker3 systemd[1]: Failed to set up automount var-lib-openqa-share.automount.
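"Path /var/lib/openqa/share is already a mount point, refusing start" means the path was already mounted when the automount unit was started. Here it was presumably mounted by var-lib-openqa-share.mount. In practice this can be checked with util-linux's mountpoint -q or findmnt. The sketch below does the same by reading field 5 of /proc/self/mountinfo. It assumes the mount point contains no octal-escaped characters such as \040. is_mount_point is a hypothetical name.

```shell
#!/usr/bin/env bash
# Check whether a path is currently a mount point by reading mountinfo.
# Field 5 of /proc/self/mountinfo is the mount point (see proc(5)).
# Assumption: the path contains no characters the kernel escapes as octal
# (e.g. \040 for a space); those are not decoded here.
is_mount_point() {
  local path=$1 mountinfo=${2:-/proc/self/mountinfo}
  awk -v p="$path" '$5 == p { found = 1 } END { exit !found }' "$mountinfo"
}

if is_mount_point /var/lib/openqa/share; then
  echo "/var/lib/openqa/share is already mounted; the automount unit would refuse to start"
fi
```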

At least it doesn't seem to enter the failed state again (after I reset it via sudo systemctl reset-failed).

The /etc/fstab entry and the generated systemd units are identical on both hosts.
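The fstab entry carries x-systemd.automount, so systemd-fstab-generator creates two units for the same path: var-lib-openqa-share.mount and var-lib-openqa-share.automount. That is why both show up in the status output above. The following is a simplified sketch of that mapping. It assumes a mount point that needs no escaping. The fstab line is reconstructed from the ExecMount= line above, and its trailing "0 0" fields are an assumption. units_for_fstab_line is a hypothetical name.

```shell
#!/usr/bin/env bash
# Simplified sketch: which units systemd-fstab-generator creates for one fstab line.
# Assumptions: a single well-formed file system line and a mount point that needs
# no escaping beyond turning '/' into '-' (see systemd-escape --path for the rest).
units_for_fstab_line() {
  local where opts
  # fields: what, where, type, options, rest (dump/pass) - only where and options are used
  read -r _ where _ opts _ <<<"$1"
  local name=${where#/}
  name=${name//\//-}
  # a file system entry gets a .mount unit
  echo "$name.mount"
  # x-systemd.automount additionally yields an .automount unit for the same path
  case ",$opts," in
    *,x-systemd.automount,*) echo "$name.automount" ;;
  esac
}

# fstab line reconstructed from the ExecMount= line above; "0 0" is assumed
units_for_fstab_line 'openqa.suse.de:/var/lib/openqa/share /var/lib/openqa/share nfs retry=30,ro,x-systemd.automount,x-systemd.mount-timeout=30m 0 0'
```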

#6 Updated by okurz 3 months ago

  • Related to action #93964: salt-states CI pipeline deploy step fails on some workers with "Unable to unmount /var/lib/openqa/share: umount.nfs: /var/lib/openqa/share: device is busy." added

#7 Updated by okurz 3 months ago

  • Status changed from New to Resolved
  • Assignee changed from mkittler to okurz

#94919 is resolved. I just checked https://monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1 and it's all green. Since the network problems have been fixed and mounting works again, I accept the hypothesis that they were the explanation for the problems we observed.
