action #94949
closed
Failed systemd services alert for openqaworker3 var-lib-openqa-share.automount
Description
2021-06-30 14:48:00 openqaworker3 var-lib-openqa-share.automount 1
Logging into the machine shows this:
> systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● var-lib-openqa-share.automount loaded failed failed var-lib-openqa-share.automount
> systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
Active: failed (Result: resources) since Tue 2021-06-29 14:12:38 CEST; 24h ago
Where: /var/lib/openqa/share
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
Might be related to #94919 in some way? Unfortunately I can't deduce what caused it from the status.
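For reference, a minimal sketch of how the unit names could be pulled out of `systemctl --failed` output like the one above. The function name `failed_units` is hypothetical, and the list of unit suffixes is an assumption that covers the common unit types rather than an exhaustive list.

```shell
#!/bin/sh
# Print the names of failed units from `systemctl --failed` style output on stdin.
# Assumption: the unit name is the first field ending in a known unit suffix;
# a leading status bullet, header line, or footer text is simply skipped.
failed_units() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /\.(service|mount|automount|socket|timer|path|target|swap|slice|scope|device)$/) {
                print $i
                break
            }
        }
    }'
}

# Typical use on a live host (not run here):
#   systemctl --failed --no-legend | failed_units
```

Parsing stdin instead of calling systemctl inside the function keeps the sketch usable on saved output such as the paste in this ticket.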
Updated by okurz over 3 years ago
- Related to action #94919: All arm workers down 2021-06-30 , NUE SRV2 Rack A8 was switched off by EngInfra size:S added
Updated by okurz over 3 years ago
- Status changed from New to Blocked
- Assignee set to okurz
- Target version set to Ready
Well, the mount point is on NFS, which depends on the network, so I would say it can certainly be related :)
Updated by mkittler over 3 years ago
Ah, now you've assigned yourself to it. I was also looking at the problem. It seems that the normal mount and the automount are interfering with each other:
martchus@openqaworker3:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
Active: failed (Result: resources) since Tue 2021-06-29 14:12:38 CEST; 1 day 1h ago
Where: /var/lib/openqa/share
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
Jun 28 04:04:34 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 32652 (/usr/bin/isotov)
Jun 28 04:04:35 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 32581 (/usr/bin/isotov)
Jun 28 04:04:36 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 32581 (/usr/bin/isotov)
Jun 28 12:04:30 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 31589 (/usr/bin/isotov)
Jun 29 05:04:47 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3055 (/usr/bin/isotov)
Jun 29 05:04:47 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3118 (/usr/bin/isotov)
Jun 29 14:12:38 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got invalid poll event 16 on pipe (fd=152)
Jun 29 14:12:38 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Unit entered failed state.
Jun 30 14:59:15 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Path /var/lib/openqa/share is already a mount point, refusing start.
Jun 30 14:59:15 openqaworker3 systemd[1]: Failed to set up automount var-lib-openqa-share.automount.
martchus@openqaworker3:~> sudo systemctl status var-lib-openqa-share.mount
● var-lib-openqa-share.mount - /var/lib/openqa/share
Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
Active: active (mounted) since Wed 2021-06-30 15:31:23 CEST; 6min ago
Where: /var/lib/openqa/share
What: openqa.suse.de:/var/lib/openqa/share
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
Process: 5015 ExecMount=/usr/bin/mount openqa.suse.de:/var/lib/openqa/share /var/lib/openqa/share -t nfs -o retry=30,ro,x-systemd.automount,x-systemd.mount-timeout=30m (code=exited, status=0/SUCCESS)
Maybe the problem was introduced by commit baf1e6dd1f5efb7ce9d4064d9ef841a18fa56064.
Is it really blocked by #94919? This is on openqaworker3 (and not ARM workers) and it doesn't seem to be only a networking issue considering that the normal mount unit works and the NFS mount can be accessed.
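A side note on the journal line "Got invalid poll event 16 on pipe": assuming the number is a standard Linux poll(2)/epoll event bitmask, it can be decoded with a small helper. The function name `decode_poll_events` is hypothetical; the bit values are the Linux `<poll.h>` constants.

```shell
#!/bin/sh
# Decode a Linux poll(2) event bitmask into flag names.
# Assumption: the value logged by systemd uses the standard Linux bit values.
decode_poll_events() {
    mask=$1
    names=""
    for pair in 1:POLLIN 2:POLLPRI 4:POLLOUT 8:POLLERR 16:POLLHUP 32:POLLNVAL; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( mask & bit )) -ne 0 ]; then
            # Join multiple flag names with "|".
            names="${names:+$names|}$name"
        fi
    done
    echo "${names:-none}"
}

decode_poll_events 16   # prints POLLHUP
```

Under that assumption, event 16 is a hang-up on the pipe, i.e. the other end was closed. One plausible but unconfirmed reading is that the autofs mount point went away underneath systemd, which would fit the later "already a mount point, refusing start" messages.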
Updated by okurz over 3 years ago
- Status changed from Blocked to New
- Assignee changed from okurz to mkittler
@mkittler in that case you can take over, of course. I would merely have waited for the network problems to be resolved before checking again.
Updated by mkittler over 3 years ago
The problem is not reproducible on other workers, e.g.:
martchus@openqaworker2:~> sudo systemctl status var-lib-openqa-share.mount
● var-lib-openqa-share.mount - /var/lib/openqa/share
Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
Active: active (mounted) since Thu 2021-07-01 09:43:55 CEST; 58min ago
Where: /var/lib/openqa/share
What: openqa.suse.de:/var/lib/openqa/share
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
Process: 23969 ExecMount=/usr/bin/mount openqa.suse.de:/var/lib/openqa/share /var/lib/openqa/share -t nfs -o retry=30,ro,x-systemd.automount,x-systemd.mount-timeout=30m (code=exited, status=0/SUCCESS)
Tasks: 0
CGroup: /system.slice/var-lib-openqa-share.mount
martchus@openqaworker2:~>
martchus@openqaworker2:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
Active: active (running) since Sun 2021-06-20 03:36:04 CEST; 1 weeks 4 days ago
Where: /var/lib/openqa/share
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
Jun 24 05:51:42 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 29547 (/usr/bin/isotov)
Jun 24 05:51:43 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 29547 (/usr/bin/isotov)
Jun 24 09:51:40 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 20437 (/usr/bin/isotov)
Jun 24 09:51:40 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 20437 (/usr/bin/isotov)
Jun 24 11:51:33 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 19864 (/usr/bin/isotov)
Jun 24 12:51:34 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3388 (/usr/bin/isotov)
Jun 24 16:27:02 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 21540 (/usr/bin/isotov)
Jun 25 03:41:15 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 27631 (/usr/bin/isotov)
Jun 25 16:41:20 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3927 (/usr/bin/isotov)
Jun 29 14:11:00 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 23038 (worker)
martchus@openqaworker2:~> sudo systemctl restart var-lib-openqa-share.automount
martchus@openqaworker2:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
Active: active (waiting) since Thu 2021-07-01 12:01:17 CEST; 4s ago
Where: /var/lib/openqa/share
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
Jul 01 12:01:17 openqaworker2 systemd[1]: Unset automount var-lib-openqa-share.automount.
Jul 01 12:01:17 openqaworker2 systemd[1]: Stopping var-lib-openqa-share.automount.
Jul 01 12:01:17 openqaworker2 systemd[1]: Set up automount var-lib-openqa-share.automount.
martchus@openqaworker2:~> sudo systemctl stop var-lib-openqa-share.automount
martchus@openqaworker2:~> sudo systemctl start var-lib-openqa-share.automount
martchus@openqaworker2:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
Active: active (waiting) since Thu 2021-07-01 12:01:36 CEST; 2s ago
Where: /var/lib/openqa/share
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
Jul 01 12:01:36 openqaworker2 systemd[1]: Set up automount var-lib-openqa-share.automount.
martchus@openqaworker2:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
Active: active (waiting) since Thu 2021-07-01 12:01:36 CEST; 30s ago
Where: /var/lib/openqa/share
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
Jul 01 12:01:36 openqaworker2 systemd[1]: Set up automount var-lib-openqa-share.automount.
martchus@openqaworker2:~> ls -l /var/lib/openqa/share/factory
total 3608
drwxr-xr-x 3 rbrown root 393216 1. Jul 11:54 hdd
drwxr-xr-x 3 rbrown root 114688 1. Jul 02:19 iso
drwxr-xr-x 2 rbrown root 44171264 1. Jul 12:00 other
drwxr-xr-x 2666 rbrown root 380928 1. Jul 12:00 repo
drwxrwxrwt 4 root root 4096 1. Jul 12:02 tmp
martchus@openqaworker2:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
Active: active (running) since Thu 2021-07-01 12:01:36 CEST; 58s ago
Where: /var/lib/openqa/share
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
Jul 01 12:01:36 openqaworker2 systemd[1]: Set up automount var-lib-openqa-share.automount.
Jul 01 12:02:29 openqaworker2 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 16351 (bash)
But on openqaworker3 we get:
martchus@openqaworker3:~> sudo systemctl stop var-lib-openqa-share.automount
martchus@openqaworker3:~> sudo systemctl start var-lib-openqa-share.automount
Job for var-lib-openqa-share.automount failed.
See "systemctl status var-lib-openqa-share.automount" and "journalctl -xe" for details.
martchus@openqaworker3:~> sudo systemctl status var-lib-openqa-share.automount
● var-lib-openqa-share.automount
Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
Active: inactive (dead) since Tue 2021-06-29 14:12:38 CEST; 1 day 21h ago
Where: /var/lib/openqa/share
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
Jun 29 05:04:47 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3055 (/usr/bin/isotov)
Jun 29 05:04:47 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got automount request for /var/lib/openqa/share, triggered by 3118 (/usr/bin/isotov)
Jun 29 14:12:38 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Got invalid poll event 16 on pipe (fd=152)
Jun 29 14:12:38 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Unit entered failed state.
Jun 30 14:59:15 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Path /var/lib/openqa/share is already a mount point, refusing start.
Jun 30 14:59:15 openqaworker3 systemd[1]: Failed to set up automount var-lib-openqa-share.automount.
Jul 01 11:11:13 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Path /var/lib/openqa/share is already a mount point, refusing start.
Jul 01 11:11:13 openqaworker3 systemd[1]: Failed to set up automount var-lib-openqa-share.automount.
Jul 01 12:04:26 openqaworker3 systemd[1]: var-lib-openqa-share.automount: Path /var/lib/openqa/share is already a mount point, refusing start.
Jul 01 12:04:26 openqaworker3 systemd[1]: Failed to set up automount var-lib-openqa-share.automount.
At least it no longer seems to enter the failed state (after I reset it via sudo systemctl reset-failed); the unit now shows as inactive (dead) instead.
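The difference between the two hosts can also be seen in the kernel mount table. A rough sketch, assuming that a systemd automount point shows up as an `autofs` entry in `/proc/self/mounts` with the real filesystem mounted on top of it once triggered. The function name `automount_state` and the wording of the states are hypothetical; the mount table of openqaworker3 was not captured in this ticket.

```shell
#!/bin/sh
# Classify how a path is mounted, based on a table in /proc/self/mounts format.
# usage: automount_state MOUNTPOINT [MOUNTS_FILE]
automount_state() {
    awk -v mp="$1" '
        $2 == mp && $3 == "autofs" { autofs = 1 }   # automount trigger present
        $2 == mp && $3 != "autofs" { real = 1 }     # actual filesystem (e.g. nfs) mounted
        END {
            if (autofs && real)  print "automount active, filesystem mounted"
            else if (autofs)     print "automount waiting, filesystem not mounted"
            else if (real)       print "plain mount without automount"
            else                 print "not mounted"
        }
    ' "${2:-/proc/self/mounts}"
}

# Typical use on a live host (not run here):
#   automount_state /var/lib/openqa/share
```

If the assumption holds, openqaworker2 would report one of the first two states (matching "active (running)" and "active (waiting)" above), while openqaworker3 would report "plain mount without automount", which is the situation in which the automount unit refuses to start.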
The /etc/fstab entry and the generated systemd units are identical on both hosts.
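For anyone repeating the comparison, a small sketch that extracts the options of the fstab entry for a mount point and checks for `x-systemd.automount`. The function names `fstab_options` and `has_automount_option` are hypothetical; the option string itself is the one visible in the ExecMount line above.

```shell
#!/bin/sh
# Print the options field of the fstab entry for a mount point.
# usage: fstab_options MOUNTPOINT [FSTAB_FILE]
fstab_options() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print $4 }' "${2:-/etc/fstab}"
}

# Succeed if the entry asks systemd to generate an automount unit.
# usage: has_automount_option MOUNTPOINT [FSTAB_FILE]
has_automount_option() {
    fstab_options "$@" | tr ',' '\n' | grep -qx 'x-systemd.automount'
}

# Typical use on a live host (not run here):
#   fstab_options /var/lib/openqa/share
#   has_automount_option /var/lib/openqa/share && echo "automount requested"
```

Running `fstab_options` on both hosts and diffing the output is one way to confirm that the entries match.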
Updated by okurz over 3 years ago
- Related to action #93964: salt-states CI pipeline deploy step fails on some workers with "Unable to unmount /var/lib/openqa/share: umount.nfs: /var/lib/openqa/share: device is busy." added
Updated by okurz over 3 years ago
- Status changed from New to Resolved
- Assignee changed from mkittler to okurz
#94919 is resolved. I just checked https://monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1 and it's all green. I accept the hypothesis that the network problems explain what we observed, since they have been fixed and mounting works again.
Updated by okurz 5 months ago
- Related to action #163097: Share mount not working on openqaworker-arm-1 and other workers size:M added