action #132383
closed
FC Basement OSD hosts not reachable since 2023-07-06 01:50 CEST
Added by okurz 11 months ago.
Updated 10 months ago.
- Status changed from New to Blocked
- Description updated (diff)
It looks like that caused the NFS mount to become completely unresponsive, so the workers got stuck trying to access it during initialization (and thus stayed offline). Manual filesystem commands also hang, and I could not even unmount the filesystem. It is exactly the same on all 3 sap workers. I'll have a look at the other machines to see whether they are equally badly affected.
The other hosts (piworker and openqaworker1) were not affected. That's strange but also good.
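For checking further hosts without getting a shell stuck on the mount, a probe with a timeout might help. This is only a minimal sketch: the function name `mount_is_responsive`, the default timeout and the use of `stat` as the probe command are my assumptions, not something that was run for this ticket.

```python
import subprocess


def mount_is_responsive(path, timeout=5.0, probe_cmd=("stat", "--")):
    """Return True if probing `path` exits with status 0 within `timeout` seconds.

    `probe_cmd` is a hypothetical default; any command that touches the
    filesystem at `path` would do.
    """
    proc = subprocess.Popen(
        [*probe_cmd, path],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Deliberately no wait() after kill(): a child blocked in
        # uninterruptible sleep on a dead NFS server may never exit,
        # and waiting for it would hang this script as well.
        proc.kill()
        return False
```

Called with the mount point of the NFS share, this returns False instead of blocking when the server does not answer. The probe process itself may linger until the mount recovers, so it should not be run in a tight loop.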
The stuck NFS mount was also the reason why zypper hung on:
rpm --root / --dbpath /usr/lib/sysimage/rpm -U --percent --noglob --force --nodeps -- /var/cache/zypp/packages/devel_openQA/x86_64/openQA-common-4.6.1688565452.efc15ea-lp155.5933.1.x86_64.rpm
As a result the zypper lock was stuck as well, so salt was not able to apply states. It would be nice if this chain of one problem causing the next were at least shorter…
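To spot the start of such a chain earlier, one could list processes in uninterruptible sleep (state "D" in /proc/<pid>/stat), which is where processes blocked on a hard NFS mount typically end up. A rough sketch follows. The ticket does not confirm that the rpm process was in state D, and the helper names `parse_stat` and `find_uninterruptible` are made up for illustration.

```python
from pathlib import Path


def parse_stat(stat_line):
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so take everything between the first '(' and the last ')'.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx])
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state 'D'."""
    stuck = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            pid, comm, state = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited during the scan, or unexpected format
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)
```

A process can be in state D briefly during normal I/O, so a single hit is not conclusive. The same rpm or zypper PID showing up across several scans would be a stronger hint.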
- Status changed from Blocked to In Progress
The SD ticket was resolved, so the network is back. Regarding the NFS mount: we have been trying to phase out its use for years, for good reasons. I think we should try again, e.g. by only providing the mount on a limited set of workers with a special worker class such as "deprecated-nfs".
- Due date set to 2023-07-21
Setting due date based on mean cycle time of SUSE QE Tools
- Due date deleted (2023-07-21)
- Status changed from In Progress to Resolved