action #134846
open
Old NFS share mount is keeping processes stuck and openQA workers seem up but do not work on jobs
Start date:
2023-08-30
Due date:
% Done:
0%
Estimated time:
Tags:
Description
Observation
On 2023-08-30 many openQA jobs were not picked up for a long time on OSD machines. The machines were still connected to the NFS share from the old OSD and eventually got stuck with some processes in "D" state (uninterruptible sleep).
Acceptance criteria
- AC1: Hosts with processes stuck for a long time trigger alerts
Suggestions
- Try to reproduce the problem e.g. by manually making one process stuck in "D"
- Add an alert triggering on the above condition
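As a starting point for the alert condition, here is a minimal sketch of how a host could detect processes that stay in "D" state. It assumes a Linux host with /proc mounted. The function names and the 60-second default interval are hypothetical and not taken from any existing monitoring setup; how the result is fed into an actual alert is left open.

```python
import os
import time


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every process currently in "D" state."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited in the meantime or stat is unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


def persistently_stuck(first, second):
    """Return the (pid, comm) pairs that are in "D" state in both samples."""
    return sorted(set(first) & set(second))


def check(interval_s=60):
    """Sample twice, interval_s apart (hypothetical default), and compare."""
    first = find_d_state()
    time.sleep(interval_s)
    return persistently_stuck(first, find_d_state())
```

Sampling twice matters because a process in "D" state for a moment is normal during I/O; only processes that are still in "D" after the interval are candidates for an alert. A process could in principle leave and re-enter "D" between the two samples, so a longer interval or more samples would reduce false positives.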