action #134846

open

Old NFS share mount is keeping processes stuck and openQA workers seem up but do not work on jobs

Added by okurz over 1 year ago. Updated over 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
QA (public, currently private due to #173521) - future
Start date:
2023-08-30
Due date:
% Done:

0%

Estimated time:

Description

Observation

On 2023-08-30 many openQA jobs on OSD machines were not picked up for a long time because the machines were still connected to the NFS share from the old OSD. Some processes on these machines eventually got stuck in "D" state (uninterruptible sleep).

Acceptance criteria

  • AC1: Hosts with processes stuck for a long time trigger alerts

Suggestions

  • Try to reproduce the problem, e.g. by manually getting one process stuck in "D" state
  • Add an alert triggering on the above condition
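
As a starting point for the alert suggested above, here is a minimal sketch of how a check could detect processes in "D" state on a Linux host by reading /proc/<pid>/stat. It is not an existing openQA or monitoring component. The function names (parse_stat, d_state_pids, persistently_stuck) are hypothetical, and "stuck for long" is approximated here as "seen in D state at every one of several samples", which is an assumption rather than a requirement stated in this ticket.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def d_state_pids(proc_root="/proc"):
    """Return {pid: comm} for processes currently in state 'D'."""
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was malformed
        if state == "D":
            found[pid] = comm
    return found


def persistently_stuck(samples):
    """Return the PIDs that appear in every sample (assumed heuristic)."""
    if not samples:
        return set()
    stuck = set(samples[0])
    for sample in samples[1:]:
        stuck &= set(sample)
    return stuck
```

A check could call d_state_pids() a few times with a pause in between, pass the collected results to persistently_stuck(), and raise an alert if the returned set is non-empty. Short "D" phases are normal during ordinary I/O, so the number of samples and the interval would need tuning. A PID that left and re-entered "D" state between samples would also be reported by this heuristic.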
Actions #2

Updated by okurz over 1 year ago

  • Target version changed from Ready to future