action #179497

closed

[FIRING:1] worker-arm1 (worker-arm1: System load alert openQA worker-arm1 salt system_load_alert_worker-arm1 worker)

Added by nicksinger 8 days ago. Updated 7 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Start date:
Due date:
% Done: 0%
Estimated time:
Description

Observation

The load exceeded our expected limits for 15 minutes and triggered the corresponding alert. It started around 20:30 UTC, peaked at 21:00, and seems to have normalized again around 21:30.
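For illustration only, a minimal Python sketch of an alert condition of the kind described above: it fires when the load stays above a limit for a whole 15-minute window. The function name `load_alert_fires`, the one-sample-per-minute spacing, and the `limit` parameter are assumptions made for this sketch; the actual alert rule and its limit are not shown in this ticket.

```python
from typing import Sequence


def load_alert_fires(samples: Sequence[float], limit: float, window: int = 15) -> bool:
    """Return True if the most recent `window` samples all exceed `limit`.

    `samples` is assumed to hold one load value per minute, oldest first.
    Both the spacing and the limit are hypothetical, not taken from the
    real alert definition.
    """
    if len(samples) < window:
        # Not enough history yet to cover the whole window.
        return False
    return all(value > limit for value in samples[-window:])
```

A single sample at or below the limit inside the window is enough to keep this check from firing, which is why a short spike would not alert while a sustained peak does.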

Acceptance Criteria

  • AC1: No alerts about high load for normal openQA workloads on worker-arm1

Suggestions


Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure (public) - action #164284: [FIRING:1] worker-arm1 (worker-arm1: System load alert openQA worker-arm1 salt system_load_alert_worker-arm1 worker) size:S (Resolved, livdywan)

Actions #1

Updated by nicksinger 8 days ago

  • Copied from action #164284: [FIRING:1] worker-arm1 (worker-arm1: System load alert openQA worker-arm1 salt system_load_alert_worker-arm1 worker) size:S added
Actions #2

Updated by nicksinger 8 days ago

  • Status changed from New to Feedback
  • Assignee set to nicksinger
Actions #3

Updated by nicksinger 7 days ago

The changes are effective: several worker slots are currently paused with a message stating that the configured load threshold of 14 is in place. Load still peaks, but not as high as previously.

I removed the alert silence for that host again.
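As a rough illustration of the threshold behaviour described in this comment, the following Python sketch pauses a slot whenever the measured load is above the configured value of 14. The names `LOAD_THRESHOLD`, `slot_should_pause`, and `current_load` are hypothetical, and the choice of the 1-minute average is an assumption; this is not how the openQA worker necessarily implements the check.

```python
import os

# Threshold quoted in the comment above; the constant name is hypothetical.
LOAD_THRESHOLD = 14.0


def slot_should_pause(load: float, threshold: float = LOAD_THRESHOLD) -> bool:
    """Return True if a worker slot should stop accepting jobs at this load."""
    return load > threshold


def current_load() -> float:
    """Return the 1-minute system load average (Unix only).

    Using the 1-minute value is an assumption of this sketch.
    """
    return os.getloadavg()[0]
```

On the worker host, `slot_should_pause(current_load())` would then indicate whether new jobs should be held back until the load drops again.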
