action #130790

closed

[alert] failed systemd alert openqa-staging-2 velociraptor-client size:M

Added by okurz 11 months ago. Updated 11 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-06-13
Due date: 2023-07-01
% Done: 0%
Estimated time:

Description

Observation

Received a Grafana alert. As the machine openqa-staging-2 is currently not in production, I called sudo systemctl disable --now telegraf, which briefly remedied the situation, but then the alert reappeared. I checked and found telegraf running again even though the service was supposedly masked. I triggered a reboot and will monitor.

Acceptance criteria

  • AC1: No alerts are received for any of our staging instances

Suggestions

  • Research how a systemd service can be masked and yet be started again. Can we find in the logs what started the service?
  • Crosscheck the situation again, e.g. check journalctl for telegraf covering the period when okurz stupidly declared the ticket resolved when apparently it wasn't
  • Crosscheck whether OSD deployment or salt state GitLab CI pipelines still access the system and re-enable the service via salt, even though the machine is not among the currently accepted salt keys on OSD
  • Next time, wait more days to see whether the problem reappears before declaring the ticket resolved
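
As a starting point for the first two suggestions, here is a minimal sketch (not something taken from this ticket) of how one might detect the "masked but running" state automatically. It assumes the unit is called telegraf.service and relies on the LoadState, UnitFileState and ActiveState properties printed by systemctl show; the function names are hypothetical helpers introduced only for illustration.

```python
import subprocess


def parse_show_output(text):
    """Parse the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            props[key.strip()] = value.strip()
    return props


def masked_but_running(props):
    """Return True if the unit is reported as masked yet still active."""
    masked = "masked" in (props.get("LoadState"), props.get("UnitFileState"))
    running = props.get("ActiveState") in ("active", "activating")
    return masked and running


def query_unit(unit="telegraf.service"):
    """Query the live unit state (assumed unit name; needs a systemd host)."""
    result = subprocess.run(
        ["systemctl", "show", unit,
         "--property=LoadState,UnitFileState,ActiveState"],
        capture_output=True, text=True, check=False,
    )
    return parse_show_output(result.stdout)
```

On the affected host, masked_but_running(query_unit()) could be run periodically. If it returns True, the output of journalctl -u telegraf.service around that time would be the next place to look for what started the service.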