action #154018

closed

[alert] Failed systemd services alert: backup-vm postfix

Added by tinita 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2024-01-22
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://stats.openqa-monitor.qa.suse.de/alerting/grafana/Uk02cifVkz/view?orgId=1
Date: Sun, 21 Jan 2024 03:56:36 +0100

1 firing alert instance

GROUPED BY 

  1 firing instances

Firing [stats.openqa-monitor.qa.suse.de]
Failed systemd services alert (except openqa.suse.de)
View alert [stats.openqa-monitor.qa.suse.de]
Values: B0=1
Labels:
  alertname: Failed systemd services alert (except openqa.suse.de)
  grafana_folder: Salt
  rule_uid: Uk02cifVkz
Annotations:
  message: Check failed systemd services on hosts with systemctl --failed. Hint: Go to parent dashboard https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services to see a list of affected hosts.
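
For reference, the check named in the alert message can be narrowed down to just the unit names. This is a minimal sketch assuming a systemd host; the helper name `failed_unit_names` is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper (name is an assumption): reads `systemctl list-units`
# output on stdin and prints only the unit name, i.e. the first column
# of each non-empty line.
failed_unit_names() {
    awk 'NF { print $1 }'
}

# Usage on the affected host (not executed here):
#   systemctl list-units --state=failed --plain --no-legend | failed_unit_names
```

With `--plain --no-legend` the output has no status bullet and no header or footer, so the first column is the unit name.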

Actions #1

Updated by tinita 3 months ago

# journalctl -u postfix.service
Jan 21 03:00:06 backup-vm postfix/pickup[14281]: 51BB95791: uid=0 from=<root>
Jan 21 03:00:06 backup-vm postfix/cleanup[21813]: 51BB95791: message-id=<20240121020006.51BB95791@localhost>
Jan 21 03:00:06 backup-vm postfix/qmgr[1488]: 51BB95791: from=<root@localhost>, size=1028, nrcpt=1 (queue active)
Jan 21 03:00:06 backup-vm postfix/local[21816]: 51BB95791: to=<root@localhost>, orig_to=<root>, relay=local, delay=4.4, delays=4.2/0.15/0/0.04, dsn=2.0.0, status=sent (delivered to mailbox)
Jan 21 03:00:06 backup-vm postfix/qmgr[1488]: 51BB95791: removed
-- Boot b65912db84b2429bb307c29ccc1dd7e6 --
Jan 21 03:36:41 backup-vm systemd[1]: Starting Postfix Mail Transport Agent...
Jan 21 03:38:10 backup-vm systemd[1]: postfix.service: start-pre operation timed out. Terminating.
Jan 21 03:38:48 backup-vm systemd[1]: postfix.service: Failed with result 'timeout'.
Jan 21 03:38:48 backup-vm systemd[1]: Failed to start Postfix Mail Transport Agent.
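
A side note on the timing in the log above: the gap between "Starting Postfix Mail Transport Agent..." and "start-pre operation timed out" can be computed with a quick sketch. The helper name `seconds_between` is hypothetical, and it assumes both timestamps fall on the same day.

```shell
# Hypothetical helper: difference in seconds between two HH:MM:SS
# timestamps, assuming both are on the same day.
seconds_between() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, s, ":"); split(b, e, ":")
        print (e[1] * 3600 + e[2] * 60 + e[3]) - (s[1] * 3600 + s[2] * 60 + s[3])
    }'
}

# Timestamps taken from the journal excerpt above.
seconds_between 03:36:41 03:38:10
```

This prints 89, which is close to the 90 s that upstream systemd uses as its default start timeout. Whether postfix.service on backup-vm overrides that value was not checked here.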
Actions #2

Updated by tinita 3 months ago

  • Description updated (diff)
Actions #3

Updated by okurz 3 months ago

  • Status changed from New to In Progress
  • Assignee set to okurz
Actions #4

Updated by okurz 3 months ago

  • Status changed from In Progress to Resolved

I restarted the service postfix. Now there are no failed services.

Then I looked at the system journal from around 03:30, just after boot, to see what the machine was doing, and found other potentially related problems:

Jan 21 03:38:45 backup-vm auditd[713]: Error receiving audit netlink packet (No buffer space available)
…
Jan 21 03:38:00 backup-vm systemd[1]: Failed to start The Salt Minion.

Looking at https://monitor.qa.suse.de/d/GDbackup-vm/dashboard-for-backup-vm?orgId=1&from=1705800005138&to=1705814908962, I see no problems with memory but potentially with CPU.

Using the virt-manager GUI connected to qamaster, I changed the CPU config to use the host CPU configuration and bumped the vCPU allocation from 1 to 2. I shut down the machine, restarted it, and ensured it is properly up and running again. I assume this should resolve the problem. The alert is gone. The system journal shows no error messages from salt or postfix, nor anything else obvious.
