action #175989
Too big logfiles causing failed systemd services alert: logrotate (monitor, openqaw5-xen, s390zl12) size:S
Status: closed
Added by tinita about 1 month ago. Updated 24 days ago.
Description
Observation
Date: Wed, 22 Jan 2025 12:55:55 +0100
https://monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?from=now-24h&to=now
2025-01-21 23:59:50 monitor logrotate
2025-01-21 23:59:50 openqaw5-xen logrotate
2025-01-21 23:59:50 s390zl12 logrotate
On monitor:
% journalctl -u logrotate --since yesterday
Jan 21 14:31:47 monitor systemd[1]: Starting Rotate log files...
Jan 21 14:31:51 monitor logrotate[3413]: error: destination /var/log/messages-20250121.xz already exists, skipping rotation
Jan 21 14:31:51 monitor logrotate[3413]: error: destination /var/log/zypper.log-20250121.xz already exists, skipping rotation
Jan 21 14:31:51 monitor systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
Jan 21 14:31:51 monitor systemd[1]: logrotate.service: Failed with result 'exit-code'.
Jan 21 14:31:51 monitor systemd[1]: Failed to start Rotate log files.
Jan 22 00:00:01 monitor systemd[1]: Starting Rotate log files...
Jan 22 00:07:49 monitor systemd[1]: logrotate.service: Deactivated successfully.
Jan 22 00:07:49 monitor systemd[1]: Finished Rotate log files.
Jan 22 00:07:49 monitor systemd[1]: logrotate.service: Consumed 7min 9.545s CPU time.
We currently have a lot of output in zypper.log and /var/log/messages, so we should check why.
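A quick way to confirm which logs are oversized and which processes dominate them, as a rough sketch (the awk/sed pipeline assumes the usual "timestamp host tag" syslog line layout):

ls -lhS /var/log/messages* /var/log/zypper.log*
# top logging processes in a rotated archive, pid suffix stripped
xzcat /var/log/messages-20250121.xz | awk '{print $3}' | sed 's/\[.*//' | sort | uniq -c | sort -rn | head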
Acceptance criteria
- AC1: We have an understanding what triggered the problem on multiple hosts on that date
- AC2: The error condition consistently does not trigger an alert
Suggestions
- Look up existing tickets about logrotate. We already had "destination … already exists" multiple times
- Investigate why the log messages became so big and fix the root cause
- Consider how the underlying error condition could trigger an alert more directly (instead of indirectly relying on logrotate to "fail")
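One possible shape for such a direct check, as a sketch only (the 15M threshold and the file list are illustrative, not an agreed policy): a small script run from cron or a monitoring exec plugin that warns when a log outgrows its rotation threshold, independently of whether logrotate happens to fail on it.

#!/bin/sh
# Warn when a log grows suspiciously large before its daily rotation.
for f in /var/log/messages /var/log/zypper.log; do
    if [ "$(stat -c%s "$f")" -gt $((15 * 1024 * 1024)) ]; then
        echo "WARN: $f exceeds 15M before rotation"
    fi
done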
Updated by tinita about 1 month ago
- Subject changed from Failed systemd services alert: logroate (monitor, openqaw5-xen, s390zl12) to Failed systemd services alert: logrotate (monitor, openqaw5-xen, s390zl12)
Updated by okurz about 1 month ago
- Tags set to infra, osd, logrotate, reactive work
Updated by okurz about 1 month ago
- Target version changed from Ready to Tools - Next
Updated by okurz about 1 month ago
- Subject changed from Failed systemd services alert: logrotate (monitor, openqaw5-xen, s390zl12) to Failed systemd services alert: logrotate (monitor, openqaw5-xen, s390zl12) size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz about 1 month ago
- Target version changed from Tools - Next to Ready
Updated by nicksinger 27 days ago
- Priority changed from High to Normal
Jan 21 14:31:51 monitor logrotate[3413]: error: destination /var/log/messages-20250121.xz already exists, skipping rotation
Jan 21 14:31:51 monitor logrotate[3413]: error: destination /var/log/zypper.log-20250121.xz already exists, skipping rotation
Both /etc/logrotate.d/zypper.lr and /etc/logrotate.d/syslog use the dateext option to append YYYYMMDD to a rotated file instead of just incrementing a number, and the size option to define when to rotate. This should not be a problem, as logrotate.timer runs only once a day, but for some reason it ran in the middle of the day:
Jan 21 14:31:47 monitor systemd[1]: Starting Rotate log files...
Jan 21 14:31:51 monitor logrotate[3413]: error: destination /var/log/messages-20250121.xz already exists, skipping rotation
Jan 21 14:31:51 monitor logrotate[3413]: error: destination /var/log/zypper.log-20250121.xz already exists, skipping rotation
Jan 21 14:31:51 monitor systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
Jan 21 14:31:51 monitor systemd[1]: logrotate.service: Failed with result 'exit-code'.
Jan 21 14:31:51 monitor systemd[1]: Failed to start Rotate log files.
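The failure mode is easy to reproduce locally. A minimal sketch, assuming GNU logrotate; the /tmp paths and the config are made up for illustration, and the archive suffix depends on the configured compression (.gz by default, .xz on our hosts):

cat > /tmp/demo.lr <<'EOF'
/tmp/demo.log {
    compress
    dateext
    size 1k
}
EOF
echo data > /tmp/demo.log
logrotate -f -s /tmp/demo.state /tmp/demo.lr   # first rotation creates demo.log-YYYYMMDD.gz
echo data > /tmp/demo.log
logrotate -f -s /tmp/demo.state /tmp/demo.lr   # same day again: "destination ... already exists, skipping rotation"
echo $?                                        # non-zero, which is what makes the systemd unit fail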
Given that we don't lose logs, and a false alert only happens if logrotate runs twice a day, I think we can lower the prio until @jbaier_cz is back to look into a proper solution.
Updated by dheidler 27 days ago
On that day on monitor:
/var/log/messages-20250121.xz has 2_356_264 lines.
2_224_564 of them are due to logging every single influxdb GET request in that file.
74_892 lines contain this error:
2025-01-20T00:14:37.997040+01:00 monitor grafana[5917]: logger=ngalert.scheduler rule_uid=f2f03ad11bd031f195894a3375da3dfc639e1d40 org_id=1 version=3668 fingerprint=cc4558d00eab3708 now=2025-01-20T00:14:30+01:00 t=2025-01-20T00:14:37.996859595+01:00 level=error msg="Failed to build rule evaluator" error="failed to parse expression 'A': failed to unmarshal remarshaled classic condition body: json: cannot unmarshal number into Go struct field ConditionEvalJSON.evaluator.params of type []float64"
Then we have another 28_001 lines of general grafana logs and around the same number of mixed other entries. Notably, quite a few lines come from a service called "loki".
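For reference, counts like these can be obtained along these lines; the grep patterns here are simplified stand-ins for the exact message formats:

xzcat /var/log/messages-20250121.xz | wc -l                                # total lines
xzgrep -c 'influxdb'                       /var/log/messages-20250121.xz  # influxdb request logging
xzgrep -c 'Failed to build rule evaluator' /var/log/messages-20250121.xz  # the grafana scheduler error
xzgrep -c ' grafana\['                     /var/log/messages-20250121.xz  # general grafana logs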
Updated by dheidler 27 days ago
For the zypper logs, zypper.log-20250121.xz is of pretty average size (700 KB).
zypper.log-20250125.xz is much larger (8.7 MB).
According to zypper-log -l zypper.log-20250125.xz, half of the zypper calls look like this:
2025-01-24 23:59 9432 1.14.79 zypper --non-interactive --no-refresh info -t package nginx
2025-01-24 23:59 9504 1.14.79 zypper --non-interactive --no-refresh info -t package grafana
2025-01-25 00:00 10262 1.14.79 zypper --non-interactive --no-refresh info -t package loki
2025-01-25 00:00 10330 1.14.79 zypper --non-interactive --no-refresh info -t package influxdb
2025-01-25 00:01 10738 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-augeas
2025-01-25 00:01 10826 1.14.79 zypper --non-interactive --no-refresh info -t package velociraptor-client
2025-01-25 00:01 10977 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-pandas
2025-01-25 00:01 11014 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-requests
2025-01-25 00:01 11044 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-dataclasses
2025-01-25 00:01 11299 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-requests
2025-01-25 00:01 11455 1.14.79 zypper --non-interactive --no-refresh info -t package nginx
2025-01-25 00:01 11503 1.14.79 zypper --non-interactive --no-refresh info -t package grafana
2025-01-25 00:02 12142 1.14.79 zypper --non-interactive --no-refresh info -t package loki
2025-01-25 00:02 12179 1.14.79 zypper --non-interactive --no-refresh info -t package influxdb
2025-01-25 00:02 12566 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-augeas
2025-01-25 00:02 12654 1.14.79 zypper --non-interactive --no-refresh info -t package velociraptor-client
2025-01-25 00:03 12804 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-pandas
2025-01-25 00:03 12842 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-requests
2025-01-25 00:03 12902 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-dataclasses
2025-01-25 00:03 12940 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-requests
2025-01-25 00:03 13095 1.14.79 zypper --non-interactive --no-refresh info -t package nginx
2025-01-25 00:03 13137 1.14.79 zypper --non-interactive --no-refresh info -t package grafana
2025-01-25 00:03 13773 1.14.79 zypper --non-interactive --no-refresh info -t package loki
2025-01-25 00:03 13810 1.14.79 zypper --non-interactive --no-refresh info -t package influxdb
2025-01-25 00:04 14239 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-augeas
2025-01-25 00:04 14327 1.14.79 zypper --non-interactive --no-refresh info -t package velociraptor-client
2025-01-25 00:04 14479 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-pandas
2025-01-25 00:04 14517 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-requests
2025-01-25 00:05 14578 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-dataclasses
2025-01-25 00:05 14616 1.14.79 zypper --non-interactive --xmlout --no-refresh search --match-exact python3-requests
2025-01-25 00:05 14771 1.14.79 zypper --non-interactive --no-refresh info -t package nginx
2025-01-25 00:05 14848 1.14.79 zypper --non-interactive --no-refresh info -t package grafana
I would suspect that the calls come from salt.
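One way to substantiate that suspicion, as a sketch (the paths assume a standard salt layout; /srv/salt lives on the salt master, not necessarily on monitor itself):

# correlate the zypper calls with minion activity in the same time window
journalctl -u salt-minion --since '2025-01-24 23:55' --until '2025-01-25 00:10'
# look for states referencing the packages seen above
grep -rn 'nginx\|grafana\|loki\|influxdb' /srv/salt --include='*.sls' | head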
Updated by jbaier_cz 27 days ago
- Related to action #176121: salt-states-openqa pipeline deploy fails on master, SaltReqTimeoutError: Message timed out added
Updated by jbaier_cz 27 days ago
- Related to action #175407: salt state for machine monitor.qe.nue2.suse.org was broken for almost 2 months, nothing was alerting us size:S added
Updated by tinita 26 days ago
- Related to action #176250: file corruption in salt controlled config files size:M added
Updated by jbaier_cz 25 days ago
Logrotate configuration specifies:
/var/log/zypper.log {
    compress
    dateext
    notifempty
    missingok
    nocreate
    maxage 60
    rotate 99
    size 10M
}
Due to repeated salt calls, the zypper log got filled with entries like the following:
2025-01-23 04:51:22 <1> monitor(15351) [zypper] main.cc(main):143 ===== 'zypper' '--non-interactive' '--no-refresh' 'info' '-t' 'package' 'nginx' =====
2025-01-23 04:51:28 <1> monitor(15396) [zypper] main.cc(main):143 ===== 'zypper' '--non-interactive' '--no-refresh' 'info' '-t' 'package' 'grafana' =====
2025-01-23 04:52:03 <1> monitor(16078) [zypper] main.cc(main):143 ===== 'zypper' '--non-interactive' '--no-refresh' 'info' '-t' 'package' 'loki' =====
2025-01-23 04:52:08 <1> monitor(16116) [zypper] main.cc(main):143 ===== 'zypper' '--non-interactive' '--no-refresh' 'info' '-t' 'package' 'influxdb' =====
The log grew larger than 10M multiple times per day, which in combination with dateext made logrotate fail. That answers AC1. The reasons for the salt calls are covered in the linked tickets.
Updated by jbaier_cz 25 days ago
- Priority changed from High to Normal
No need for high priority anymore; the related logs were already examined (and will be around for some time).
As for AC2, logrotate will only fail if the log was already rotated that day, and since there is no daily rotation for the zypper log, that only happens if something writes more than 10 megabytes into the log within one day. I checked that the service will auto-recover the next day.
If we want logrotate to never fail, we need to either drop the date from the rotated filenames and use just numbers, or add minage 1 to the configuration, which should prevent rotating more than once in a single day. Unfortunately, both options need a tweak inside the upstream-provided config. I can create an appropriate recipe for our salt repo to do it; see the sketch below.
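For illustration, the resulting config with minage added could look like this; whether the salt recipe patches the file in place or replaces it entirely is an open design choice:

/var/log/zypper.log {
    compress
    dateext
    minage 1
    notifempty
    missingok
    nocreate
    maxage 60
    rotate 99
    size 10M
}

A dry run with logrotate -d /etc/logrotate.d/zypper.lr verifies that the tweaked config still parses.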
Updated by jbaier_cz 25 days ago
- Status changed from In Progress to Feedback
I guess some kind of a crude solution (if we want it) could be https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1366
Updated by tinita 25 days ago
- Related to action #166313: Failed systemd services on multiple workers due to logrotate errors size:S added
Updated by tinita 25 days ago
Alternatively, there is what I suggested in https://progress.opensuse.org/issues/166313#note-7. Somehow it was overlooked, I guess.
Updated by jbaier_cz 25 days ago
tinita wrote in #note-23:
Alternatively, there is what I suggested in https://progress.opensuse.org/issues/166313#note-7. Somehow it was overlooked, I guess.
Yeah, having a normal daily rotation (with optional size-related exceptions) would also make sense to me.
Updated by jbaier_cz 24 days ago
- Related to action #69718: harmonize timezone used for our machines size:M added
Updated by jbaier_cz 24 days ago
- Status changed from Feedback to Resolved
I thought a little bit about this and spotted another, completely different issue. The systemd timer uses OnCalendar=daily, which should prevent the issue on its own (there should never be more than one rotation per day, as the service does not run that often); together with AccuracySec=1h it should run somewhere within the first hour of the day and create a log archive for that particular day (because of dateext).
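The timer settings can be double-checked like this (a sketch; the commented unit content reflects typical openSUSE defaults):

systemctl cat logrotate.timer
# [Timer]
# OnCalendar=daily
# AccuracySec=1h
# Persistent=true
systemctl list-timers logrotate.timer   # shows the next and last trigger times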
The log grew larger than 10M multiple times per day, which in combination with dateext made logrotate fail.
That is not correct in this case.
Jan 21 14:31:47 monitor systemd[1]: Starting Rotate log files...
Something or someone started the service manually during the day, which in the case of logrotate is not a good idea. There are no logs from that time anymore, but #69718 was applied at exactly that time and changed the timezone of the machine, which might fiddle with the timers. This is most likely what re-triggered the daily services.
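Had the journal still covered that window, a check like the following could have confirmed both the unexpected trigger and the timezone change (a sketch; systemd-timedated logs a "Changed time zone to ..." line when the timezone is switched):

journalctl -u logrotate.timer   --since '2025-01-21 14:00' --until '2025-01-21 15:00'
journalctl -u systemd-timedated --since '2025-01-21 14:00' --until '2025-01-21 15:00'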
Unless we are planning to change the timezone regularly, there is nothing more to do here.