action #62306
closed
osd logrotate fails sporadically on "error opening /var/log/salt/master: Permission denied", only at 00:00, i.e. midnight every day.
Added by okurz almost 5 years ago.
Updated over 4 years ago.
Description
> sudo systemctl status logrotate
● logrotate.service - Rotate log files
Loaded: loaded (/usr/lib/systemd/system/logrotate.service; static; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2020-01-19 00:08:34 CET; 22h ago
Docs: man:logrotate(8)
man:logrotate.conf(5)
Process: 5232 ExecStart=/usr/sbin/logrotate /etc/logrotate.conf (code=exited, status=1/FAILURE)
Main PID: 5232 (code=exited, status=1/FAILURE)
Jan 19 00:00:00 openqa systemd[1]: Starting Rotate log files...
Jan 19 00:08:34 openqa logrotate[5232]: error: error opening /var/log/salt/master: Permission denied
Jan 19 00:08:34 openqa systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
Jan 19 00:08:34 openqa systemd[1]: Failed to start Rotate log files.
Jan 19 00:08:34 openqa systemd[1]: logrotate.service: Unit entered failed state.
Jan 19 00:08:34 openqa systemd[1]: logrotate.service: Failed with result 'exit-code'.
- Copied to action #62309: logrotate fails on QA-Power8-4-kvm (and powerqaworker-qam-1) with "error: destination /var/log/messages-20200118.xz already exists, skipping rotation" added
- Subject changed from osd logrotate fails on "error opening /var/log/salt/master: Permission denied" to osd logrotate fails sporadically on "error opening /var/log/salt/master: Permission denied", only at 00:00, i.e. midnight every day.
- Due date set to 2020-03-13
- Status changed from New to Feedback
- Assignee set to okurz
Received (again?) an alert about this: http://mailman.suse.de/mailman/private/osd-admins/2020-February/000855.html
logrotate fails on permissions. /etc/logrotate.d/salt states "su salt salt" for /var/log/salt/master, but on osd the file is owned by root:salt. Cross-checked on o3: there it is salt:salt, hence no problem there. In a clean container environment the start of "salt-master" also creates the file as salt:root, so I assume that root:salt on osd might just be a leftover from migrating from a very old version of the OS. I will correct the ownership manually and monitor:
chown salt /var/log/salt/master
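To compare the actual ownership against the "su salt salt" directive before and after the chown, a small check like the following could help. This is only a sketch: the helper name owner_matches is made up for illustration, and it assumes GNU stat (coreutils) is available.

```shell
# Hypothetical helper (not part of any package): succeeds if FILE is owned
# by exactly USER and GROUP, fails otherwise.
owner_matches() {
    file=$1
    want_user=$2
    want_group=$3
    # GNU stat: %U prints the owner name, %G the group name
    actual=$(stat -c '%U %G' "$file") || return 2
    [ "$actual" = "$want_user $want_group" ]
}

# Usage with the path and names from this ticket:
#   owner_matches /var/log/salt/master salt salt \
#     || echo "ownership differs from 'su salt salt'"
```

Running this from a periodic check would show when the owner flips back, without waiting for the midnight logrotate run.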
If this still fails, we could simply ignore the exit status of logrotate, e.g. with a systemd service override that prefixes the command in ExecStart with "-" so that a non-zero exit code is ignored.
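A minimal sketch of such an override, assuming the ExecStart line shown in the systemctl status output above. The function name write_logrotate_override and the file name override.conf are illustrative choices, not something already deployed on osd.

```shell
# Writes a systemd drop-in that makes logrotate.service ignore a failing
# exit status. The target directory defaults to the usual drop-in location
# and can be overridden with the first argument (e.g. for a dry run).
write_logrotate_override() {
    dropin_dir=${1:-/etc/systemd/system/logrotate.service.d}
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the command inherited from the packaged unit
ExecStart=
# The leading "-" makes systemd treat a non-zero exit code as success
ExecStart=-/usr/sbin/logrotate /etc/logrotate.conf
EOF
}

# Usage (as root), followed by a reload so systemd picks up the drop-in:
#   write_logrotate_override && systemctl daemon-reload
```

Note that this hides every logrotate failure on the host, not only the salt one, so it is more of a last resort than a fix.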
Potentially helpful bugs for this issue: https://bugzilla.suse.com/show_bug.cgi?id=1030009 and https://bugzilla.suse.com/show_bug.cgi?id=1071322
The owner changed back to "root". Not sure who or what did that. There are now alerts for the same problem on openqaworker8 as well. I don't know why that file exists on openqaworker8. I have deleted it there and reset the systemd service with systemctl reset-failed.
On openqa-monitor.qa this seems to be more tricky:
okurz@openqa-monitor:~> sudo journalctl --since=today -u logrotate
-- Logs begin at Sun 2020-03-01 09:35:15 CET, end at Wed 2020-03-04 22:51:09 CET. --
Mar 04 00:00:24 openqa-monitor systemd[1]: Starting Rotate log files...
Mar 04 00:00:24 openqa-monitor logrotate[28470]: [61B blob data]
Mar 04 00:00:24 openqa-monitor logrotate[28470]: error: 'Access denied for user 'root'@'localhost' (using password: NO)'
Mar 04 00:00:24 openqa-monitor logrotate[28470]: /etc/logrotate.d/mariadb failed, probably because
Mar 04 00:00:24 openqa-monitor logrotate[28470]: the root acount is protected by password.
Mar 04 00:00:24 openqa-monitor logrotate[28470]: See comments in /etc/logrotate.d/mariadb on how to fix this
Mar 04 00:00:24 openqa-monitor logrotate[28470]: error: error running non-shared postrotate script for /var/log/mysql/mysqld.log of '/var/log/mysql/*.log '
Mar 04 00:00:33 openqa-monitor systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
And yes, /etc/logrotate.d/mariadb has some more information, but I don't know why this seems to be happening only now.
- Due date deleted (2020-03-13)
- Status changed from Feedback to Workable
- Assignee deleted (okurz)
So simply changing permissions did not help. I don't know right now what else I can do, so I am leaving this for others :)
- Status changed from Workable to Feedback
- Assignee set to okurz
- Due date set to 2020-04-14
- Status changed from Feedback to Resolved
It seems we have not seen this problem lately.
- Related to action #93195: [Alerting] Failed systemd services alert (except openqa.suse.de) on 2021-05-28, logrotate.service on openqaworker-arm-1 added