action #124412
Updated by mkittler almost 2 years ago
## Observation

See https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1&from=1676150994873&to=1676257956411 for the time frame. (It also shows some mount units, but they were only failing very briefly.)

This is the log of logrotate-openqa on OSD:

```
Feb 13 00:00:03 openqa logrotate[30440]: Now: 2023-02-13 00:00
Feb 13 00:00:03 openqa logrotate[30440]: Last rotated at 2023-02-12 05:00
Feb 13 00:00:03 openqa logrotate[30440]: log does not need rotating (log size is below the 'size' threshold)
Feb 13 00:00:03 openqa systemd[1]: logrotate-openqa.service: Deactivated successfully.
Feb 13 00:00:03 openqa systemd[1]: Finished Rotate openQA log files.
Feb 13 01:00:00 openqa systemd[1]: Starting Rotate openQA log files...
Feb 13 01:00:01 openqa logrotate[11051]: reading config file /etc/logrotate.d/openqa
Feb 13 01:00:01 openqa logrotate[11051]: warning: 'size' overrides previously specified 'hourly'
Feb 13 01:00:01 openqa logrotate[11051]: compress_prog is now /usr/bin/xz
Feb 13 01:00:01 openqa logrotate[11051]: compress_ext was changed to .xz
Feb 13 01:00:01 openqa logrotate[11051]: uncompress_prog is now /usr/bin/xzdec
Feb 13 01:00:01 openqa logrotate[11051]: warning: 'size' overrides previously specified 'hourly'
Feb 13 01:00:01 openqa logrotate[11051]: compress_prog is now /usr/bin/xz
Feb 13 01:00:01 openqa logrotate[11051]: compress_ext was changed to .xz
Feb 13 01:00:01 openqa logrotate[11051]: uncompress_prog is now /usr/bin/xzdec
Feb 13 01:00:01 openqa logrotate[11051]: reading config file /etc/logrotate.d/openqa-apache
Feb 13 01:00:01 openqa logrotate[11051]: warning: 'size' overrides previously specified 'hourly'
Feb 13 01:00:01 openqa logrotate[11051]: compress_prog is now /usr/bin/xz
Feb 13 01:00:01 openqa logrotate[11051]: compress_ext was changed to .xz
Feb 13 01:00:01 openqa logrotate[11051]: uncompress_prog is now /usr/bin/xzdec
Feb 13 01:00:01 openqa logrotate[11051]: error: state file /var/lib/misc/logrotate.status is already locked
Feb 13 01:00:01 openqa logrotate[11051]: logrotate does not support parallel execution on the same set of logfiles.
Feb 13 01:00:01 openqa systemd[1]: logrotate-openqa.service: Main process exited, code=exited, status=3/NOTIMPLEMENTED
Feb 13 01:00:01 openqa systemd[1]: logrotate-openqa.service: Failed with result 'exit-code'.
Feb 13 01:00:01 openqa systemd[1]: Failed to start Rotate openQA log files.
Feb 13 02:00:01 openqa systemd[1]: Starting Rotate openQA log files...
Feb 13 02:00:01 openqa logrotate[12997]: reading config file /etc/logrotate.d/openqa
```

The log for logrotate on the Pi is unfortunately empty, so there is likely not much we can do about the Pi at this point. I haven't paused the alerts. Let's see whether this happens again.

## Acceptance criteria

* **AC1**: The Pi worker no longer triggers the failed systemd services alert
* **AC2**: OSD no longer triggers the failed systemd services alert

~~Note that the issue on OSD was a one-time issue and is at this point no concern anymore.~~ It happened again on OSD as well:

```
Feb 26 00:00:03 openqa systemd[1]: Finished Rotate openQA log files.
Feb 26 01:00:00 openqa systemd[1]: Starting Rotate openQA log files...
Feb 26 01:00:00 openqa logrotate[16932]: reading config file /etc/logrotate.d/openqa
Feb 26 01:00:00 openqa logrotate[16932]: warning: 'size' overrides previously specified 'hourly'
Feb 26 01:00:00 openqa logrotate[16932]: compress_prog is now /usr/bin/xz
Feb 26 01:00:00 openqa logrotate[16932]: compress_ext was changed to .xz
Feb 26 01:00:00 openqa logrotate[16932]: uncompress_prog is now /usr/bin/xzdec
Feb 26 01:00:00 openqa logrotate[16932]: warning: 'size' overrides previously specified 'hourly'
Feb 26 01:00:00 openqa logrotate[16932]: compress_prog is now /usr/bin/xz
Feb 26 01:00:00 openqa logrotate[16932]: compress_ext was changed to .xz
Feb 26 01:00:00 openqa logrotate[16932]: uncompress_prog is now /usr/bin/xzdec
Feb 26 01:00:00 openqa logrotate[16932]: reading config file /etc/logrotate.d/openqa-apache
Feb 26 01:00:00 openqa logrotate[16932]: warning: 'size' overrides previously specified 'hourly'
Feb 26 01:00:00 openqa logrotate[16932]: compress_prog is now /usr/bin/xz
Feb 26 01:00:00 openqa logrotate[16932]: compress_ext was changed to .xz
Feb 26 01:00:00 openqa logrotate[16932]: uncompress_prog is now /usr/bin/xzdec
Feb 26 01:00:00 openqa logrotate[16932]: error: state file /var/lib/misc/logrotate.status is already locked
Feb 26 01:00:00 openqa logrotate[16932]: logrotate does not support parallel execution on the same set of logfiles.
Feb 26 01:00:00 openqa systemd[1]: logrotate-openqa.service: Main process exited, code=exited, status=3/NOTIMPLEMENTED
Feb 26 01:00:00 openqa systemd[1]: logrotate-openqa.service: Failed with result 'exit-code'.
Feb 26 01:00:00 openqa systemd[1]: Failed to start Rotate openQA log files.
Feb 26 02:00:00 openqa systemd[1]: Starting Rotate openQA log files...
```

## Suggestions

* Look into log files on OSD for details (as above)
* Check what happens if logrotate is called manually. It's rather safe to call logrotate again but might trigger the above
* Likely an unclean shutdown caused this. Try to trigger manually.
* Ask @dheidler to fix the Pi-worker
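
The `state file ... is already locked` error in the logs points to two logrotate runs overlapping on the same state file; the logs do not identify which other process held the lock. As a way to reason about that failure mode without touching OSD, here is a minimal Python sketch. It assumes the state file is guarded by a non-blocking exclusive lock in the style of `flock` (an assumption about the mechanism, not something the logs confirm), and it uses a temporary stand-in file rather than `/var/lib/misc/logrotate.status`. The helper names `try_lock` and `simulate_overlap` are hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path and try to take a non-blocking exclusive lock.

    Returns the open file object on success, or None if the lock is
    already held through another open file description.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # The lock is held elsewhere: give up instead of waiting.
        handle.close()
        return None
    return handle


def simulate_overlap():
    """Simulate two overlapping runs followed by a later, separate run."""
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the real state file (assumption for illustration).
        state = os.path.join(tmp, "logrotate.status")

        first = try_lock(state)   # first run takes the lock
        second = try_lock(state)  # overlapping run finds it locked
        first_ok = first is not None
        second_ok = second is not None

        if first is not None:
            first.close()         # closing the file releases the lock

        third = try_lock(state)   # a later run succeeds again
        third_ok = third is not None
        if third is not None:
            third.close()

        return first_ok, second_ok, third_ok


if __name__ == "__main__":
    print(simulate_overlap())
```

If the real mechanism behaves like this, a manual logrotate call would only reproduce the failure while another run still holds the lock, which fits a failure that shows up only occasionally.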