action #55658
closed
[osd] All jobs at midnight were incomplete and restarted due to cron-based automatic apache restart at 0:00
Added by okurz over 5 years ago.
Updated over 5 years ago.
Description
Observation
A lot of jobs on openqa.suse.de ended up incomplete at 00:00 in the night from 2019-08-16 to 2019-08-17, e.g. https://openqa.suse.de/tests/3259817 . All of them were restarted, across all different kinds of workers. Did something restart?
- Subject changed from All jobs at midnight were incomplete and restarted to [osd] All jobs at midnight were incomplete and restarted
Ah, yes, it's the cron job that restarts the Apache webserver. From the system journal:
Aug 17 00:00:00 openqa systemd[1]: Reloading The Apache Webserver.
Aug 17 00:00:00 openqa systemd[1]: Reloaded The Apache Webserver.
This comes from /etc/cron.d/restart_apache . Do we still need that? The file has no comment, does not come from salt and has no explanation, so I guess the answer is no.
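To track down which cron entry triggers such a restart, one option is to search the cron locations for the service name. The following is only a sketch: the helper name `cron_entries_matching` is hypothetical, and the default search paths are an assumption about a typical system-wide cron layout, not something stated in this ticket.

```shell
# Hypothetical helper: cron_entries_matching PATTERN [PATH...]
# Prints "file:line-number:line" for every cron line matching PATTERN.
cron_entries_matching() {
    pattern=$1
    shift
    # Assumed default locations; pass explicit paths to override them
    [ "$#" -gt 0 ] || set -- /etc/cron.d /etc/crontab
    # -r recurse, -n line numbers, -i ignore case, -H always print the file name
    grep -rniH -- "$pattern" "$@" 2>/dev/null
}
```

Called as `cron_entries_matching apache`, it lists the files and lines that mention the pattern; a non-zero exit status means nothing matched.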
- Subject changed from [osd] All jobs at midnight were incomplete and restarted to [osd] All jobs at midnight were incomplete and restarted due to cron-based automatic apache restart at 0:00
- Due date set to 2019-08-31
- Status changed from New to Feedback
- Assignee set to okurz
I disabled the cron job and will monitor whether we still need it.
How did you disable it?
The reason the cron job exists is that deleted apache log files kept lingering in the file system and / is rather small. It's one of those band-aids that stay because the workaround doesn't hurt that much. But we should move to 15.1; o3 doesn't show this bug.
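For context: on Linux a file that is deleted while a process still has it open keeps occupying disk space until the last descriptor is closed, and `/proc/PID/fd` shows such targets with a ` (deleted)` suffix. A minimal sketch for spotting these files follows. The helper name `list_deleted_open_files` is hypothetical, and the optional argument exists only so the function can be pointed at a directory other than `/proc`.

```shell
# Hypothetical helper: list_deleted_open_files [PROC_ROOT]
# Prints "PID<TAB>TARGET" for every open descriptor whose target was deleted.
list_deleted_open_files() {
    proc_root="${1:-/proc}"
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Unmatched globs and unreadable entries make readlink fail; skip them
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)")
                # Extract the PID from PROC_ROOT/PID/fd/N
                pid=${fd#"$proc_root"/}
                pid=${pid%%/*}
                printf '%s\t%s\n' "$pid" "$target"
                ;;
        esac
    done
}
```

Run as root to see descriptors of other users' processes. If apache shows up with deleted log files, restarting it would release the space, which is what the cron job worked around.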
- Related to action #54137: Upgrade osd to a supported Leap version (from 42.3) added
coolo wrote:
How did you disable it?
By commenting out the line in the cron file:
# cat /etc/cron.d/restart_apache
#SHELL=/bin/bash
# okurz: 2019-08-17: disabled as per
# https://progress.opensuse.org/issues/55658
#0 0 * * * root systemctl restart apache2.service
The reason the cron job exists is that deleted apache log files kept lingering in the file system and / is rather small.
I see. I felt confident to disable the above as I am currently monitoring OSD anyway, at least on a daily basis, including the space on /, which I also handled just recently in #55463 . It's good we have thruk/nagios/check_mk :)
It's one of those band-aids that stay because the workaround doesn't hurt that much. But we should move to 15.1; o3 doesn't show this bug.
I know. I also wondered why we don't need it on o3. One more reason for doing the upgrade soon. I guess I can pick this up. For #54902 I already wondered whether the situation might change on 15.1.
I'm just asking how you disabled it because apache was still restarted last night.
Well, as I can see now, I did not actually save the file. So next midnight apache should not restart.
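A quick way to confirm that such an edit actually reached disk is to print whatever is still active in the cron file. This is only a sketch; the helper name `active_cron_lines` is hypothetical.

```shell
# Hypothetical helper: active_cron_lines FILE
# Prints every line of FILE that is neither blank nor a comment.
active_cron_lines() {
    # Delete lines whose first non-blank character is '#', then blank lines
    sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' "$1"
}
```

For a fully disabled file, `active_cron_lines /etc/cron.d/restart_apache` prints nothing; any remaining output is a variable assignment or job that cron will still honor.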
- Status changed from Feedback to Resolved
We removed the file for good during the upgrade of osd to Leap 15.1.