action #55658
closed
[osd] All jobs at midnight were incomplete and restarted due to cron-based automatic apache restart at 0:00
Description
Observation
A lot of jobs on openqa.suse.de ended up incomplete at 00:00 in the night from 2019-08-16 to 2019-08-17, e.g. https://openqa.suse.de/tests/3259817 . All of them were restarted, across all kinds of workers. Did something restart?
Updated by okurz over 5 years ago
- Subject changed from All jobs at midnight were incomplete and restarted to [osd] All jobs at midnight were incomplete and restarted
Ah, yes, it's the cron job that restarts the apache webserver. From the system journal:
Aug 17 00:00:00 openqa systemd[1]: Reloading The Apache Webserver.
Aug 17 00:00:00 openqa systemd[1]: Reloaded The Apache Webserver.
This comes from /etc/cron.d/restart_apache . Do we still need that? Because the file has no comment, is not from salt and comes with no explanation, I guess the answer is no.
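To illustrate the kind of check described here, the following is a minimal sketch that filters syslog-style journal lines for systemd messages about Apache and keeps those logged exactly at midnight. The function names `apache_events` and `at_midnight` are hypothetical, and the line format is assumed from the two journal lines quoted above, not from any documented journal schema.

```python
import re

# Assumed line shape, taken from the two journal lines quoted in this ticket:
# "Aug 17 00:00:00 openqa systemd[1]: Reloading The Apache Webserver."
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<unit>[^:]+):\s+(?P<message>.*)$"
)

JOURNAL = [
    "Aug 17 00:00:00 openqa systemd[1]: Reloading The Apache Webserver.",
    "Aug 17 00:00:00 openqa systemd[1]: Reloaded The Apache Webserver.",
]


def apache_events(lines):
    """Return (month, day, time, message) for systemd messages mentioning Apache."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not in the assumed format, skip it
        if match["unit"].startswith("systemd[") and "Apache" in match["message"]:
            events.append(
                (match["month"], int(match["day"]), match["time"], match["message"])
            )
    return events


def at_midnight(events):
    """Keep only events stamped exactly 00:00:00, the time a '0 0 * * *' cron job fires."""
    return [event for event in events if event[2] == "00:00:00"]


if __name__ == "__main__":
    for month, day, time, message in at_midnight(apache_events(JOURNAL)):
        print(month, day, time, message)
```

Events that recur at exactly 00:00:00 every night point towards a scheduled trigger such as a cron entry and not a crash.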
Updated by okurz over 5 years ago
- Subject changed from [osd] All jobs at midnight were incomplete and restarted to [osd] All jobs at midnight were incomplete and restarted due to cron-based automatic apache restart at 0:00
- Due date set to 2019-08-31
- Status changed from New to Feedback
- Assignee set to okurz
I disabled the cron job and will monitor whether we still need it.
Updated by coolo over 5 years ago
How did you disable it?
The reason the cron exists is because deleted apache log files kept being around in the file system and / is rather small. It's one of those band aids that stay as the work around doesn't hurt that much. But we should move to 15.1 - o3 doesn't show this bug.
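As a rough illustration of the problem described here, this sketch lists files that a process still holds open after they were deleted, together with the space each one occupies. It assumes a Linux `/proc` filesystem and assumes that the space was pinned by descriptors a process still had open, which the thread does not state explicitly. The function name `deleted_open_files` is hypothetical.

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_open_files(pid):
    """Return [(fd, path, size_bytes)] for files the process holds open after deletion.

    Relies on Linux /proc: the symlink /proc/<pid>/fd/<n> of an unlinked file
    reads as "<original path> (deleted)". A file whose real name ends with that
    suffix would be reported as a false positive.
    """
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
            if not target.endswith(DELETED_SUFFIX):
                continue
            # stat() follows the link to the still-open inode, so the size is
            # the space that stays allocated until the descriptor is closed.
            size = os.stat(link).st_size
        except OSError:
            continue  # descriptor was closed while we were looking
        found.append((int(name), target[: -len(DELETED_SUFFIX)], size))
    return found


if __name__ == "__main__":
    for fd, path, size in deleted_open_files(os.getpid()):
        print(fd, path, size)
```

Inspecting another user's process this way usually requires root. Restarting or reloading the owning service closes such descriptors, which would explain why a nightly restart worked as a band-aid.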
Updated by okurz over 5 years ago
- Related to action #54137: Upgrade osd to a supported Leap version (from 42.3) added
Updated by okurz over 5 years ago
coolo wrote:
How did you disable it?
commenting out the line in the cron file:
# cat /etc/cron.d/restart_apache
#SHELL=/bin/bash
# okurz: 2019-08-17: disabled as per
# https://progress.opensuse.org/issues/55658
#0 0 * * * root systemctl restart apache2.service
The reason the cron exists is because deleted apache log files kept being around in the file system and / is rather small.
I see. I felt confident to disable the above as currently I am monitoring OSD anyway at least on a daily basis, including the space on / which I also handled just recently in #55463 . It's good we have thruk/nagios/check_mk :)
It's one of those band aids that stay as the work around doesn't hurt that much. But we should move to 15.1 - o3 doesn't show this bug.
I know. I also wondered why we don't need it on o3. One more reason for doing the upgrade soon. I guess I can pick this up. For #54902 I already wondered whether the situation might change on 15.1.
Updated by coolo over 5 years ago
I'm just asking how you disabled it because apache was still restarted last night.
Updated by okurz over 5 years ago
well, as I can see now I did not actually save the file. So next midnight apache should not restart.
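A quick way to catch an unsaved edit like this is to read the file back from disk and list the lines cron would still act on. The sketch below is only an illustration: `active_cron_entries` is a hypothetical helper, `DISABLED` reproduces the commented-out content posted earlier in this ticket, and `PRESUMED_ORIGINAL` is an assumption obtained by uncommenting those lines.

```python
DISABLED = """\
#SHELL=/bin/bash
# okurz: 2019-08-17: disabled as per
# https://progress.opensuse.org/issues/55658
#0 0 * * * root systemctl restart apache2.service
"""

# Assumption: the file before the edit, reconstructed by uncommenting the lines above.
PRESUMED_ORIGINAL = """\
SHELL=/bin/bash
0 0 * * * root systemctl restart apache2.service
"""


def active_cron_entries(text):
    """Return the lines cron would act on: job lines and variable assignments.

    Blank lines and lines whose first non-blank character is '#' are skipped,
    since cron treats those as comments.
    """
    active = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        active.append(line)
    return active


if __name__ == "__main__":
    print(active_cron_entries(DISABLED))           # no active entries remain
    print(active_cron_entries(PRESUMED_ORIGINAL))  # the job is still active
```

On the host itself the text would come from the file on disk, for example `open("/etc/cron.d/restart_apache").read()`, so the check reflects what was actually saved and not what is in the editor buffer.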
Updated by okurz over 5 years ago
- Status changed from Feedback to Resolved
We removed the file for good during the upgrade of osd to Leap 15.1.