action #55658

closed

[osd] All jobs at midnight were incomplete and restarted due to cron-based automatic apache restart at 0:00

Added by okurz over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 2019-08-16
Due date: 2019-08-31
% Done: 0%
Estimated time:

Description

Observation

A lot of jobs on openqa.suse.de ended up incomplete at 00:00 in the night from 2019-08-16 to 2019-08-17, e.g. https://openqa.suse.de/tests/3259817 . All of them were restarted, and they came from all different kinds of workers. Did something restart?


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #54137: Upgrade osd to a supported Leap version (from 42.3), Resolved, okurz, 2019-07-11

Actions #1

Updated by okurz over 4 years ago

  • Subject changed from All jobs at midnight were incomplete and restarted to [osd] All jobs at midnight were incomplete and restarted

ah, yes, it's the cron job to restart the apache webserver, from system journal:

Aug 17 00:00:00 openqa systemd[1]: Reloading The Apache Webserver.
Aug 17 00:00:00 openqa systemd[1]: Reloaded The Apache Webserver.

This comes from /etc/cron.d/restart_apache . Do we still need that? Because that file has no comment, is not from salt and comes with no explanation, I guess the answer is no.

Actions #2

Updated by okurz over 4 years ago

  • Subject changed from [osd] All jobs at midnight were incomplete and restarted to [osd] All jobs at midnight were incomplete and restarted due to cron-based automatic apache restart at 0:00
  • Due date set to 2019-08-31
  • Status changed from New to Feedback
  • Assignee set to okurz

I disabled the cron job and will monitor whether we still need it.

Actions #3

Updated by coolo over 4 years ago

How did you disable it?

The reason the cron job exists is that deleted apache log files kept lingering in the file system and / is rather small. It's one of those band-aids that stay because the workaround doesn't hurt that much. But we should move to 15.1; o3 doesn't show this bug.
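To check whether deleted log files are in fact still held open by a process and therefore still occupy space on /, something like the following could be used. This is only a minimal sketch, assuming a Linux /proc file system; the function name `list_deleted_open_files` is made up for illustration and is not part of any existing tooling on osd.

```shell
#!/bin/sh
# Hypothetical helper: list file descriptors that point to files which
# were deleted but are still held open (Linux /proc only).
# $1: optional process ID; defaults to all processes the caller may inspect.
list_deleted_open_files() {
    pid_glob="${1:-[0-9]*}"
    # $pid_glob is intentionally unquoted so the shell expands the glob.
    for fd in /proc/$pid_glob/fd/*; do
        # Skip entries that vanished or cannot be read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}
```

Run as root without arguments to cover all processes, or pass the process ID of one apache process to narrow it down.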

Actions #4

Updated by okurz over 4 years ago

  • Related to action #54137: Upgrade osd to a supported Leap version (from 42.3) added

Actions #5

Updated by okurz over 4 years ago

coolo wrote:

How did you disable it?

commenting out the line in the cron file:

# cat /etc/cron.d/restart_apache
#SHELL=/bin/bash
# okurz: 2019-08-17: disabled as per
# https://progress.opensuse.org/issues/55658
#0 0 * * * root systemctl restart apache2.service 

The reason the cron job exists is that deleted apache log files kept lingering in the file system and / is rather small.

I see. I felt confident to disable the above as I am currently monitoring OSD at least on a daily basis anyway, including the space on /, which I also handled just recently in #55463 . It's good we have thruk/nagios/check_mk :)
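For a quick manual look at the space on /, independent of the monitoring tools mentioned above, a sketch like this could do. The function name `fs_usage_percent` is an assumption for illustration; it is not a thruk/nagios/check_mk plugin.

```shell
#!/bin/sh
# Hypothetical helper: print the used capacity of the file system that
# holds the given path as a bare number, e.g. "42" for 42%.
# $1: path to check; defaults to /
fs_usage_percent() {
    # -P selects the POSIX output format: one line per file system,
    # with the capacity in the fifth column.
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, `fs_usage_percent /` prints the current usage of the root file system.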

It's one of those band-aids that stay because the workaround doesn't hurt that much. But we should move to 15.1; o3 doesn't show this bug.

I know. I also wondered why we don't need it on o3. One more reason for doing the upgrade soon. I guess I can pick this up. I already wondered in #54902 whether the situation might change on 15.1.

Actions #6

Updated by coolo over 4 years ago

I'm just asking how you disabled it, because apache was still restarted last night.

Actions #7

Updated by okurz over 4 years ago

Well, as I can see now, I did not actually save the file. So at the next midnight apache should not restart.
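A check along these lines would have shown right away that the edit had not been saved. It is only a sketch; the function name `active_cron_lines` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every line of a cron file that is neither
# blank nor a comment, i.e. everything cron would still act on.
# $1: path to the cron file, e.g. /etc/cron.d/restart_apache
active_cron_lines() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}
```

If `active_cron_lines /etc/cron.d/restart_apache` prints nothing and returns exit status 1 (grep found no matching line), the job is really disabled. Any printed line is still active.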

Actions #8

Updated by okurz over 4 years ago

  • Status changed from Feedback to Resolved

We removed the file for good during the upgrade of osd to Leap 15.1.
