Project

General

Profile

Actions

action #109971

closed

Service `systemd-journal-flush` timed out on OSD size:M

Added by okurz about 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
Start date:
2022-04-13
Due date:
% Done:

0%

Estimated time:

Description

Observation

martchus@openqa:~> sudo systemctl status systemd-journal-flush
systemd-journal-flush.service - Flush Journal to Persistent Storage
     Loaded: loaded (/usr/lib/systemd/system/systemd-journal-flush.service; static)
     Active: failed (Result: timeout) since Wed 2022-04-13 10:13:33 CEST; 1h 2min ago
       Docs: man:systemd-journald.service(8)
             man:journald.conf(5)
    Process: 24761 ExecStart=/usr/bin/journalctl --flush (code=killed, signal=TERM)
   Main PID: 24761 (code=killed, signal=TERM)

Apr 13 10:13:33 openqa systemd[1]: systemd-journal-flush.service: start operation timed out. Terminating.
Apr 13 10:13:33 openqa systemd[1]: systemd-journal-flush.service: Main process exited, code=killed, status=15/TERM
Apr 13 10:13:33 openqa systemd[1]: systemd-journal-flush.service: Failed with result 'timeout'.
Apr 13 10:13:33 openqa systemd[1]: Failed to start Flush Journal to Persistent Storage.
Notice: journal has been rotated since unit was started, output may be incomplete
Apr 13 10:12:51 openqa systemd[1]: Reloading.
Apr 13 10:14:11 openqa openqa[26889]: [debug] Found associated job 8537016
Apr 13 10:14:11 openqa openqa[24206]: [debug] [pid:24206] Unrecognized referer 'openqa.suse.de'
Apr 13 10:12:51 openqa systemd[1]: /usr/lib/systemd/system/plymouth-start.service:15: Unit configured to use KillMode=none. Support for KillMode=none is deprecated and will be eventually removed in future SLE versions. Please see SUSE TID https://www.suse.com/support/kb/doc/?id=000020394 for more details.
Apr 13 10:14:11 openqa openqa[26930]: [debug] Found associated job 8536325
Apr 13 10:14:11 openqa openqa[5364]: [debug] [pid:5364] creating b5b/e61/60b0e14fdda30c0b82b11d190d.png
Apr 13 10:14:11 openqa openqa[5364]: [debug] [pid:5364] creating b5b/e61/60b0e14fdda30c0b82b11d190d.png
Apr 13 10:14:11 openqa openqa[5364]: [debug] [pid:5364] creating 192/859/b528ab962e838ba92cee863c7c.png
Apr 13 10:14:11 openqa openqa[5364]: [debug] [pid:5364] creating 5e7/669/ae192e40aa19d3221f314519cc.png
Apr 13 10:14:11 openqa openqa[5364]: [debug] [pid:5364] creating 799/a7d/94b57cb7da1485753f37a95286.png
Apr 13 10:12:51 openqa systemd[1]: /usr/lib/systemd/system/rebootmgr.service:14: Unit configured to use KillMode=none. Support for KillMode=none is deprecated and will be eventually removed in future SLE versions. Please see SUSE TID https://www.suse.com/support/kb/doc/?id=000020394 for more details.
Apr 13 10:14:11 openqa openqa[26926]: [debug] [pid:26926] Unrecognized referer 'openqa.suse.de'
Apr 13 10:14:11 openqa openqa[21908]: [debug] Found associated job 8536327
Apr 13 10:12:51 openqa systemd[1]: /usr/lib/systemd/system/rpc-statd.service:14: PIDFile= references a path below legacy directory /var/run/, updating /var/run/rpc.statd.pid → /run/rpc.statd.pid; please update the unit file accordingly.
Apr 13 10:14:11 openqa openqa[26910]: [debug] Found associated job 8536335
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] Unrecognized referer 'openqa.suse.de'
Apr 13 10:13:01 openqa systemd[1]: Started Session 31419 of user geekotest.
Apr 13 10:14:11 openqa openqa[26911]: [debug] Found associated job 8536343
Apr 13 10:14:11 openqa openqa[24888]: [debug] Found associated job 8536335
Apr 13 10:14:11 openqa openqa[24888]: [debug] [pid:24888] Unrecognized referer 'openqa.suse.de'
Apr 13 10:13:12 openqa systemd[1]: session-31419.scope: Succeeded.
Apr 13 10:14:11 openqa openqa[26930]: [debug] Found associated job 8537016
Apr 13 10:14:11 openqa openqa[17923]: [debug] [pid:17923] Unrecognized referer 'openqa.suse.de'
Apr 13 10:13:33 openqa systemd[1]: systemd-journal-flush.service: start operation timed out. Terminating.
Apr 13 10:14:11 openqa openqa[26884]: [debug] Found associated job 8537158
Apr 13 10:14:11 openqa openqa[23135]: [debug] Found associated job 8539359
Apr 13 10:13:33 openqa systemd[1]: systemd-journal-flush.service: Main process exited, code=killed, status=15/TERM
Apr 13 10:14:11 openqa openqa[26920]: [debug] [pid:26920] Unrecognized referer 'openqa.suse.de'
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating ec4/e32/d22f509b78f39560571a66bd6e.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating c1b/5a2/90219d0018a674b05862ba1e08.png
Apr 13 10:13:33 openqa systemd[1]: systemd-journal-flush.service: Failed with result 'timeout'.
Apr 13 10:14:11 openqa openqa[26879]: [debug] Found associated job 8536325
Apr 13 10:14:11 openqa openqa[14315]: [debug] [pid:14315] Unrecognized referer 'openqa.suse.de'
Apr 13 10:13:33 openqa systemd[1]: Failed to start Flush Journal to Persistent Storage.
Apr 13 10:14:11 openqa openqa[26904]: [debug] Found associated job 8536335
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating 9d0/c9c/c6abb593eb16da3230e719bd45.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating 388/9b0/3d6972423c3e71bd81308e1947.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating 1d1/b16/e079c4a2af63a518e7577263ff.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating 1d0/345/14dc55af334c6c53a757fb59dc.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating 3cd/906/845cbc5e9e2e81c321971b242f.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating f89/3a6/07895e156c474c53964c5a554d.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating 011/125/6cb1b58a762c072ea0832c3ed7.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating 907/19e/02c170e51694b1035569396399.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating 1e2/9a8/34d90d3f9cc2f7538aafa62dc2.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating fb1/dce/c4c2f21f246b4c7f984061fe58.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating f4d/5ef/b289d75cc5f9c4a1888ac8eb19.png
Apr 13 10:14:11 openqa openqa[1707]: [debug] [pid:1707] creating 8f3/f12/209a9bcd24a5f76b080a604d55.png
Apr 13 10:14:01 openqa systemd[1]: Started Session 31420 of user geekotest.

(Noticed via alert for failed systemd unit.)

So far I've restarted the unit and it works. Not sure why it was running into a timeout. If it was just a one-time problem we can maybe just close the ticket again. For now I've been creating it to have a record.

So far I've restarted the unit and it works. Not sure why it was running into a timeout. If it was just a one-time problem we can maybe just close the ticket again. For now I've been creating it to have a record.

Suggestions

  • There's no user impact
  • This only occurred once so far
  • We don't know what caused the timeout
  • Do a brief research on what this error means
Actions #2

Updated by okurz about 2 years ago

https://bugzilla.redhat.com/show_bug.cgi?id=1649857 mentions that there can be a problem with the boot option "rd.debug" which here does not apply for us.

sudo systemctl show systemd-journal-flush.service | grep ^Timeout
shows TimeoutStartUSec=1min 30s, so default. Maybe OSD being a VM which is running a heavy load and lots of logs needs longer sometimes.

I added a custom timeout configuration with

sudo mkdir -p /etc/systemd/system/systemd-journal-flush.service.d/ && echo -e "[Service]\nTimeoutStartSec=300" | sudo tee /etc/systemd/system/systemd-journal-flush.service.d/startup-timeout.conf

Created
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/669
to make that persistent.

Actions #3

Updated by okurz about 2 years ago

  • Status changed from Workable to Feedback

merged

Actions #4

Updated by okurz about 2 years ago

  • Status changed from Feedback to Resolved

deployed and active. No further problems observed and none expected for foreseeable future anyway.

Actions

Also available in: Atom PDF