openSUSE Project Management Tool
openQA Infrastructure - action #105621: [Alerting] Failed systemd services alert | https://progress.opensuse.org/issues/105621?journal_id=484788 | 2022-01-27T13:25:02Z | okurz (okurz@suse.com)
<ul><li><strong>Copied from</strong> <i><a class="issue tracker-4 status-6 priority-5 priority-high3 closed" href="/issues/105618">action #105618</a>: [Alerting] CPU Load alert size:S</i> added</li></ul>
openQA Infrastructure - action #105621: [Alerting] Failed systemd services alert | https://progress.opensuse.org/issues/105621?journal_id=485694 | 2022-01-31T14:35:47Z | mkittler (marius.kittler@suse.com)
<ul><li><strong>Assignee</strong> set to <i>mkittler</i></li></ul>
openQA Infrastructure - action #105621: [Alerting] Failed systemd services alert | https://progress.opensuse.org/issues/105621?journal_id=485703 | 2022-01-31T14:54:37Z | mkittler (marius.kittler@suse.com)
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Feedback</i></li></ul><p>The cache service logs don't show what the problem was, only that the main process exited with status 192 while the service was being stopped:</p>
<pre><code>Jan 27 05:17:23 openqaworker-arm-1 openqa-worker-cacheservice-minion[25950]: [25950] [i] [#50] Downloading "SLE-15-SP3-Full-aarch64-GM-Media1.iso" from "http://openqa.suse.de/tests/8039274/asset/iso/SLE-15-SP3-Full-aarch64-GM-Media1.is>
Jan 27 05:17:29 openqaworker-arm-1 openqa-worker-cacheservice-minion[26636]: [26636] [i] [#51] Downloading: "SLES-15-SP3-aarch64-mru-install-minimal-with-addons-Build20220127-1-Server-DVD-Updates-aarch64-virtio-uefi-vars.qcow2"
Jan 27 05:17:29 openqaworker-arm-1 openqa-worker-cacheservice-minion[26636]: [26636] [i] [#51] Cache size of "/var/lib/openqa/cache" is 49 GiB, with limit 50 GiB
Jan 27 05:17:29 openqaworker-arm-1 openqa-worker-cacheservice-minion[26636]: [26636] [i] [#51] Downloading "SLES-15-SP3-aarch64-mru-install-minimal-with-addons-Build20220127-1-Server-DVD-Updates-aarch64-virtio-uefi-vars.qcow2" from "ht>
Jan 27 05:17:39 openqaworker-arm-1 openqa-worker-cacheservice-minion[26677]: [26677] [i] [#52] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
Jan 27 05:17:39 openqaworker-arm-1 openqa-worker-cacheservice-minion[26677]: [26677] [i] [#52] Calling: rsync -avHP --timeout 1800 rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
Jan 27 05:22:55 openqaworker-arm-1 systemd[1]: Stopping OpenQA Worker Cache Service Minion...
Jan 27 05:22:59 openqaworker-arm-1 openqa-worker-cacheservice-minion[2378]: [2378] [i] Worker 2378 stopped
Jan 27 05:23:00 openqaworker-arm-1 systemd[1]: openqa-worker-cacheservice-minion.service: Main process exited, code=exited, status=192/n/a
Jan 27 05:23:00 openqaworker-arm-1 systemd[1]: openqa-worker-cacheservice-minion.service: Failed with result 'exit-code'.
Jan 27 05:23:00 openqaworker-arm-1 systemd[1]: Stopped OpenQA Worker Cache Service Minion.
Jan 27 05:23:00 openqaworker-arm-1 systemd[1]: Started OpenQA Worker Cache Service Minion.
Jan 27 05:29:45 openqaworker-arm-1 openqa-worker-cacheservice-minion[28656]: [28656] [i] Cache size of "/var/lib/openqa/cache" is 49 GiB, with limit 50 GiB
Jan 27 05:29:45 openqaworker-arm-1 openqa-worker-cacheservice-minion[28656]: [28656] [i] Resetting all leftover locks after restart
Jan 27 05:29:45 openqaworker-arm-1 openqa-worker-cacheservice-minion[28656]: [28656] [i] Worker 28656 started
Jan 27 05:29:45 openqaworker-arm-1 openqa-worker-cacheservice-minion[37710]: [37710] [i] [#53] Downloading: "SLES-15-SP3-aarch64-mru-install-minimal-with-addons-Build20220127-1-Server-DVD-Updates-aarch64-virtio.qcow2"
Jan 27 05:29:50 openqaworker-arm-1 openqa-worker-cacheservice-minion[37749]: [37749] [i] [#54] Downloading: "SLE-15-SP3-Full-aarch64-GM-Media1.iso"
Jan 27 05:30:01 openqaworker-arm-1 openqa-worker-cacheservice-minion[37804]: [37804] [i] [#55] Downloading: "SLES-15-SP3-aarch64-mru-install-minimal-with-addons-Build20220127-1-Server-DVD-Updates-aarch64-virtio-uefi-vars.qcow2"
Jan 27 05:30:11 openqaworker-arm-1 openqa-worker-cacheservice-minion[37894]: [37894] [i] [#56] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
Jan 27 05:30:11 openqaworker-arm-1 openqa-worker-cacheservice-minion[37894]: [37894] [i] [#56] Calling: rsync -avHP --timeout 1800 rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
Jan 27 05:34:23 openqaworker-arm-1 openqa-worker-cacheservice-minion[44451]: [44451] [i] [#57] Downloading: "SLES-12-SP5-aarch64-mru-install-minimal-with-addons-Build20220127-1-Server-DVD-Updates-aarch64-virtio.qcow2"
Jan 27 05:34:43 openqaworker-arm-1 openqa-worker-cacheservice-minion[44451]: [44451] [i] [#57] Cache size of "/var/lib/openqa/cache" is 49 GiB, with limit 50 GiB
Jan 27 05:34:43 openqaworker-arm-1 openqa-worker-cacheservice-minion[44451]: [44451] [i] [#57] Downloading "SLES-12-SP5-aarch64-mru-install-minimal-with-addons-Build20220127-1-Server-DVD-Updates-aarch64-virtio.qcow2" from "http://openq>
Jan 27 05:34:48 openqaworker-arm-1 openqa-worker-cacheservice-minion[44538]: [44538] [i] [#58] Downloading: "SLE-12-SP5-Server-DVD-aarch64-GM-DVD1.iso"
Jan 27 05:35:45 openqaworker-arm-1 openqa-worker-cacheservice-minion[44538]: [44538] [i] [#58] Cache size of "/var/lib/openqa/cache" is 47 GiB, with limit 50 GiB
Jan 27 05:35:45 openqaworker-arm-1 openqa-worker-cacheservice-minion[44538]: [44538] [i] [#58] Downloading "SLE-12-SP5-Server-DVD-aarch64-GM-DVD1.iso" from "http://openqa.suse.de/tests/8039492/asset/iso/SLE-12-SP5-Server-DVD-aarch64-GM>
Jan 27 05:35:49 openqaworker-arm-1 openqa-worker-cacheservice-minion[44821]: [44821] [i] [#59] Downloading: "SLES-12-SP5-aarch64-mru-install-minimal-with-addons-Build20220127-1-Server-DVD-Updates-aarch64-virtio-uefi-vars.qcow2"
Jan 27 05:35:50 openqaworker-arm-1 openqa-worker-cacheservice-minion[44821]: [44821] [i] [#59] Cache size of "/var/lib/openqa/cache" is 46 GiB, with limit 50 GiB
Jan 27 05:35:50 openqaworker-arm-1 openqa-worker-cacheservice-minion[44821]: [44821] [i] [#59] Downloading "SLES-12-SP5-aarch64-mru-install-minimal-with-addons-Build20220127-1-Server-DVD-Updates-aarch64-virtio-uefi-vars.qcow2" from "ht>
Jan 27 05:35:55 openqaworker-arm-1 openqa-worker-cacheservice-minion[44832]: [44832] [i] [#60] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
Jan 27 05:35:55 openqaworker-arm-1 openqa-worker-cacheservice-minion[44832]: [44832] [i] [#60] Calling: rsync -avHP --timeout 1800 rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
Jan 27 05:47:56 openqaworker-arm-1 openqa-worker-cacheservice-minion[48557]: [48557] [i] [#61] Downloading: "SLE-15-SP4-Online-aarch64-Build88.4-Media1.iso.sha256"
Jan 27 05:47:57 openqaworker-arm-1 openqa-worker-cacheservice-minion[48557]: [48557] [i] [#61] Cache size of "/var/lib/openqa/cache" is 46 GiB, with limit 50 GiB
Jan 27 05:47:57 openqaworker-arm-1 openqa-worker-cacheservice-minion[48557]: [48557] [i] [#61] Downloading "SLE-15-SP4-Online-aarch64-Build88.4-Media1.iso.sha256" from "http://openqa.suse.de/tests/8039541/asset/other/SLE-15-SP4-Online->
Jan 27 05:48:02 openqaworker-arm-1 openqa-worker-cacheservice-minion[48599]: [48599] [i] [#62] Downloading: "SLES-15-SP4-aarch64-Build88.4@aarch64-gnome.qcow2"
Jan 27 05:48:31 openqaworker-arm-1 openqa-worker-cacheservice-minion[48599]: [48599] [i] [#62] Cache size of "/var/lib/openqa/cache" is 46 GiB, with limit 50 GiB
Jan 27 05:48:31 openqaworker-arm-1 openqa-worker-cacheservice-minion[48599]: [48599] [i] [#62] Downloading "SLES-15-SP4-aarch64-Build88.4@aarch64-gnome.qcow2" from "http://openqa.suse.de/tests/8039541/asset/hdd/SLES-15-SP4-aarch64-Buil>
Jan 27 05:48:32 openqaworker-arm-1 openqa-worker-cacheservice-minion[48679]: [48679] [i] [#63] Downloading: "SLE-15-SP4-Online-aarch64-Build88.4-Media1.iso"
Jan 27 05:48:39 openqaworker-arm-1 openqa-worker-cacheservice-minion[48679]: [48679] [i] [#63] Cache size of "/var/lib/openqa/cache" is 48 GiB, with limit 50 GiB
Jan 27 05:48:39 openqaworker-arm-1 openqa-worker-cacheservice-minion[48679]: [48679] [i] [#63] Downloading "SLE-15-SP4-Online-aarch64-Build88.4-Media1.iso" from "http://openqa.suse.de/tests/8039541/asset/iso/SLE-15-SP4-Online-aarch64-B>
Jan 27 05:48:43 openqaworker-arm-1 openqa-worker-cacheservice-minion[48707]: [48707] [i] [#64] Downloading: "SLES-15-SP4-aarch64-Build88.4@aarch64-gnome-uefi-vars.qcow2"
Jan 27 05:48:53 openqaworker-arm-1 openqa-worker-cacheservice-minion[48759]: [48759] [i] [#65] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
Jan 27 05:48:53 openqaworker-arm-1 openqa-worker-cacheservice-minion[48759]: [48759] [i] [#65] Calling: rsync -avHP --timeout 1800 rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
Jan 27 05:52:36 openqaworker-arm-1 openqa-worker-cacheservice-minion[1457]: [1457] [i] [#66] Downloading: "SLE-15-SP4-Online-aarch64-Build43.1-Media1.iso.sha256"
Jan 27 05:52:36 openqaworker-arm-1 openqa-worker-cacheservice-minion[1457]: [1457] [i] [#66] Cache size of "/var/lib/openqa/cache" is 48 GiB, with limit 50 GiB
Jan 27 05:52:36 openqaworker-arm-1 openqa-worker-cacheservice-minion[1457]: [1457] [i] [#66] Downloading "SLE-15-SP4-Online-aarch64-Build43.1-Media1.iso.sha256" from "http://openqa.suse.de/tests/8036768/asset/other/SLE-15-SP4-Online-aa>
Jan 27 05:52:43 openqaworker-arm-1 openqa-worker-cacheservice-minion[1470]: [1470] [i] [#67] Downloading: "SLE-15-SP4-Online-aarch64-Build43.1-Media1.iso.sha256"
Jan 27 05:52:46 openqaworker-arm-1 openqa-worker-cacheservice-minion[1481]: [1481] [i] [#68] Downloading: "sle-15-SP4-aarch64-43.1-textmode@aarch64.qcow2"
Jan 27 05:52:49 openqaworker-arm-1 openqa-worker-cacheservice-minion[1484]: [1484] [i] [#69] Downloading: "sle-15-SP4-aarch64-43.1-textmode@aarch64.qcow2"
-- Reboot --
Jan 27 07:23:36 openqaworker-arm-1 systemd[1]: Started OpenQA Worker Cache Service Minion.
Jan 27 07:23:46 openqaworker-arm-1 openqa-worker-cacheservice-minion[2323]: [2323] [i] Creating cache directory tree for "/var/lib/openqa/cache"
Jan 27 07:23:46 openqaworker-arm-1 systemd[1]: Stopping OpenQA Worker Cache Service Minion...
Jan 27 07:23:46 openqaworker-arm-1 systemd[1]: openqa-worker-cacheservice-minion.service: Succeeded.
Jan 27 07:23:46 openqaworker-arm-1 systemd[1]: Stopped OpenQA Worker Cache Service Minion.
Jan 27 07:23:46 openqaworker-arm-1 systemd[1]: Started OpenQA Worker Cache Service Minion.
Jan 27 07:26:17 openqaworker-arm-1 openqa-worker-cacheservice-minion[2500]: [2500] [i] Cache size of "/var/lib/openqa/cache" is 0 Byte, with limit 50 GiB
Jan 27 07:26:17 openqaworker-arm-1 openqa-worker-cacheservice-minion[2500]: [2500] [i] Resetting all leftover locks after restart
Jan 27 07:26:17 openqaworker-arm-1 openqa-worker-cacheservice-minion[2500]: [2500] [i] Worker 2500 started
</code></pre>
<p>There are no coredumps present. The machine had not been up long before the failure and rebooted shortly afterwards (likely after a crash), so the relevant logs before and after the failure are short. I couldn't find any clues in them.</p>
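<p>For reference, a minimal sketch of how the failure lines could be pulled out of journal text like the excerpt above. The helper name <code>failed_units</code> is hypothetical and not part of openQA or systemd; it only assumes the message format visible in the log ("&lt;unit&gt;.service: Failed with result ...").</p>

```shell
# Hypothetical helper: print the names of units that systemd reported as
# failed. Reads journal text (as printed by journalctl) from stdin.
failed_units() {
    # Keep only systemd lines of the form "<unit>.service: Failed with result ..."
    # and print the unit name once per unit.
    sed -n 's/^.* systemd\[[0-9]*]: \(.*\.service\): Failed with result .*$/\1/p' | sort -u
}

# Possible usage (assumption: journal of the previous boot is available):
#   journalctl -b -1 | failed_units
```

<p>Fed the excerpt above, this would print only <code>openqa-worker-cacheservice-minion.service</code>, since that is the only unit with a "Failed with result" line.</p>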
<hr>
<p>The logs for the journal service on the web UI host aren't much more helpful:</p>
<pre><code>martchus@openqa:~> sudo journalctl -fu systemd-journal-flush
-- Logs begin at Sun 2022-01-23 02:46:40 CET. --
Jan 24 12:44:22 openqa systemd[1]: Stopped Flush Journal to Persistent Storage.
Jan 24 12:44:23 openqa systemd[1]: Starting Flush Journal to Persistent Storage...
Jan 24 12:45:05 openqa systemd[1]: Finished Flush Journal to Persistent Storage.
Jan 24 13:08:06 openqa systemd[1]: Stopping Flush Journal to Persistent Storage...
Jan 24 13:08:06 openqa systemd[1]: systemd-journal-flush.service: Succeeded.
Jan 24 13:08:06 openqa systemd[1]: Stopped Flush Journal to Persistent Storage.
Jan 24 13:08:06 openqa systemd[1]: Starting Flush Journal to Persistent Storage...
Jan 24 13:08:07 openqa systemd[1]: Finished Flush Journal to Persistent Storage.
Jan 26 15:19:02 openqa systemd[1]: Starting Flush Journal to Persistent Storage...
Jan 26 15:19:02 openqa systemd[1]: Finished Flush Journal to Persistent Storage.
</code></pre>
<p>The journal on OSD is generally working, so whatever the problem was, it hasn't had much impact.</p>
openQA Infrastructure - action #105621: [Alerting] Failed systemd services alert | https://progress.opensuse.org/issues/105621?journal_id=485943 | 2022-02-01T11:16:37Z | mkittler (marius.kittler@suse.com)
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>I'm resolving this due to lack of information and the low impact. These are actually two distinct issues; if either of them happens more often, we can still investigate further.</p>
<p>Note that the journal service failure <em>might</em> be related to our recent manual tampering with the journal service when reacting to the file system alert.</p>