action #138536
Updated by livdywan about 1 year ago
## Observation

OpenQA logreport says

```
[2023-10-25T12:14:44.745934Z] [error] Worker 17519 has no heartbeat (900 seconds), restarting (see FAQ for more)
[2023-10-25T12:15:26.832072Z] [error] Worker 19825 has no heartbeat (900 seconds), restarting (see FAQ for more)
```

From /var/log/openqa

```
# grep 17519 /var/log/openqa
[2023-10-25T02:16:31.175195Z] [debug] [pid:2595] Updating seen of worker 1060 from worker_status (free)
[2023-10-25T10:19:00.517519Z] [debug] [pid:31333] Sending AMQP event: opensuse.openqa.job.done
[2023-10-25T11:59:29.405375Z] [info] Worker 17519 started
[2023-10-25T12:14:44.745934Z] [error] Worker 17519 has no heartbeat (900 seconds), restarting (see FAQ for more)
[2023-10-25T12:14:44.746126Z] [info] Stopping worker 17519 gracefully (800 seconds)
[2023-10-25T12:16:27.274372Z] [info] Worker 17519 stopped
# grep 19825 /var/log/openqa
[2023-10-25T09:04:16.519825Z] [debug] [a7YZwJtcRdbL] looking for "autoinst-log.txt" in [
[2023-10-25T11:15:18.028018Z] [info] Worker 19825 started
[2023-10-25T11:17:54.719018Z] [debug] [pid:19825] _carry_over_candidate(3674609): _failure_reason=firefox_audio:softfailed
[2023-10-25T11:17:54.750455Z] [debug] [pid:19825] Sending AMQP event: opensuse.openqa.job.done
[2023-10-25T11:17:54.750867Z] [debug] [pid:19825] AMQP URL: amqps://openqa:b45z45bz645tzrhwer@rabbit.opensuse.org:5671/?exchange=pubsub
[2023-10-25T11:17:54.801831Z] [debug] [pid:19825] opensuse.openqa.job.done published
[2023-10-25T11:21:33.090407Z] [debug] [pid:19825] Sending AMQP event: opensuse.openqa.comment.create
[2023-10-25T11:21:33.090755Z] [debug] [pid:19825] AMQP URL: amqps://openqa:b45z45bz645tzrhwer@rabbit.opensuse.org:5671/?exchange=pubsub
[2023-10-25T11:21:33.127004Z] [debug] [pid:19825] opensuse.openqa.comment.create published
pid => 19825,
pid => 19825,
pid => 19825,
[2023-10-25T12:00:06.480220Z] [debug] [pid:19825] _carry_over_candidate(3674853): _failure_reason=GOOD
[2023-10-25T12:00:06.631183Z] [debug] [pid:19825] Sending AMQP event: opensuse.openqa.job.done
[2023-10-25T12:00:06.631658Z] [debug] [pid:19825] AMQP URL: amqps://openqa:b45z45bz645tzrhwer@rabbit.opensuse.org:5671/?exchange=pubsub
[2023-10-25T12:00:07.055590Z] [debug] [pid:19825] opensuse.openqa.job.done published
[2023-10-25T12:15:26.832072Z] [error] Worker 19825 has no heartbeat (900 seconds), restarting (see FAQ for more)
[2023-10-25T12:15:26.832215Z] [info] Stopping worker 19825 gracefully (800 seconds)
[2023-10-25T12:21:03.316812Z] [info] Worker 19825 stopped
[2023-10-25T12:41:34.619825Z] [debug] [pid:3076] _carry_over_candidate(3674823): _failure_reason=GOOD
```

## Acceptance criteria

* **AC1:** It is understood why we saw the no heartbeat message here

## Suggestions

* https://docs.mojolicious.org/Mojolicious/Guides/FAQ#What-does-Worker-31842-has-no-heartbeat-50-seconds-restarting-mean
* Maybe also related to #138535

## Out of scope

* The _carry_over_candidate messages are unrelated - grep just matches the timestamp here
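
Since a plain `grep 17519` also matches the fractional seconds in timestamps, a filter that only looks at the message part might help to narrow the output down. The following is a minimal Python sketch and not part of openQA: the helper name `lines_for_pid` and both regexes are assumptions made for illustration, and the sample lines are copied from the grep output in this ticket. It also computes the time between "Worker 17519 started" and the heartbeat error.

```python
import re
from datetime import datetime

# Sample lines copied from the ticket's "grep 17519" output.
LOG = """\
[2023-10-25T10:19:00.517519Z] [debug] [pid:31333] Sending AMQP event: opensuse.openqa.job.done
[2023-10-25T11:59:29.405375Z] [info] Worker 17519 started
[2023-10-25T12:14:44.745934Z] [error] Worker 17519 has no heartbeat (900 seconds), restarting (see FAQ for more)
[2023-10-25T12:14:44.746126Z] [info] Stopping worker 17519 gracefully (800 seconds)
[2023-10-25T12:16:27.274372Z] [info] Worker 17519 stopped
"""

# Assumed line layout: "[timestamp] [level] message".
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")


def lines_for_pid(log_text, pid):
    """Return (timestamp, message) pairs whose message mentions the pid.

    The timestamp is deliberately excluded from the match so that
    fractional seconds such as ".517519Z" do not produce false hits.
    """
    pid_re = re.compile(rf"(?:\[pid:{pid}\]|\b[Ww]orker {pid}\b)")
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and pid_re.search(m.group("msg")):
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
            hits.append((ts, m.group("msg")))
    return hits


hits = lines_for_pid(LOG, 17519)
started = next(ts for ts, msg in hits if msg.endswith("started"))
no_heartbeat = next(ts for ts, msg in hits if "has no heartbeat" in msg)
gap = (no_heartbeat - started).total_seconds()
print(len(hits), round(gap, 2))
```

For the sample this keeps the four lines that actually concern worker 17519 and reports a gap of roughly 915 seconds between start and the heartbeat error, i.e. just above the 900 second timeout named in the message. That alone does not explain why the heartbeat was missed.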