action #106759
Updated by livdywan over 2 years ago
## Observation

    # /var/log/openqa
    [2022-02-12T04:12:47.195902Z] [error] Worker 22011 has no heartbeat (400 seconds), restarting
    [2022-02-12T04:12:56.228360Z] [error] Worker 28596 has no heartbeat (400 seconds), restarting

## Acceptance criteria

**AC1**: The cause of the heartbeat message is known

## Suggestions

- ~~Add the message to the blocklist~~
- ~~Look at Mojolicious APIs related to preforking of Mojo workers~~ (the problem is a blocked worker process; the Mojolicious API can't help with that)
- ~~Extend the configured timeout from 400s~~ (the timeout is already pretty high)
- ~~Confirm where the errors are logged, and add context~~ (Mojolicious logs the error from a different process, so there is no context information to add)
- ~~500 error in access_log from live_view_handler~~ (unrelated: the time doesn't match and the live view handler doesn't use preforking)
  `[15/Feb/2022:07:16:02 +0000] "GET /liveviewhandler/tests/2189494/developer/ws-proxy HTTP/1.1" 500`

## Additional info

from `/usr/lib/perl5/vendor_perl/5.34.0/Mojolicious/Guides/FAQ.pod`:

```
=head2 What does "Worker 31842 has no heartbeat (50 seconds), restarting" mean?

As long as they are accepting new connections, worker processes of all built-in pre-forking web servers send heartbeat
messages to the manager process at regular intervals, to signal that they are still responsive. A blocking operation
such as an infinite loop in your application can prevent this, and will force the affected worker to be restarted after
a timeout. This timeout defaults to C<50> seconds and can be extended with the attribute
L<Mojo::Server::Prefork/"heartbeat_timeout"> if your application requires it.
```

```
# lib/Mojo/Server/Prefork.pm

# No heartbeat (graceful stop)
$log->error("Worker $pid has no heartbeat ($ht seconds), restarting") and $w->{graceful} = $time
  if !$w->{graceful} && ($w->{time} + $interval + $ht <= $time);

# Graceful stop with timeout
my $graceful = $w->{graceful} ||= $self->{graceful} ? $time : undef;
$log->info("Stopping worker $pid gracefully ($gt seconds)") and (kill 'QUIT', $pid or $self->_stopped($pid))
  if $graceful && !$w->{quit}++;
```
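
The check in the `Prefork.pm` excerpt can be turned around to estimate when a flagged worker last sent a heartbeat, which may help narrow down what it was blocked on. The following is only a rough Python sketch of that arithmetic, not part of openQA or Mojolicious. The function names are hypothetical, the 400 second timeout is taken from the log message, and the 5 second heartbeat interval is an assumption (believed to be the `Mojo::Server::Prefork` default, not confirmed for this instance).

```python
from datetime import datetime, timedelta, timezone

# Assumptions: 400 s comes from the log message above; 5 s is believed to be
# the default heartbeat interval and may differ on the affected host.
HEARTBEAT_TIMEOUT = 400
HEARTBEAT_INTERVAL = 5


def heartbeat_overdue(last_heartbeat, now,
                      interval=HEARTBEAT_INTERVAL, timeout=HEARTBEAT_TIMEOUT):
    """Same comparison as the excerpt: time + interval + ht <= now (epoch seconds)."""
    return last_heartbeat + interval + timeout <= now


def parse_log_time(stamp):
    """Parse a timestamp such as 2022-02-12T04:12:47.195902Z as UTC."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def latest_possible_heartbeat(logged_at,
                              interval=HEARTBEAT_INTERVAL, timeout=HEARTBEAT_TIMEOUT):
    """Latest time the last heartbeat can have arrived for the error to be logged."""
    return logged_at - timedelta(seconds=interval + timeout)


if __name__ == "__main__":
    logged_at = parse_log_time("2022-02-12T04:12:47.195902Z")
    print("Worker 22011 last heartbeat no later than",
          latest_possible_heartbeat(logged_at).isoformat())
```

Under these assumptions, worker 22011 stopped sending heartbeats no later than about 04:06:02 UTC, so requests that started shortly before that time are the first candidates to look at.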