action #106759

Updated by livdywan over 2 years ago

## Observation 
     # /var/log/openqa 
     [2022-02-12T04:12:47.195902Z] [error] Worker 22011 has no heartbeat (400 seconds), restarting 
     [2022-02-12T04:12:56.228360Z] [error] Worker 28596 has no heartbeat (400 seconds), restarting 

 ## Acceptance criteria 

 **AC1**: The cause of the heartbeat message is known 

 ## Suggestions 
 - ~~Add the message to the blocklist~~ 
 - ~~Look at Mojolicious APIs related to preforking of Mojo workers~~ (problem is a blocked worker process, Mojolicious API can't help with that) 
 - ~~Extend the configured timeout from 400s~~ (the timeout is already pretty high) 
 - ~~Confirm where the errors are logged, and add context~~ (Mojolicious logs the error, and there is no context information to add, different process) 
 - ~~500 error in access_log from live_view_handler~~ (unrelated, time doesn't match and the live view handler doesn't use preforking) 
   `[15/Feb/2022:07:16:02 +0000] "GET /liveviewhandler/tests/2189494/developer/ws-proxy HTTP/1.1" 500 ` 

 ## Additional info 
 from `/usr/lib/perl5/vendor_perl/5.34.0/Mojolicious/Guides/FAQ.pod`: 

 ``` 
 =head2 What does "Worker 31842 has no heartbeat (50 seconds), restarting" mean? 

 As long as they are accepting new connections, worker processes of all built-in pre-forking web servers send heartbeat 
 messages to the manager process at regular intervals, to signal that they are still responsive. A blocking operation 
 such as an infinite loop in your application can prevent this, and will force the affected worker to be restarted after 
 a timeout. This timeout defaults to C<50> seconds and can be extended with the attribute 
 L<Mojo::Server::Prefork/"heartbeat_timeout"> if your application requires it. 
 ``` 

 ``` 
 # lib/Mojo/Server/Prefork.pm
     # No heartbeat (graceful stop)
     $log->error("Worker $pid has no heartbeat ($ht seconds), restarting") and $w->{graceful} = $time
       if !$w->{graceful} && ($w->{time} + $interval + $ht <= $time);

     # Graceful stop with timeout
     my $graceful = $w->{graceful} ||= $self->{graceful} ? $time : undef;
     $log->info("Stopping worker $pid gracefully ($gt seconds)") and (kill 'QUIT', $pid or $self->_stopped($pid))
       if $graceful && !$w->{quit}++;
 ```
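
 To make the condition in the excerpt above concrete, here is a minimal sketch of the same arithmetic in plain Perl. `heartbeat_expired` is a hypothetical helper, not part of Mojolicious, and the timestamps are made up. The 400 s timeout comes from the log lines in this ticket. The 5 s interval is an assumption based on the Mojolicious default for `heartbeat_interval` and may not match the actual openQA configuration.

 ```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mojolicious). It mirrors the condition
#   !$w->{graceful} && ($w->{time} + $interval + $ht <= $time)
# from the Prefork.pm excerpt.
sub heartbeat_expired {
    my ($worker, $interval, $timeout, $now) = @_;
    return 0 if $worker->{graceful};    # worker is already being stopped
    return $worker->{time} + $interval + $timeout <= $now ? 1 : 0;
}

# Assumed values: 5 s interval (Mojolicious default), 400 s timeout (from the log)
my ($interval, $timeout) = (5, 400);
my $worker = {time => 1000};    # made-up timestamp of the last heartbeat

for my $elapsed (404, 405) {
    my $expired = heartbeat_expired($worker, $interval, $timeout, 1000 + $elapsed);
    printf "%d s since last heartbeat: %s\n", $elapsed, $expired ? 'restart' : 'ok';
}
 ```

 Under these assumptions the manager would flag a worker once 405 seconds have passed since its last heartbeat, which means the worker process was blocked for that entire period.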
