action #121771

Updated by okurz almost 2 years ago

## Observation 

 From an openqa_logwarn email:

 ``` 
 [2022-12-09T05:56:20.940749Z] [error] Worker 17014 has no heartbeat (400 seconds), restarting 
 [2022-12-09T05:56:28.358564Z] [error] Worker 20315 has no heartbeat (400 seconds), restarting 
 [2022-12-09T05:57:15.185728Z] [error] Worker 12757 has no heartbeat (400 seconds), restarting 
 [2022-12-09T05:59:37.811478Z] [error] [pid:17014] Failed dispatching message to websocket server over ipc for worker "openqaworker20:": Inactivity timeout at /usr/share/openqa/script/../lib/OpenQA/WebSockets/Client.pm line 40. 
 ``` 

 On 2022-12-19 okurz looked for more recent occurrences and found only:

 ``` 
 ariel:/var/log # xzgrep 'Failed dispatching.*worker20' openqa* 
 openqa.20.xz:[2022-12-09T05:59:37.811478Z] [error] [pid:17014] Failed dispatching message to websocket server over ipc for worker "openqaworker20:": Inactivity timeout at /usr/share/openqa/script/../lib/OpenQA/WebSockets/Client.pm line 40. 
 ``` 

 ## Acceptance criteria 
 * **AC1:** TBD 

 ## Suggestions 
 * Verify that the heartbeat messages are related to `worker20` 
 * Improve the logs to make it clear what `Worker 12345` means
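
 One way to start on the first suggestion is to compare the number in each `Worker NNNNN has no heartbeat` line with the `[pid:NNNNN]` prefix of the `Failed dispatching` lines. The following is a minimal sketch, not part of openQA: the regular expressions and the `correlate` helper are assumptions made for illustration, and the sample input is copied from the observation above. It also assumes that the number after `Worker` is a process ID, which the matching value 17014 in the excerpt suggests but does not prove.

 ```python
import re

# Sample lines copied from the Observation section of this ticket.
LOG = """\
[2022-12-09T05:56:20.940749Z] [error] Worker 17014 has no heartbeat (400 seconds), restarting
[2022-12-09T05:56:28.358564Z] [error] Worker 20315 has no heartbeat (400 seconds), restarting
[2022-12-09T05:57:15.185728Z] [error] Worker 12757 has no heartbeat (400 seconds), restarting
[2022-12-09T05:59:37.811478Z] [error] [pid:17014] Failed dispatching message to websocket server over ipc for worker "openqaworker20:": Inactivity timeout at /usr/share/openqa/script/../lib/OpenQA/WebSockets/Client.pm line 40.
"""

# Hypothetical patterns derived only from the log lines above.
HEARTBEAT = re.compile(r"Worker (\d+) has no heartbeat")
DISPATCH = re.compile(
    r'\[pid:(\d+)\] Failed dispatching message .* for worker "([^":]+)'
)


def correlate(text):
    """Map each PID reported without heartbeat to the worker hosts
    named in dispatch failures logged by that same PID."""
    heartbeat_pids = []
    hosts_by_pid = {}
    for line in text.splitlines():
        if m := HEARTBEAT.search(line):
            heartbeat_pids.append(int(m.group(1)))
        elif m := DISPATCH.search(line):
            hosts_by_pid.setdefault(int(m.group(1)), []).append(m.group(2))
    # PIDs without any dispatch failure map to an empty list.
    return {pid: hosts_by_pid.get(pid, []) for pid in heartbeat_pids}


result = correlate(LOG)
print(result)
 ```

 For the excerpt above, only PID 17014 maps to `openqaworker20`; the other two PIDs have no dispatch failure in the sample, so the excerpt alone does not show that all three heartbeat messages relate to `worker20`.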
