action #121771

closed

openqaworker20 has no heartbeat

Added by livdywan over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

From the openqa_logwarn email:

[2022-12-09T05:56:20.940749Z] [error] Worker 17014 has no heartbeat (400 seconds), restarting
[2022-12-09T05:56:28.358564Z] [error] Worker 20315 has no heartbeat (400 seconds), restarting
[2022-12-09T05:57:15.185728Z] [error] Worker 12757 has no heartbeat (400 seconds), restarting
[2022-12-09T05:59:37.811478Z] [error] [pid:17014] Failed dispatching message to websocket server over ipc for worker "openqaworker20:": Inactivity timeout at /usr/share/openqa/script/../lib/OpenQA/WebSockets/Client.pm line 40.
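Each line above follows the same shape: a bracketed UTC timestamp, a bracketed log level, an optional [pid:N] tag, then the message. Here is a minimal sketch for splitting such lines into fields. It assumes exactly this bracketed format; the function and field names are illustrative and not part of openQA. It also makes it easy to measure the gap between the first heartbeat error and the dispatch failure:

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed line shape: "[<ISO-8601 timestamp>] [<level>] [pid:<N>] <message>",
# where the "[pid:<N>] " part is optional.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?:\[pid:(?P<pid>\d+)\] )?(?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    ts: datetime
    level: str
    pid: Optional[int]  # None when the line carries no [pid:N] tag
    msg: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Swap the trailing "Z" for "+00:00" so datetime.fromisoformat accepts it
    # on Python versions older than 3.11.
    ts = datetime.fromisoformat(m.group("ts").replace("Z", "+00:00"))
    pid = int(m.group("pid")) if m.group("pid") else None
    return LogEntry(ts, m.group("level"), pid, m.group("msg"))
```

Applied to the first and last lines quoted above, the two timestamps are roughly 197 seconds apart. Only the dispatch failure carries an explicit pid tag.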

On 2022-12-19, okurz looked for more recent occurrences and found only:

ariel:/var/log # xzgrep 'Failed dispatching.*worker20' openqa*
openqa.20.xz:[2022-12-09T05:59:37.811478Z] [error] [pid:17014] Failed dispatching message to websocket server over ipc for worker "openqaworker20:": Inactivity timeout at /usr/share/openqa/script/../lib/OpenQA/WebSockets/Client.pm line 40.
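The same search can be run from Python where xzgrep is not available, for example in a script that checks rotated logs periodically. This is a minimal sketch. It assumes the rotated files are either plain text or xz-compressed with an .xz suffix, as openqa.20.xz above is. The function name xzgrep is borrowed for illustration only. Python's re syntax differs from grep's basic regular expressions in general, although for the pattern used here they behave the same.

```python
import lzma
import re
from typing import Iterable, Iterator, Tuple


def xzgrep(pattern: str, paths: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (path, line) for every line matching pattern.

    Files ending in ".xz" are decompressed on the fly with the lzma module;
    all other files are read as plain text.
    """
    regex = re.compile(pattern)
    for path in paths:
        opener = lzma.open if path.endswith(".xz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if regex.search(line):
                    # Strip the newline so results print like grep output.
                    yield path, line.rstrip("\n")
```

For the search on ariel, a call like xzgrep(r"Failed dispatching.*worker20", sorted(glob.glob("/var/log/openqa*"))) should yield the same openqa.20.xz hit, with the full path instead of the bare file name.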

Acceptance criteria

  • AC1: TBD

Suggestions

  • Verify that the heartbeat messages are related to worker20
  • Improve the log message so that it is clear what the number in "Worker 12345" refers to
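For the second suggestion, the excerpt already hints at a link: the 17014 in the first heartbeat error reappears as [pid:17014] in the dispatch failure for openqaworker20. Below is a minimal sketch that checks this systematically. It assumes, without confirmation in this ticket, that the number in "Worker N has no heartbeat" is a process ID and not an openQA worker ID. The regular expressions and the name correlate are illustrative and were written against the four lines quoted in the observation, so other messages may need adjusted patterns:

```python
import re
from typing import Dict, Iterable, List, Set

# Assumption: "Worker N" is the PID of the process that stopped sending heartbeats.
HEARTBEAT_RE = re.compile(r"Worker (\d+) has no heartbeat \((\d+) seconds\)")
PID_RE = re.compile(r"\[pid:(\d+)\]")
WORKER_HOST_RE = re.compile(r'for worker "([^":]+)')


def correlate(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Map each PID reported without heartbeat to the worker hosts named in
    other log lines tagged with the same [pid:N]; empty list if none."""
    missing: Set[int] = set()
    hosts: Dict[int, Set[str]] = {}
    for line in lines:
        hb = HEARTBEAT_RE.search(line)
        if hb:
            missing.add(int(hb.group(1)))
            continue
        pid = PID_RE.search(line)
        host = WORKER_HOST_RE.search(line)
        if pid and host:
            hosts.setdefault(int(pid.group(1)), set()).add(host.group(1))
    # Order-independent: a [pid:N] line may appear before or after the heartbeat error.
    return {p: sorted(hosts.get(p, set())) for p in sorted(missing)}
```

Given the four lines from the observation, it returns {12757: [], 17014: ['openqaworker20'], 20315: []}. In this excerpt, only one of the three restarted processes can be tied to openqaworker20.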

Related issues: 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #115418: Setup ow19+20 to be able to run MM tests size:M | Resolved | favogt | 2022-08-17

Related to openQA Project - action #128345: [logwarn] Worker 30538 has no heartbeat (400 seconds), restarting size:M | Resolved | kraih | 2023-04-27 | 2023-05-20

#1

Updated by livdywan over 1 year ago

  • Due date deleted (2022-12-16)
  • Assignee deleted (mkittler)
  • Start date deleted (2022-11-10)
  • Parent task deleted (#107062)
#2

Updated by okurz over 1 year ago

  • Related to action #115418: Setup ow19+20 to be able to run MM tests size:M added
#3

Updated by okurz over 1 year ago

  • Description updated (diff)
  • Status changed from New to Resolved
  • Assignee set to okurz

This seems to have been solved implicitly within #115418.

#4

Updated by okurz 12 months ago

  • Related to action #128345: [logwarn] Worker 30538 has no heartbeat (400 seconds), restarting size:M added