action #128345

closed

[logwarn] Worker 30538 has no heartbeat (400 seconds), restarting size:M

Added by livdywan about 1 year ago. Updated 12 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2023-04-27
Due date: 2023-05-20
% Done: 0%
Estimated time:

Description

Observation

From the openQA logreport for ariel.suse-dmz.opensuse.org:

[2023-04-27T07:29:38.886066Z] [error] Worker 30538 has no heartbeat (400 seconds), restarting

Acceptance criteria

  • AC1: "has no heartbeat" messages are no longer observed in log reports

Suggestions

  • "Worker $PID" refers to a web UI process: the web UI is the only place where we use Mojo workers (prefork HTTP server)
  • Heartbeats are small messages that the prefork workers write to the manager every few seconds; a missed heartbeat means the write must have been held up by a blocking operation (like a syscall or database query)
  • Check the WebUI logs for relevant messages
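
As a starting point for checking the logs, here is a minimal sketch in Python that picks out missed-heartbeat messages and counts them per worker PID. The pattern is modelled only on the single log line quoted in the observation above, so other message formats may need a different expression. The function names and the second sample line are hypothetical and serve only as illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted in this ticket (assumption:
# other versions or log sources may format the message differently).
HEARTBEAT_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] \[(?P<level>\w+)\] "
    r"Worker (?P<pid>\d+) has no heartbeat \((?P<seconds>\d+) seconds\), restarting"
)


def find_missed_heartbeats(lines):
    """Return one dict per line that reports a missed heartbeat."""
    hits = []
    for line in lines:
        match = HEARTBEAT_RE.match(line.strip())
        if match:
            hit = match.groupdict()
            hit["pid"] = int(hit["pid"])
            hit["seconds"] = int(hit["seconds"])
            hits.append(hit)
    return hits


def count_by_pid(hits):
    """Count missed-heartbeat reports per worker PID."""
    return Counter(hit["pid"] for hit in hits)


sample = [
    # Line taken from the observation in this ticket.
    "[2023-04-27T07:29:38.886066Z] [error] Worker 30538 has no heartbeat (400 seconds), restarting",
    # Hypothetical unrelated line, included to show that it is skipped.
    "[2023-04-27T07:29:39.000000Z] [info] unrelated message",
]

hits = find_missed_heartbeats(sample)
for hit in hits:
    print(hit["timestamp"], hit["pid"], hit["seconds"])
```

To scan a real log, an open file handle for the web UI log can be passed instead of `sample`. Comparing the reported timestamps with slow requests or database queries logged shortly before them may help narrow down the blocking operation.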

Related issues 3 (1 open, 2 closed)

Related to openQA Infrastructure - action #121771: openqaworker20 has no heartbeat (Resolved, okurz)

Related to openQA Project - action #106759: Worker xyz has no heartbeat (400 seconds), restarting repeatedly reported on o3 size:M (Resolved, livdywan, 2022-02-03)

Related to openQA Project - action #128936: API endpoint /api/v1/jobs/:jobid/set_done can be very slow (New, 2023-05-08)
