action #115580: Reason: abandoned: associated worker openqaworker3:13 re-connected but abandoned the job size:M - openQA Infrastructure (public) - openSUSE Project Management Tool

Added by coolgw over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee: mkittler
Target version: openQA Project (public) - Ready
Start date: 2022-08-22

Description

Observation

For details, please check the following job:
https://openqa.suse.de/tests/9363998#
After rerunning the job, the issue did not occur again.

Suggestions

Check if there are any logs on the worker, e.g. maybe systemd killed the service because it took too long
The same job also sometimes finishes with reason: timeout exceeded
Maybe the test often takes too long, so it can't finish in time

Updated by coolgw over 2 years ago

Project changed from openQA Project (public) to openQA Infrastructure (public)

Updated by livdywan over 2 years ago

Was this job missing the logs from the start? I can only see the ISO and the qcow2. The logs might have been deleted because you didn't add the ticket earlier. Adding it would have made the job important, which makes openQA keep its assets around longer.

I can find exactly one occurrence. Other jobs seem to finish with softfailed or timeout, and I assume the former is what's considered good here. It would be helpful to confirm whether this is reproducible, and whether it happens on workers other than openqaworker3.
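
To check whether the abandoned reason recurs, and on which worker hosts, the reason strings of related jobs could be tallied. The following is only a minimal sketch: the pattern is derived from the reason text in this ticket's subject, the helper names `classify_reason` and `tally` are hypothetical, and how the reason strings are collected (e.g. from the Next & Previous jobs) is left open.

```python
import re
from collections import Counter

# Pattern modelled on the reason text quoted in this ticket's subject.
ABANDONED_RE = re.compile(
    r"^abandoned: associated worker (?P<host>[^\s:]+):(?P<slot>\d+) "
    r"re-connected but abandoned the job"
)


def classify_reason(reason):
    """Return (category, worker_host) for a job reason string."""
    match = ABANDONED_RE.match(reason)
    if match:
        return "abandoned", match.group("host")
    if reason.startswith("timeout exceeded"):
        return "timeout", None
    return "other", None


def tally(reasons):
    """Count how often each (category, worker_host) pair occurs."""
    return Counter(classify_reason(reason) for reason in reasons)
```

If `tally` reports the abandoned category for hosts other than openqaworker3, the problem is probably not specific to that machine.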

#96710 used to be an issue that was causing jobs to fail with the same reason, and might be worth considering here, although it's quite old at this point. We could still be seeing a new and completely unrelated issue.

Updated by coolgw over 2 years ago

The error happened on the weekend, so I didn't add the ticket in time.
I suppose this is a sporadic issue, since it did not happen again after cloning the case.
I will keep an eye on this issue.

Updated by tinita over 2 years ago

Target version set to Ready

Updated by livdywan over 2 years ago

Subject changed from Reason: abandoned: associated worker openqaworker3:13 re-connected but abandoned the job to Reason: abandoned: associated worker openqaworker3:13 re-connected but abandoned the job size:M
Description updated (diff)
Status changed from New to Workable

Updated by mkittler over 2 years ago

Status changed from Workable to Closed
Assignee set to mkittler

The error means that the worker did not exit normally. That can have various reasons: a bug in the worker code, the physical machine crashing, someone sending a SIGKILL (e.g. when stopping or restarting the systemd service and it ran into the timeout), a kernel panic, and so on. Without logs it is impossible to tell what happened. Unfortunately, it is in the nature of the problem that no logs were uploaded. Normally one can just have a look at the journal of the worker. In this case that is impossible, though, because the oldest message is from 07.09. I also haven't found any more recent occurrences when checking Next & Previous jobs.
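
For future occurrences, looking at the worker's journal could be sketched roughly as below. This is a sketch under assumptions, not a description of how the check was actually done: the systemd unit name `openqa-worker@<slot>` is assumed and may differ on a given host (e.g. if an auto-restart variant of the unit is in use), the helper names are hypothetical, and the time window has to be chosen around the job's run time.

```python
import subprocess


def journal_command(slot, since, until):
    """Build a journalctl call for one worker slot and time window."""
    unit = f"openqa-worker@{slot}"  # assumed unit name, adjust as needed
    return [
        "journalctl", "--no-pager",
        "--unit", unit,
        "--since", since,
        "--until", until,
    ]


def read_journal(slot, since, until):
    """Run journalctl on the worker host and return its output as text."""
    result = subprocess.run(
        journal_command(slot, since, until),
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

For slot 13, `read_journal(13, since, until)` would be run on the worker host itself. If the journal has already been rotated past the requested window, as in this case, there are no entries left to inspect.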

"The same job also sometimes finishes with reason: timeout exceeded"

OK, that's a different kind of error then. Without even a job URL it is impossible to investigate the underlying problem. (I checked more recent jobs, e.g. https://openqa.suse.de/tests/9492203. However, it looks like the SUT or the test code simply gets stuck. I don't think there's anything to improve here on the openQA side. It is completely normal that the job eventually ends up exceeding the timeout in that case.)

I don't think we can do anything about it at this point.
