action #152569

closed

Many incomplete jobs endlessly restarted over several weeks size:M

Added by tinita 12 months ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2023-12-13
Due date: 2024-01-12
% Done: 0%
Estimated time:
Description

Observation

When investigating #152560 we noticed that there are also a lot of restarted incomplete jobs like this one:
https://openqa.suse.de/tests/13062217

Reason: backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.org:5901>: IO::Socket::INET: connect: Connection refused

Apparently there is an auto_clone_regex feature that will restart a job directly in openQA if the reason matches a certain regex.

But it doesn't make sense to restart the job thousands of times. I couldn't even find the original job (haven't tried the recursion feature yet).

In total, I could find over 17k jobs with that error about unreal6.qe.nue2.suse.org since mid-November.

One symptom of such huge restart/clone chains is this log message:

Dec 04 14:39:53 openqa openqa-gru[6326]: Deep recursion on subroutine "OpenQA::Schema::Result::Jobs::related_scheduled_product_id" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 2016.

Acceptance Criteria

  • AC1: Incomplete jobs are restarted up to n times at most (configurable)
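
To illustrate AC1, here is a minimal sketch of a capped automatic restart. It is written in Python for illustration only and is not openQA's implementation. The `Job` class, the `auto_restarts` counter, and the `auto_clone_limit` parameter are all hypothetical names; only `auto_clone_regex` is taken from this ticket.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical stand-in for a job record; not openQA's real schema."""
    id: int
    reason: str
    auto_restarts: int = 0  # automatic restarts that led to this job


def should_auto_restart(job: Job, auto_clone_regex: str, auto_clone_limit: int) -> bool:
    """Restart only if the cap is not yet reached and the reason matches."""
    if job.auto_restarts >= auto_clone_limit:
        return False
    return re.search(auto_clone_regex, job.reason) is not None


def auto_restart(job: Job, auto_clone_regex: str, auto_clone_limit: int) -> Optional[Job]:
    """Return the clone with the counter carried forward, or None if no restart happens."""
    if not should_auto_restart(job, auto_clone_regex, auto_clone_limit):
        return None
    return Job(id=job.id + 1, reason=job.reason, auto_restarts=job.auto_restarts + 1)


def count_restarts(first: Job, auto_clone_regex: str, auto_clone_limit: int) -> int:
    """Simulate a job that fails with the same reason on every attempt."""
    job, restarts = first, 0
    while (clone := auto_restart(job, auto_clone_regex, auto_clone_limit)) is not None:
        job, restarts = clone, restarts + 1
    return restarts
```

The important design choice in this sketch is that the counter travels with each clone. A job that keeps failing with the same reason therefore stops being restarted after `auto_clone_limit` attempts instead of producing thousands of clones.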

Suggestions

  • Implement a cap/limit on the automatic restarting of incomplete jobs
  • Search for auto_clone_regex in the code repository to find the relevant starting point
  • Look into avoiding the deep recursion as well
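
For the deep recursion suggestion, one possible approach is to walk the clone chain with a loop rather than a recursive call, so that the chain length no longer affects the call depth. The sketch below is in Python for illustration and does not mirror the code in Jobs.pm. The `origin_of` mapping (job ID to the ID of the job it was cloned from) and the function name are assumptions.

```python
from typing import Dict, Optional


def find_original_job(job_id: int, origin_of: Dict[int, Optional[int]]) -> int:
    """Follow clone -> origin links iteratively until a job without an origin is found.

    `origin_of` is a hypothetical lookup table; a real system would query its database.
    """
    seen = set()
    current = job_id
    while origin_of.get(current) is not None:
        if current in seen:
            # Guard against corrupted data that would otherwise loop forever.
            raise ValueError(f"cycle detected in clone chain at job {current}")
        seen.add(current)
        current = origin_of[current]
    return current
```

With this shape, a chain of many thousands of clones is traversed in a single stack frame, and the original job at the start of the chain can be found directly.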

Related issues 3 (0 open, 3 closed)

Related to openQA Infrastructure (public) - action #152578: Many incompletes with "Error connecting to VNC server <unreal6.qe.nue2.suse.org:...>" size:M (Resolved, tinita, 2023-12-13)

Related to openQA Project (public) - action #152560: [alert] Incomplete jobs (not restarted) of last 24h alert Salt (Resolved, tinita, 2023-12-13)

Copied to openQA Project (public) - action #153475: Reconsider the formatting of variable-names in the reason field, e.g. "$auto_clone_regex" size:S (Resolved, mkittler)
