action #136154

closed

coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA

multimachine tests restarted by RETRY test variable end up without the proper dependency size:M

Added by szarate about 1 year ago. Updated 12 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Observation

I started noticing multiple multi-machine (MM) jobs that are missing one or more dependencies:

Normally, https://openqa.suse.de/tests/12210430 is an MM job in a cluster of two jobs; its dependencies should look like https://openqa.suse.de/tests/12207579#dependencies

In this case, RETRY=1 makes the situation worse and causes blocked updates, because jobs that should never have been restarted automatically are restarted anyway; see https://openqa.suse.de/tests/12207609

Suggestions

  • Find a scenario that reproduces the problem with multi-machine clusters using RETRY=1
  • Create a simple MM cluster locally (maybe within unit tests or by adjusting the local database manually) and invoke the code that runs on an automatic retry (via RETRY=…), e.g. in t/10-jobs.t, where we already use RETRY; also take a look at t/05-scheduler-dependencies.t
  • Only then solve this problem in a mob session, since currently only Marius knows how to do it
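To make the second suggestion more concrete, here is a minimal, self-contained Python model of the invariant such a reproducer could check. It is a toy sketch, not openQA code (openQA itself is written in Perl): `Job`, `Scheduler`, `naive_retry` and `cluster_retry` are hypothetical names, and the model assumes that a correct retry of one job in a parallel (MM) cluster clones the whole cluster and remaps the parallel dependencies to the new job IDs, whereas the buggy path clones only the single job and loses its dependency.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Job:
    # Hypothetical toy job record; not openQA's database schema.
    id: int
    name: str
    parallel_parents: list = field(default_factory=list)


class Scheduler:
    """Hypothetical in-memory job store used only to model the retry invariant."""

    def __init__(self):
        self._ids = count(1)
        self.jobs = {}

    def create(self, name, parallel_parents=()):
        job = Job(next(self._ids), name, list(parallel_parents))
        self.jobs[job.id] = job
        return job

    def cluster(self, job_id):
        """Return the IDs of all jobs connected to job_id via parallel dependencies."""
        neighbours = {jid: set() for jid in self.jobs}
        for job in self.jobs.values():
            for parent in job.parallel_parents:
                neighbours[job.id].add(parent)
                neighbours[parent].add(job.id)
        seen, stack = set(), [job_id]
        while stack:
            jid = stack.pop()
            if jid not in seen:
                seen.add(jid)
                stack.extend(neighbours[jid] - seen)
        return seen

    def naive_retry(self, job_id):
        """Buggy model: clone only the given job and drop its parallel parents."""
        old = self.jobs[job_id]
        return {job_id: self.create(old.name).id}

    def cluster_retry(self, job_id):
        """Clone the whole parallel cluster and remap dependencies to the new IDs."""
        members = sorted(self.cluster(job_id))
        mapping = {old: self.create(self.jobs[old].name).id for old in members}
        for old in members:
            clone = self.jobs[mapping[old]]
            clone.parallel_parents = [mapping[p] for p in self.jobs[old].parallel_parents]
        return mapping


if __name__ == "__main__":
    s = Scheduler()
    server = s.create("server")
    client = s.create("client", parallel_parents=[server.id])

    naive = s.naive_retry(client.id)
    print(s.jobs[naive[client.id]].parallel_parents)  # [] : dependency lost

    full = s.cluster_retry(client.id)
    print(s.jobs[full[client.id]].parallel_parents == [full[server.id]])  # True
```

A real reproducer would assert the same property against openQA's own job and dependency tables (e.g. in t/10-jobs.t) rather than against this toy store.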

Related issues 2 (0 open, 2 closed)

Related to openQA Project (public) - action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M (Resolved, mkittler, 2023-12-11)

Copied from openQA Project (public) - action #80264: multimachine tests unable to get vars from its pair job (Resolved, mkittler, 2020-11-24)