action #112868

open

coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

Helpful instructions to prevent incomplete cluster restarts

Added by okurz over 2 years ago. Updated over 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Feature requests
Target version:
Start date:
2022-06-22
Due date:
% Done:

0%

Estimated time:

Description

Motivation

In a case like
https://openqa.suse.de/tests/8966763#dependencies
a job did not pass, so users might want to restart it. Trying to retrigger it with the restart button in the webUI shows an error:

Errors occurred when restarting jobs:

    Job 8966755 already has clone 8998406

First, an inconvenience is that only the job IDs are shown and no links are rendered. Second, the user would still like to restart the job but can't. In the above example, 8966755 is the serial parent "create_hdd_ha_textmode_maintenance", which already has a clone, 8998406, that was likely created when a job in another sub-cluster was retriggered.

Suggestions

Further details

See https://suse.slack.com/archives/C02CANHLANP/p1655887247175179 for details

Workaround

  1. To avoid this problem, retrigger the serial parent of the multiple sub-clusters (instead of a job in a single sub-cluster) to achieve consistent results
  2. To fix the situation if an incomplete cluster was already created, delete the serial parent job that prevents cloning of the original failed job, then restart the serial parent of the complete cluster (instead of any child job)
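
Both workaround steps depend on knowing which job is the serial parent of the cluster. The following is only a minimal sketch of that lookup, not openQA code: it assumes the dependency information has already been copied by hand into a plain dictionary. The names `CHAINED_PARENTS` and `serial_roots` are hypothetical, and the assumption that 8966763 is a direct child of 8966755 is for illustration only.

```python
from typing import Dict, List

# Hypothetical, hand-written view of a cluster: job ID -> IDs of its serial
# (chained) parents. The parent/child link below is an assumption made for
# illustration; it is not read from openQA.
CHAINED_PARENTS: Dict[int, List[int]] = {
    8966755: [],          # serial parent "create_hdd_ha_textmode_maintenance"
    8966763: [8966755],   # job that did not pass
}


def serial_roots(job_id: int, parents: Dict[int, List[int]]) -> List[int]:
    """Return the top-most serial parents reachable from job_id, sorted by ID."""
    roots = set()
    seen = set()
    stack = [job_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting jobs in diamond-shaped graphs
        seen.add(current)
        ups = parents.get(current, [])
        if ups:
            stack.extend(ups)   # keep walking towards the top of the chain
        else:
            roots.add(current)  # no serial parent: this is a job to restart
    return sorted(roots)


if __name__ == "__main__":
    print(serial_roots(8966763, CHAINED_PARENTS))
```

Restarting the job returned by such a lookup, rather than the failed child itself, corresponds to step 1 of the workaround.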