coordination #112862

open

[saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

Added by okurz over 1 year ago. Updated about 8 hours ago.

Status: New
Priority: High
Assignee: -
Category: Feature requests
Target version:
Start date: 2022-04-29
Due date: 2024-04-02 (Due in 14 days)
% Done: 17%
Estimated time: (Total: 0.00 h)

Description

Ideas

  • A restart button on every job in the "dependency" view of cluster jobs
  • A preview in the "dependency" view showing which jobs in a cluster would be restarted by a given restart option, e.g. restarting the current job, advanced restart options, or the restart button on any other job
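
To illustrate the preview idea, here is a minimal sketch in Python of how the set of affected jobs could be computed from a cluster's dependency graph. It is not openQA's actual restart logic: the function name `restart_preview`, the input shapes, and the rule itself (a restart spreads across parallel links in both directions and from chained parents to their children) are assumptions made for illustration only.

```python
from collections import defaultdict, deque


def restart_preview(start, parallel_links, chained_links):
    """Return the sorted job ids that would be restarted together with `start`.

    Hypothetical rule, assumed for this sketch only:
    - parallel links spread a restart in both directions
    - chained links spread a restart from parent to child only
    """
    neighbours = defaultdict(set)
    for a, b in parallel_links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    for parent, child in chained_links:
        neighbours[parent].add(child)

    # Breadth-first traversal from the job whose restart button was pressed.
    seen = {start}
    queue = deque([start])
    while queue:
        job = queue.popleft()
        for other in neighbours[job]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return sorted(seen)


# Example cluster: jobs 1, 2, 3 run in parallel; job 4 is chained after job 3.
parallel = [(1, 2), (2, 3)]
chained = [(3, 4)]
print(restart_preview(1, parallel, chained))
print(restart_preview(4, parallel, chained))
```

A preview in the UI could highlight exactly the returned ids before the user confirms the restart; different restart options would correspond to different spreading rules.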

Subtasks 27 (12 open, 15 closed)

  • coordination #110458: [epic] Improve `RETRY=…`-behavior for jobs with dependencies (New, start 2022-04-29)
  • coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers (Blocked, assigned to okurz, start 2022-06-03, due 2024-04-02)
  • action #111908: Multimachine failures between multiple physical workers (New, start 2022-06-03)
  • action #112001: [timeboxed:20h][spike solution] Pin multi-machine cluster jobs to same openQA worker host based on configuration (New, start 2022-06-03)
  • openQA Infrastructure - action #134282: [tools] network protocols failures on multimachine tests on HA/SAP size:S auto_review:"no candidate.*iscsi-target-overview-service-tab|yast2.+firewall.+services.+add.+zone":retry (Resolved, assigned to nicksinger, start 2023-08-15)
  • action #135035: Optionally restrict multimachine jobs to a single worker (In Progress, assigned to mkittler, start 2023-09-01, due 2024-04-02)
  • action #135914: Extend/add initial validation steps and "best practices" for multi-machine test setup/debugging to openQA documentation size:M (Resolved, assigned to mkittler)
  • openQA Infrastructure - action #135944: Implement a constantly running monitoring/debugging VM for the multi-machine network (New, start 2023-09-18)
  • action #136013: Ensure IP forwarding is persistent for multi-machine tests also in our salt recipes size:M (Resolved, assigned to dheidler)
  • openQA Infrastructure - action #137771: Configure o3 ppc64le multi-machine worker size:M (Resolved, assigned to mkittler, start 2023-10-11)
  • action #138698: significant increase in multi-machine test failures on OSD since 2023-10-25, e.g. test fails in support_server/setup size:M (Resolved, assigned to mkittler, start 2023-10-27)
  • action #139136: Conduct "lessons learned" with Five Why analysis for "test fails in iscsi_client due to salt 'host'/'nodename' confusion" size:M (Resolved, assigned to okurz)
  • action #151310: [regression] significant increase of parallel_failed+failed since 2023-11-21 size:M (Resolved, assigned to mkittler, start 2023-11-23)
  • openQA Infrastructure - action #152092: Handle all package downgrades in OSD infrastructure properly in salt size:M (Resolved, assigned to nicksinger, start 2023-12-05)
  • openQA Infrastructure - action #152095: [spike solution][timeboxed:8h] Ping over GRE tunnels and TAP devices and openvswitch outside a VM with differing packet sizes size:S (Resolved, assigned to jbaier_cz, start 2023-12-05)
  • openQA Infrastructure - action #152098: [research][timeboxed:10h] Learn more about openvswitch with experimenting together size:S (Workable, start 2023-12-05)
  • openQA Infrastructure - action #152101: Allow salt to properly configure non-production multi-machine workers size:M (Workable, start 2023-12-05)
  • action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M (Resolved, assigned to mkittler, start 2023-12-11)
  • openQA Infrastructure - action #152557: unexpected routing between PRG1/NUE2+PRG2 (Resolved, assigned to okurz)
  • action #152737: Support for triggering parallel (multi-machine-)tests within a configured zone or location (New)
  • action #153769: Better handle changes in GRE tunnel configuration size:M (Resolved, assigned to okurz, start 2024-01-17)
  • action #154552: [ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de (Resolved, assigned to mkittler, start 2024-01-30)
  • openQA Infrastructure - action #154624: Periodically running simple ping-check multi-machine tests on x86_64 covering multiple physical hosts on OSD alerting tools team on failures size:M (Resolved, assigned to jbaier_cz, start 2024-01-30)
  • openQA Infrastructure - action #155200: Periodically running simple ping-check multi-machine tests on ppc64le covering multiple physical hosts on OSD alerting tools team on failures size:M (Workable, start 2024-01-30)
  • openQA Infrastructure - action #155929: Try out rstp_enable=True in openqa/openvswitch.sls size:M (Resolved, assigned to dheidler)
  • action #112256: Some children of parent job not cancelled (or later, restarted) when parent `parallel_failed` due to another child's parallel job failing (New, start 2022-06-09)
  • action #112868: Helpful instructions to prevent incomplete cluster restarts (New, start 2022-06-22)

Related issues 1 (0 open, 1 closed)

Copied from openQA Project - coordination #103962: [saga][epic] Easy multi-machine handling: MM-tests as first-class citizens (Resolved, assigned to mkittler, start 2019-11-18)

#1

Updated by okurz over 1 year ago

  • Copied from coordination #103962: [saga][epic] Easy multi-machine handling: MM-tests as first-class citizens added
#2

Updated by okurz over 1 year ago

  • Subject changed from [saga][epic] Easy multi-machine handling: MM-tests as first-class citizens to [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens