coordination #111929

open

coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

[epic] Stable multi-machine tests covering multiple physical workers

Added by okurz almost 2 years ago. Updated about 8 hours ago.

Status: Blocked
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2022-06-03
Due date:
% Done: 64%
Estimated time: (Total: 0.00 h)

Description

Motivation

openQA supports multi-machine tests, including tests that span multiple physical workers, but we have never been able to fully ensure, or even know exactly, what is required to provide a stable test environment for them. We should ensure that we have stable multi-machine tests covering multiple physical workers.


Subtasks: 29 (11 open, 18 closed)

action #111908: Multimachine failures between multiple physical workers (New, 2022-06-03)
action #112001: [timeboxed:20h][spike solution] Pin multi-machine cluster jobs to same openQA worker host based on configuration (New, 2022-06-03)
openQA Infrastructure - action #134282: [tools] network protocols failures on multimachine tests on HA/SAP size:S auto_review:"no candidate.*iscsi-target-overview-service-tab|yast2.+firewall.+services.+add.+zone":retry (Resolved, nicksinger, 2023-08-15)
action #135035: Optionally restrict multimachine jobs to a single worker (Resolved, mkittler, 2023-09-01)
action #135914: Extend/add initial validation steps and "best practices" for multi-machine test setup/debugging to openQA documentation size:M (Resolved, mkittler)
openQA Infrastructure - action #135944: Implement a constantly running monitoring/debugging VM for the multi-machine network (New, 2023-09-18)
action #136013: Ensure IP forwarding is persistent for multi-machine tests also in our salt recipes size:M (Resolved, dheidler)
openQA Infrastructure - action #137771: Configure o3 ppc64le multi-machine worker size:M (Resolved, mkittler, 2023-10-11)
action #138698: significant increase in multi-machine test failures on OSD since 2023-10-25, e.g. test fails in support_server/setup size:M (Resolved, mkittler, 2023-10-27)
action #139136: Conduct "lessons learned" with Five Why analysis for "test fails in iscsi_client due to salt 'host'/'nodename' confusion" size:M (Resolved, okurz)
openQA Infrastructure - action #150869: Ensure multi-machine tests work on aarch64-o3 (or another but single machine only) size:M (Blocked, mkittler)
action #151310: [regression] significant increase of parallel_failed+failed since 2023-11-21 size:M (Resolved, mkittler, 2023-11-23)
openQA Infrastructure - action #152092: Handle all package downgrades in OSD infrastructure properly in salt size:M (Resolved, nicksinger, 2023-12-05)
openQA Infrastructure - action #152095: [spike solution][timeboxed:8h] Ping over GRE tunnels and TAP devices and openvswitch outside a VM with differing packet sizes size:S (Resolved, jbaier_cz, 2023-12-05)
openQA Infrastructure - action #152098: [research][timeboxed:10h] Learn more about openvswitch with experimenting together size:S (Workable, 2023-12-05)
openQA Infrastructure - action #152101: Allow salt to properly configure non-production multi-machine workers size:M (Resolved, mkittler, 2023-12-05)
action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M (Resolved, mkittler, 2023-12-11)
openQA Infrastructure - action #152557: unexpected routing between PRG1/NUE2+PRG2 (Resolved, okurz)
action #152737: Support for triggering parallel (multi-machine-)tests within a configured zone or location (New)
action #153769: Better handle changes in GRE tunnel configuration size:M (Resolved, okurz, 2024-01-17)
action #154552: [ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de (Resolved, mkittler, 2024-01-30)
openQA Infrastructure - action #154624: Periodically running simple ping-check multi-machine tests on x86_64 covering multiple physical hosts on OSD alerting tools team on failures size:M (Resolved, jbaier_cz, 2024-01-30)
openQA Infrastructure - action #155200: Periodically running simple ping-check multi-machine tests on ppc64le covering multiple physical hosts on OSD alerting tools team on failures size:M (Workable, 2024-01-30)
openQA Infrastructure - action #155929: Try out rstp_enable=True in openqa/openvswitch.sls size:M (Resolved, dheidler)
action #157534: Multi-Machine Job fails in suseconnect_scc due to worker class misconfiguration when we introduced prg2e machines (Resolved, okurz, 2024-03-19)
openQA Infrastructure - action #157606: Prevent missing gre tunnel connections in our salt states due to misconfiguration (New, 2024-03-19)
openQA Infrastructure - action #157738: Use rstp_enable=True on o3 as well (New)
action #158143: Make workers unassign/reject/incomplete jobs when across-host multimachine setup is requested but not available (New)
action #158146: Prevent scheduling across-host multimachine clusters to hosts that are marked to exclude themselves (New, 2024-03-27)
Actions
#1

Updated by okurz almost 2 years ago

  • Target version changed from Ready to future
#2

Updated by okurz over 1 year ago

  • Parent task changed from #103962 to #112862

Move future ideas to the actual "Future ideas" tracker #112862

#3

Updated by okurz 5 months ago

  • Subtask #134282 added
#4

Updated by okurz 5 months ago

  • Subtask #135914 added
#5

Updated by okurz 5 months ago

  • Subtask #136013 added
#6

Updated by okurz 5 months ago

  • Subtask #135944 added
#7

Updated by okurz 5 months ago

  • Subtask #137771 added
#8

Updated by okurz 5 months ago

  • Subtask #151310 added
#9

Updated by okurz 5 months ago

  • Subtask #138698 added
#10

Updated by okurz 5 months ago

  • Subtask #139136 added
#11

Updated by okurz 5 months ago

  • Subtask #152092 added
#12

Updated by okurz 5 months ago

  • Subtask #152095 added
#13

Updated by okurz 5 months ago

  • Subtask #152098 added
#14

Updated by okurz 5 months ago

  • Subtask #152101 added
#15

Updated by okurz 5 months ago

  • Related to action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M added
#16

Updated by okurz 5 months ago

  • Related to deleted (action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M)
#17

Updated by okurz 5 months ago

  • Subtask #152389 added
#18

Updated by okurz 5 months ago

  • Subtask #135035 added
#19

Updated by okurz 5 months ago

  • Subtask #152557 added
#20

Updated by okurz 4 months ago

  • Subtask #152737 added
#21

Updated by okurz 3 months ago

  • Subtask #153769 added
#22

Updated by okurz 3 months ago

  • Subtask #153880 added
#23

Updated by okurz 3 months ago

  • Subtask deleted (#153880)
#24

Updated by okurz 3 months ago

  • Subtask #154552 added
#25

Updated by okurz 3 months ago

  • Subtask #154624 added
#26

Updated by okurz 3 months ago

  • Subtask #155200 added
#27

Updated by okurz 2 months ago

  • Subtask #155926 added
#28

Updated by okurz 2 months ago

  • Subtask #155929 added
#29

Updated by okurz 2 months ago

  • Subtask deleted (#155926)
#30

Updated by okurz about 1 month ago

  • Subtask #157534 added
#31

Updated by okurz about 1 month ago

  • Subtask #157606 added
#32

Updated by okurz about 1 month ago

  • Subtask #157738 added
#33

Updated by okurz about 1 month ago

  • Subtask #158143 added
#34

Updated by okurz about 1 month ago

  • Subtask #158146 added
#35

Updated by okurz about 8 hours ago

  • Subtask #150869 added