action #138698
closed coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens
coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers
significant increase in multi-machine test failures on OSD since 2023-10-25, e.g. test fails in support_server/setup size:M
Description
Observation
openQA test in scenario sle-15-SP5-Server-DVD-HA-Incidents-x86_64-qam_ha_priorityfencing_supportserver@64bit fails in setup
Test suite description
The base test suite is used for job templates defined in YAML documents. It has no settings of its own.
Reproducible
Not easily reproducible. Failure is sporadic. See Next & Previous Results tab in linked test.
Failed on (at least) Build :29290:libfido2 (current job)
Expected result
Last good: :29978:qemu (or more recent)
Acceptance criteria
- AC1: qam_ha_priorityfencing_supportserver scenario passes reliably
- AC2: Unrelated issues are identified and tracked as individual issues
Problem
- H1 The product has changed, unclear as of #138698-5
- H2 Fails because of changes in test setup
- H2.1 worker3[3-6] are problematic
- E2.1-1 Disable worker3[3-6] and test if https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=now-7d&to=now&viewPanel=24 improves again towards lower fail-ratio, see #138698-3
- E2.1-2 Disable worker3{3-4} https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/666 Has led to a good passing rate
- E2.1-3 Disable worker33 only (https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/667): apparently worker34 is indeed broken -> #139070
- E2.1-4 Verify that worker33 is fine (https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/669): apparently worker33 is indeed broken -> #139154
- H3 REJECTED Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA -> O3-1 #138698-6 -> reject
- H4 REJECTED Fails because of changes in test management configuration, e.g. openQA database settings -> O4-1 #138698-5 -> reject
- H5 REJECTED Fails because of changes in the test software itself (the test plan in source code as well as needles) -> O5-1 #138698-5 -> reject
- H6 REJECTED Sporadic issue, i.e. the root problem has already been hidden in the system for a long time but does not show symptoms every time -> O6-1 https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=1698050125250&to=1698433324349&viewPanel=24 shows a significant increase in the ratio of failed+parallel_failed from 4+7=11% on 2023-10-25 to 15+32=47% on 2023-10-27 -> reject, but the test code is racy as of #138698-5
Suggestions
- Consider temporarily disabling GRE tunnel use again completely, i.e. only run multi-machine tests again on a single host - see #135035
- Temporarily disable the tap worker class on more and more workers to narrow down or identify the culprit
- Host-to-host communication seems to be unstable
- Run openQA tests as well as more low-level tests
- the iscsi-server+client test scenario
- http://open.qa/docs/#_debugging_open_vswitch_configuration (a basic sketch of such low-level checks follows after this list)
- With all mitigations, monitor the impact on the job queue to prevent overload and overly long job queues, e.g. see https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=now-7d&to=now
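A minimal sketch of such low-level checks, to be run directly on one of the involved worker hosts; the peer hostname is taken from this ticket, while packet size, privileges and service names are assumptions and should be double-checked against the actual setup:
# hedged sketch: basic host-to-host and Open vSwitch sanity checks from an MM worker host
peer=worker34.oqa.prg2.suse.org
# plain ICMP reachability between the physical hosts
ping -c 3 "$peer"
# larger, non-fragmenting payload to spot MTU problems on the GRE tunnels
ping -c 3 -M do -s 1350 "$peer"
# show the configured Open vSwitch bridges, ports and tunnels
sudo ovs-vsctl show
# os-autoinst-openvswitch manages the tap/VLAN setup for MM jobs
systemctl status os-autoinst-openvswitch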
Further details
Always latest result in this scenario: latest
Rollback steps
- Put back tap on worker33/worker34: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/666
- Note: See related, specific tickets for other rollback steps previously mentioned here
Updated by acarvajal 12 months ago
I think this issue could be caused by a configuration problem in the worker or in the network. As discussed in #eng-testing, we have been seeing many failures since October 26th, but no clear pattern can be easily discerned.
Failures are:
- Failures in support_server/setup like this one. So far these seem to be happening on worker34 (see linked job) and also https://openqa.suse.de/tests/12692647, and on worker33 (https://openqa.suse.de/tests/12692668), so I suggest starting the debugging on those 2 workers.
- Failures in SUTs trying to reach 10.0.2.2. We have seen those in:
worker33: https://openqa.suse.de/tests/12692960#step/barrier_init/5
worker34: https://openqa.suse.de/tests/12693210#step/ha_cluster_init/4
worker32: https://openqa.suse.de/tests/12692616#step/barrier_init/5
- Name resolution issues (using the DNS provided by the support server):
worker34: https://openqa.suse.de/tests/12693209#step/iscsi_client/15 (support server in worker35)
worker39: https://openqa.suse.de/tests/12690359#step/ha_cluster_join/6 (support server in worker30)
worker33: https://openqa.suse.de/tests/12691203#step/ha_cluster_join/6 (support server in worker37)
worker35: https://openqa.suse.de/tests/12692184#step/ha_cluster_init/3 (support server in worker32)
worker29: https://openqa.suse.de/tests/12692221#step/cluster_md/2 (support server in worker32)
worker32: https://openqa.suse.de/tests/12692201#step/iscsi_client/15 (support server in worker30)
worker30: https://openqa.suse.de/tests/12692313#step/ha_cluster_join/6 (support server in worker33)
worker33: https://openqa.suse.de/tests/12692311#step/ha_cluster_join/6 (support server in worker39)
worker33: https://openqa.suse.de/tests/12692339#step/iscsi_client/15 (support server in worker35)
worker34: https://openqa.suse.de/tests/12692372#step/ha_cluster_join/6 (support server in worker29)
worker34: https://openqa.suse.de/tests/12692380#step/ha_cluster_init/23 (support server in worker39)
worker32: https://openqa.suse.de/tests/12692386#step/ha_cluster_join/6 (support server in worker30)
Will edit later and add the other cases we're seeing.
Edit:
- Connection issues between nodes in an MM setup:
worker32: https://openqa.suse.de/tests/12692186#step/hawk_gui/29 (support server in worker35, node1 in worker39)
worker34: https://openqa.suse.de/tests/12692192#step/ha_cluster_join/15 (support server in worker33, node1 in worker33)
worker34: https://openqa.suse.de/tests/12692233#step/remove_node/23 (support server in worker29, node2 in worker30)
worker32: https://openqa.suse.de/tests/12692332#step/hawk_gui/36 (support server in worker35, node2 in worker34)
- Cluster resources stopped. This is an odd one, but I guess communication issues between the cluster nodes can make the DC stop resources on the remote node:
https://openqa.suse.de/tests/12693181#step/check_cluster_integrity/6 (job in worker40, other jobs in worker33 & worker30)
https://openqa.suse.de/tests/12692558#step/check_cluster_integrity/6 (job in worker38, other jobs in worker32 & worker40)
- Other connection issues:
worker32: https://openqa.suse.de/tests/12692266#step/register_without_ltss/53 (support server in worker30)
- And finally, the most common errors are these random reboots. They're happening in multiple modules. I think what we're seeing here is that there is some connection problem between the cluster nodes, and the HA stack fences the node. Due to the nature of the failure, jobs are not leaving logs for us to check (the failing node is sitting in GRUB, so post_fail_hook cannot gather logs, and the other nodes finish with parallel_failed, so post_fail_hook does not run):
https://openqa.suse.de/tests/12693179#step/check_hawk/10 (job in worker34, other jobs in worker40 & worker33)
https://openqa.suse.de/tests/12693197#step/check_cluster_integrity/2 (job in worker33, other jobs in worker32 & worker33)
https://openqa.suse.de/tests/12693199#step/console_reboot#1/17 (job in worker34, other jobs in worker29 & worker36)
https://openqa.suse.de/tests/12684453#step/cluster_md/2 (job in worker32, other jobs in worker39, worker29, worker33 & worker38)
https://openqa.suse.de/tests/12684536#step/vg/9 (job in worker33, other jobs in worker35 & worker40)
https://openqa.suse.de/tests/12684540#step/clvmd_lvmlockd/19 (job in worker33, other jobs in worker32, worker35 & worker40)
https://openqa.suse.de/tests/12684548#step/check_after_reboot/3 (job in worker40, other jobs in worker33, worker35 & worker40)
https://openqa.suse.de/tests/12692787#step/drbd_passive/30 (job in worker33, other jobs in worker37 & worker34)
https://openqa.suse.de/tests/12697964#step/clvmd_lvmlockd/26 (job in worker34, other jobs in worker29 & worker36)
https://openqa.suse.de/tests/12697960#step/cluster_state_mgmt/13 (job in worker34, other jobs in worker30 & worker34)
https://openqa.suse.de/tests/12697957#step/dlm/11 (job in worker33, other jobs in worker33 & worker30)
https://openqa.suse.de/tests/12697948#step/check_after_reboot/33 (job in worker39, other jobs in worker32, worker39 & worker38)
https://openqa.suse.de/tests/12700249#step/filesystem#1/15 (job in worker32, other jobs in worker30 & worker39)
https://openqa.suse.de/tests/12700243#step/drbd_passive/10 (job in worker32, other jobs in worker29 & worker32)
https://openqa.suse.de/tests/12686893#step/cluster_md/14 (job in worker34, other jobs in worker35, worker40, worker36 & worker37)
https://openqa.suse.de/tests/12692179#step/dlm/11 (job in worker34, other jobs in worker30, worker40 & worker38)
https://openqa.suse.de/tests/12692204#step/vg/6 (job in worker33, other jobs in worker34 & worker38)
https://openqa.suse.de/tests/12692217#step/check_after_reboot/19 (job in worker30, other jobs in worker32, worker40 & worker38)
https://openqa.suse.de/tests/12692260#step/check_after_reboot/12 (job in worker33, other jobs in worker33, worker34 & worker35)
https://openqa.suse.de/tests/12692255#step/check_after_reboot/7 (job in worker40, other jobs in worker32, worker29 & worker36)
https://openqa.suse.de/tests/12692269#step/check_logs/11 (job in worker32, other jobs in worker29 & worker38)
https://openqa.suse.de/tests/12692284#step/cluster_md/20 (job in worker33, other jobs in worker34, worker40 & worker38)
https://openqa.suse.de/tests/12692291#step/clvmd_lvmlockd/9 (job in worker34, other jobs in worker30 & worker33)
https://openqa.suse.de/tests/12692319#step/ha_cluster_join/16 (job in worker32, other jobs in worker39, worker38 & worker33)
https://openqa.suse.de/tests/12692306#step/cluster_md/6 (job in worker34, other jobs in worker30 & worker37)
https://openqa.suse.de/tests/12692357#step/dlm/10 (job in worker34, other jobs in worker36, worker40 & worker37)
https://openqa.suse.de/tests/12692375#step/cluster_md/6 (job in worker33, other jobs in worker29 & worker36)
https://openqa.suse.de/tests/12692414#step/filesystem/19 (job in worker32, other jobs in worker35, worker39 & worker30)
https://openqa.suse.de/tests/12692401#step/ha_cluster_init/54 (job in worker32, other jobs in worker37 & worker30)
https://openqa.suse.de/tests/12692397#step/ha_cluster_init/27 (job in worker32, other jobs in worker33, worker34 & worker32)
As reported, not a clear pattern, but workers 32, 33 & 34 seem to appear a lot in these failures.
Updated by okurz 12 months ago
- Related to action #133700: Network bandwidth graphs per switch, like https://mrtg.suse.de/qanet13nue, for all current top-of-rack switches (TORs) that we are connected to size:M added
Updated by okurz 12 months ago
- Project changed from openQA Infrastructure to openQA Project
- Category set to Regressions/Crashes
- Status changed from New to In Progress
- Assignee set to okurz
- Target version set to Ready
ok, so let's follow the hypothesis that this is worker-host specific. https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=now-7d&to=now&viewPanel=27 indeed shows w32+33+34 with a higher failure rate, but actually also w31+w35+w36. As right now, at least for qemu-x86_64, we have high redundancy, we can easily take some machines out of production temporarily to check this hypothesis. So doing:
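# disable telegraf and the currently listed openqa-worker-auto-restart units on worker31-36, then delete their salt keys (presumably so a later state.apply does not re-enable them)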
sudo salt 'worker3[1-6].oqa.*' cmd.run "sudo systemctl disable --now telegraf \$(systemctl list-units | grep openqa-worker-auto-restart | cut -d . -f 1 | xargs)" && for i in {31..36}; do sudo salt-key -y -d worker$i.oqa.prg2.suse.org; done
although I consider it unlikely that those and only those machines are affected. Either it's something not worker-specific or maybe a problem in the network. Bandwidth graphs as requested in #133700 would obviously help.
@acarvajal I will merely address one particular, unlikely hypothesis for now, taking advantage of the lower overall system load expected during the weekend. On top of that I will only be fully back at work on 2023-11-02, not before. IMHO the impact is severe enough to make this ticket at least "High" if not "Urgent". If you want to help address this issue with more effort, I suggest you look into a better reproducer, e.g. a minimized test scenario with the modules that don't impact the error excluded and, if possible, the modules which do have an impact repeated multiple times.
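A rough sketch of what such a minimized reproducer could look like, re-using openqa-clone-job options that appear elsewhere in this ticket; the excluded modules and the repeat count are placeholders rather than a verified selection:
# hypothetical reproducer sketch: repeatedly clone the failing cluster while excluding modules assumed not to matter
name=poo138698-reproducer
openqa-clone-job --repeat=10 --skip-chained-deps --parental-inheritance --within-instance https://openqa.suse.de/tests/12691358 EXCLUDE_MODULES=ha/barrier_init,support_server/wait_children TEST+=-$name BUILD=$name _GROUP=0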
Updated by okurz 12 months ago
@acarvajal support_server/setup is at least an example that fails rather quickly in comparison, but the test code is obviously far from race-free. The code has:
assert_screen 'iscsi-target-overview-add-target-tab';
# Wait for the Identifier field to change from 'test' value to the correct one
# We could simply use a 'sleep' here but it's less good
wait_screen_change(undef, 10);
# Select Target field
send_key 'alt-t';
wait_still_screen 3;
# Change Target value
for (1 .. 40) { send_key 'backspace'; }
type_string 'iqn.2016-02.de.openqa';
wait_still_screen 3;
# Select Identifier field
send_key 'alt-f';
wait_still_screen 3;
# Change Identifier value
for (1 .. 40) { send_key 'backspace'; }
wait_still_screen 3;
type_string '132';
wait_still_screen 3;
# Un-check Use Authentication
send_key 'alt-u';
wait_still_screen 3;
so lots of wasteful and racy wait_still_screen calls. I can not give you a guarantee that we would be able to fix that issue at all. Someone should improve that code. Could you take that into your scope in a separate ticket, please?
Updated by okurz 12 months ago
- Subject changed from test fails in support_server/setup to significant increase in multi-machine test failures on OSD since 2023-10-25, e.g. test fails in support_server/setup
- Description updated (diff)
For H3 https://openqa.suse.de/tests/12691358#investigation shows
diff_to_last_good
- "BASE_TEST_ISSUES" : "29978",
+ "BASE_TEST_ISSUES" : "29290",
- "BUILD" : ":29978:qemu",
+ "BUILD" : ":29290:libfido2",
- "INCIDENT_ID" : "29978",
- "INCIDENT_REPO" : "http://download.suse.de/ibs/SUSE:/Maintenance:/29978/SUSE_Updates_SLE-Module-Basesystem_15-SP5_x86_64,http://download.suse.de/ibs/SUSE:/Maintenance:/29978/SUSE_Updates_SLE-Module-Server-Applications_15-SP5_x86_64",
+ "INCIDENT_ID" : "29290",
+ "INCIDENT_REPO" : "http://download.suse.de/ibs/SUSE:/Maintenance:/29290/SUSE_Updates_SLE-Module-Basesystem_15-SP5_x86_64",
- "NICMAC" : "52:54:00:12:0a:b7",
+ "NICMAC" : "52:54:00:12:0d:60",
- "NICVLAN" : "115",
+ "NICVLAN" : "137",
- "PRJDIR" : "/var/lib/openqa/cache/openqa.suse.de",
+ "PRJDIR" : "/var/lib/openqa/share",
- "QEMUPORT" : "20172",
+ "QEMUPORT" : "20462",
- "REPOHASH" : "1698322664",
+ "REPOHASH" : "1698310213",
- "RRID" : "SUSE:Maintenance:29978:311661",
+ "RRID" : "SUSE:Maintenance:29290:311633",
- "SERVERAPP_TEST_ISSUES" : "29978",
- "TAPDEV" : "tap16",
+ "TAPDEV" : "tap45",
- "VNC" : "107",
+ "VNC" : "136",
- "WORKER_CLASS" : "qemu_x86_64,qemu_x86_64_staging,qemu_x86_64-large-mem,amd,tap,prg,prg2,worker35,cpu-x86_64,cpu-x86_64-v2,cpu-x86_64-v3",
- "WORKER_HOSTNAME" : "worker35.oqa.prg2.suse.org",
- "WORKER_ID" : 2743,
- "WORKER_INSTANCE" : 17,
+ "WORKER_CLASS" : "qemu_x86_64,qemu_x86_64_staging,qemu_x86_64-large-mem,amd,tap,prg,prg2,worker34,cpu-x86_64,cpu-x86_64-v2,cpu-x86_64-v3",
+ "WORKER_HOSTNAME" : "worker34.oqa.prg2.suse.org",
+ "WORKER_ID" : 3424,
+ "WORKER_INSTANCE" : 46,
last_good 12686884
needles_diff_stat
needles_log
No needle changes recorded, test regression due to needles unlikely
test_diff_stat
test_log
No test changes recorded, test regression unlikely
so unlikely to have a problem due to os-autoinst-distri-opensuse changes or needles. Also no relevant test setting changes that would come from differing settings in the openQA database or job templates and such. However the product has changed and I can't rule out changes in there, e.g. SLE maintenance updates.
From first bad https://openqa.suse.de/tests/12691358/logfile?filename=autoinst-log.txt os-autoinst version is 4.6.1698238759.64b339c vs. last good https://openqa.suse.de/tests/12686884/logfile?filename=autoinst-log.txt 4.6.1698187055.10dd7a0
$ git log1 --no-merges 10dd7a0..64b339c
d438393f Use commit message checks from os-autoinst-common
c5986103 (okurz/feature/s390, feature/s390) backend::baseclass: Fix wording of informative message
fb8c1fed Slightly simplify backend::baseclass
54ad428c Remove unused tools/absolutize
the only commit remotely sounding like it could introduce functional changes is https://github.com/os-autoinst/os-autoinst/commit/fb8c1fed1a021354a62232f6579183c269a3d29b but the diff looks very unsuspicious
Also from https://openqa.suse.de/tests/12691358#comments we see that the retry passes and qam_ha_priorityfencing_supportserver:investigate:last_good_build::29978:qemu: passed, so that is another indication of a sporadic issue not related to os-autoinst or openQA changes. Taking a look into worker34.oqa.prg2.suse.org:/var/log/zypp/history I see:
2023-10-25 02:12:57|command|root@worker34|'zypper' '-n' '--no-refresh' '--non-interactive-include-reboot-patches' 'patch' '--replacefiles' '--auto-agree-with-licenses' '--download-in-advance'|
2023-10-25 02:12:58|install|libruby2_5-2_5|2.5.9-150000.4.29.1|x86_64||repo-sle-update|3a958f3465e4eab4839b2863523e30d61bca64a982b4f05ca17dcfd656202b59|
2023-10-25 02:12:58|install|ruby2.5-stdlib|2.5.9-150000.4.29.1|x86_64||repo-sle-update|a76c4b98b007a31b26466734f0049d1efb467c2fdb8d8efaaaef586b7224c873|
2023-10-25 02:12:58|install|ruby2.5|2.5.9-150000.4.29.1|x86_64||repo-sle-update|6af225578dacb2ebef69532cd5de72134604821968a2761d321c527940f381ec|
2023-10-25 02:12:58|patch |openSUSE-SLE-15.5-2023-4176|1|noarch|repo-sle-update|important|security|needed|applied|
2023-10-25 07:14:51|command|root@worker34|'zypper' '--no-refresh' '-n' 'dup' '--replacefiles'|
2023-10-25 07:14:51|install|os-autoinst|4.6.1698187055.10dd7a0-lp155.1689.1|x86_64||devel_openQA|a3847ebfcabbb86c32dc00c423eb162f685a1c9d17dde7facdf68e8d392650a3|
2023-10-25 07:14:52|install|os-autoinst-devel|4.6.1698187055.10dd7a0-lp155.1689.1|x86_64||devel_openQA|b6e93750283ae620e55bb1f97ce43c60a22c3db60539d7f764efc843f0b2b581|
2023-10-25 07:14:52|install|os-autoinst-swtpm|4.6.1698187055.10dd7a0-lp155.1689.1|x86_64||devel_openQA|8ebe851440ee8fc34098e01b183197879237cd7f92e818648c35e8c5874aaa78|
2023-10-25 07:14:54|install|os-autoinst-openvswitch|4.6.1698187055.10dd7a0-lp155.1689.1|x86_64||devel_openQA|51a7ed5e6e4dca782384a17a461ca8b2acf055352e1b27c06b20a6db64c68cd0|
2023-10-25 07:14:54|install|openQA-common|4.6.1698152470.c944acc-lp155.6147.1|x86_64||devel_openQA|b8ba7ef9c7ad73a7712ed61d0fd505733267c9b071738bf1d6f1fb970e3b55aa|
2023-10-25 07:14:54|install|os-autoinst-distri-opensuse-deps|1.1698196593.7916f33b-lp155.13125.1|noarch||devel_openQA|7d17349c1cb5c983317959bc51df86bfcd5dd37500080995a7cdb62bbdf791a2|
2023-10-25 07:14:54|install|openQA-client|4.6.1698152470.c944acc-lp155.6147.1|x86_64||devel_openQA|0d2d629f8f406eed951c369bff7cd40ff67e212c17ee48b87a85abc0e79edfc4|
2023-10-25 07:14:56|install|openQA-worker|4.6.1698152470.c944acc-lp155.6147.1|x86_64||devel_openQA|f794647d29b0daef763e518533678eb949e3d4ccc7ba9913f0e640e28be77fa3|
2023-10-26 02:13:09|command|root@worker34|'zypper' '-n' '--no-refresh' '--non-interactive-include-reboot-patches' 'patch' '--replacefiles' '--auto-agree-with-licenses' '--download-in-advance'|
2023-10-26 02:13:10|install|libnghttp2-14|1.40.0-150200.12.1|x86_64||repo-sle-update|6625e233bc93d47e048dfc9d7a6df96a473a542f95815dae87f5c0db80dd532c|
2023-10-26 02:13:10|install|libssh2-1|1.11.0-150000.4.19.1|x86_64||repo-sle-update|291590f6d5e84f8ad50960aa756501984c9fff723159d3533577fcec5735aec6|
2023-10-26 02:13:10|patch |openSUSE-SLE-15.5-2023-4192|1|noarch|repo-sle-update|moderate|recommended|needed|applied|
2023-10-26 02:13:10|patch |openSUSE-SLE-15.5-2023-4200|1|noarch|repo-sle-update|important|security|needed|applied|
2023-10-27 02:13:15|command|root@worker34|'zypper' '-n' '--no-refresh' '--non-interactive-include-reboot-patches' 'patch' '--replacefiles' '--auto-agree-with-licenses' '--download-in-advance'|
2023-10-27 02:13:16|install|libz1|1.2.13-150500.4.3.1|x86_64||repo-sle-update|1f273509bd76f485a289e23791a3d9c5fec7b982fe91f59000d191d40375840d|
2023-10-27 02:13:16|install|zlib-devel|1.2.13-150500.4.3.1|x86_64||repo-sle-update|16b0c66f6384d2ed18894441075a928d263897e8fb1c0c496f9ee41f3a1c2411|
2023-10-27 02:13:16|patch |openSUSE-SLE-15.5-2023-4215|1|noarch|repo-sle-update|moderate|security|needed|applied|
2023-10-27 07:15:51|command|root@worker34|'zypper' '--no-refresh' '-n' 'dup' '--replacefiles'|
2023-10-27 07:15:51|install|os-autoinst|4.6.1698238759.64b339c-lp155.1693.1|x86_64||devel_openQA|d3acaf331ba15656171a8104292fe48edac5c155076a951d16b1bac8c4469827|
2023-10-27 07:15:51|install|os-autoinst-devel|4.6.1698238759.64b339c-lp155.1693.1|x86_64||devel_openQA|03ec7a9aef0b480cb0cf652f53dd2758b0ca52cebb17d6b2eda58ff79610845d|
2023-10-27 07:15:51|install|os-autoinst-swtpm|4.6.1698238759.64b339c-lp155.1693.1|x86_64||devel_openQA|b2716a8dcd6006b32d63f1baa8def8fae08afb8e7036a5f5e25a346eb716b729|
2023-10-27 07:15:53|install|os-autoinst-openvswitch|4.6.1698238759.64b339c-lp155.1693.1|x86_64||devel_openQA|092833173f63a5c0731f9e47e6255f89a6a36262310ad39b4628b000714e8635|
2023-10-27 07:15:53|install|openQA-common|4.6.1698238589.f8f5bc4-lp155.6149.1|x86_64||devel_openQA|9fcf9828ae242a645e6d9e6b6d2878d7b49cbba4007d962b3f80b7bfbbc94a6d|
2023-10-27 07:15:53|install|os-autoinst-distri-opensuse-deps|1.1698329766.7f036688-lp155.13144.1|noarch||devel_openQA|64d8e3e8e5257b4c86f3d04dcff5c6516ecbe8be5939b383848f91d08a862542|
2023-10-27 07:15:53|install|openQA-client|4.6.1698238589.f8f5bc4-lp155.6149.1|x86_64||devel_openQA|b82470bc7f49d28bf5017d70bb3a411627829ae46bf0e00bc431a7629da70d4c|
2023-10-27 07:15:58|install|openQA-worker|4.6.1698238589.f8f5bc4-lp155.6149.1|x86_64||devel_openQA|33428dde2cf8a24604c863dd4d2332ef1d99686807a351bd016fe7fe46f17d76|
2023-10-27 07:15:58|install|python3-cryptography|3.3.2-150400.20.3|x86_64||repo-sle-update|8d01db80914ea5adb8bfa4e7bd3ad7f8976aab6cc76859f1da5871870d8ca797|
2023-10-27 07:15:58|patch |openSUSE-SLE-15.5-2023-4194|1|noarch|repo-sle-update|low|feature|needed|applied|
with nothing suspicious unless we suspect the ruby security patch ;) For openQA a broader diff log would be:
$ git log1 --no-merges d08787a..f8f5bc4
94d1adde3 Use commit message checks from os-autoinst-common
02be1c5aa Warn when modifying files under external directly
89463ff34 (okurz/feature/ci, feature/ci) CI: Use consistent casing in commit message check
f8e89a368 CI: Fix typo in github action name
80296b8c1 Update .github/workflows/commit_message_checker.yml
ab85100ef Update commit-message-checker & add extra rule for subject lines
so again far from suspicious
Updated by okurz 12 months ago
- Description updated (diff)
I can't disable all worker instances on worker31-36 as they run other stuff than just x86_64 qemu, e.g. s390x. So I am enabling them again, but luckily we have spread the non-x86_64-qemu worker instances evenly, so I can do:
for i in {31..36}; do sudo salt-key -y -a worker$i.oqa.prg2.suse.org; done
sudo salt --no-color --state-output=changes 'worker*' state.apply | grep -av 'Result.*Clean'
sudo salt 'worker3[1-4].oqa.*' cmd.run "sudo systemctl mask --now openqa-worker-auto-restart@{11..50}"
so for now keeping w35+w36 enabled but with the instances on w31-w34 masked.
sudo salt 'worker3[1-9].oqa.*' cmd.run "sudo pgrep -af 'openqa.*worker'"
looks ok.
Experiment to derive the fail ratio (if any) with the reference scenario "ovs-client+server" from #136013, parameterized by worker, while keeping each cluster within one worker host:
for i in {29..40}; do name=poo138698-okurz-w$i; openqa-clone-job --repeat=10 --skip-chained-deps --parental-inheritance --within-instance https://openqa.suse.de/tests/12691977 TEST+=-$name BUILD=$name _GROUP=0 WORKER_CLASS=worker$i,tap; done
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w29 20/20=100% passed
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w30 20/20=100% passed
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w31 SKIP: worker31 can not work, see #137756
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w32 12/20=60% passed -> 40% failed! All with seemingly network related problems, worker32 disabled, see #138707
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w33 20/20=100% passed
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w34 20/20=100% passed
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w35 20/20=100% passed
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w36 20/20=100% passed
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w37 20/20=100% passed
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w38 20/20=100% passed
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w39 20/20=100% passed
- https://openqa.suse.de/tests/overview?build=poo138698-okurz-w40 20/20=100% passed
w31-34 shouldn't actually start as I assume there is no worker with a class combination like "worker31,tap", even though that is unsafe as I should have included qemu_x86_64 to not run on s390x or something.
Updated by openqa_review 12 months ago
- Due date set to 2023-11-11
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 12 months ago
tests for w29+30,35+36+37+38+39+40 are 10/10 green so no problem there. https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=now-2d&to=now&viewPanel=24 shows a significant improvement, back to 10% failed+parallel_failed. I retriggered multiple tests, in particular HA/SAP SLE maintenance aggregate tests, now https://openqa.suse.de/tests/overview?groupid=405&flavor=SAP-DVD-Updates&flavor=Server-DVD-HA-Updates all good.
As no other tests reproduced the error, and it also seems there are no more x86_64 qemu tests scheduled right now on OSD, I re-enabled all worker instances. Now more of my debugging jobs can start, and I can also try to reproduce the original problem.
$ name=poo138698-okurz; openqa-clone-job --skip-chained-deps --skip-deps --parental-inheritance --within-instance https://openqa.suse.de/tests/12691358 INCLUDE_MODULES=support_server/login,support_server/setup TEST+=-$name BUILD=$name _GROUP=0 WORKER_CLSS=qemu_x86_64,tap,worker31
Cloning parents of sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priorityfencing_supportserver@64bit
Cloning children of sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priorityfencing_supportserver@64bit
Cloning parents of sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priority_fencing_node01@64bit
Cloning parents of sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priority_fencing_node02@64bit
3 jobs have been created:
- sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priorityfencing_supportserver@64bit -> https://openqa.suse.de/tests/12710499
- sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priority_fencing_node01@64bit -> https://openqa.suse.de/tests/12710497
- sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priority_fencing_node02@64bit -> https://openqa.suse.de/tests/12710498
worker31 can not work, see #137756, trying w32
$ name=poo138698-okurz; openqa-clone-job --skip-chained-deps --skip-deps --parental-inheritance --within-instance https://openqa.suse.de/tests/12691358 INCLUDE_MODULES=support_server/login,support_server/setup TEST+=-$name BUILD=$name _GROUP=0 WORKER_CLASS=qemu_x86_64,tap,worker32
Cloning parents of sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priorityfencing_supportserver@64bit
Cloning children of sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priorityfencing_supportserver@64bit
Cloning parents of sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priority_fencing_node01@64bit
Cloning parents of sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priority_fencing_node02@64bit
3 jobs have been created:
- sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priorityfencing_supportserver@64bit -> https://openqa.suse.de/tests/12710502
- sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priority_fencing_node01@64bit -> https://openqa.suse.de/tests/12710501
- sle-15-SP5-Server-DVD-HA-Incidents-x86_64-Build:29290:libfido2-qam_ha_priority_fencing_node02@64bit -> https://openqa.suse.de/tests/12710500
that failed because with INCLUDE_MODULES the parallel jobs have no tests at all, so using EXCLUDE_MODULES instead:
name=poo138698-okurz; openqa-clone-job --skip-chained-deps --skip-deps --parental-inheritance --within-instance https://openqa.suse.de/tests/12691358 EXCLUDE_MODULES=ha/barrier_init,support_server/wait_children TEST+=-$name BUILD=$name _GROUP=0 WORKER_CLASS=qemu_x86_64,tap,worker32
-> https://openqa.suse.de/tests/12710507
And then also with --export-command to spread over workers:
name=poo138698-okurz; openqa-clone-job --export-command --skip-chained-deps --skip-deps --parental-inheritance --within-instance https://openqa.suse.de/tests/12691358 EXCLUDE_MODULES=ha/barrier_init,support_server/wait_children TEST+=-$name BUILD=$name _GROUP=0 WORKER_CLASS=qemu_x86_64,tap,worker32
and then from that
openqa-cli api --host … 'WORKER_CLASS:12691358=qemu_x86_64,tap,worker32' 'WORKER_CLASS:12691361=qemu_x86_64,tap,worker33' 'WORKER_CLASS:12691363=qemu_x86_64,tap,worker34' …
-> https://openqa.suse.de/tests/12710512
EXCLUDE_MODULES seems to have no effect so doing
openqa-cli api --host … BUILD=poo138698-okurz-w32-33-34-2 … EXCLUDE_MODULES:12691358=barrier_init,wait_children … 'WORKER_CLASS:12691358=qemu_x86_64,tap,worker32' 'WORKER_CLASS:12691361=qemu_x86_64,tap,worker33' 'WORKER_CLASS:12691363=qemu_x86_64,tap,worker34'
-> https://openqa.suse.de/tests/12710515 passed so at least no easily reproducible problem.
for i in {001..040}; do openqa-cli api --host … BUILD=poo138698-okurz-w32-33-34-2 … EXCLUDE_MODULES:12691358=barrier_init,wait_children … 'TEST:12691358=qam_ha_priorityfencing_supportserver-poo138698-okurz-$i' 'TEST:12691361=qam_ha_priority_fencing_node01-poo138698-okurz-$i' 'TEST:12691363=qam_ha_priority_fencing_node02-poo138698-okurz-$i' … 'WORKER_CLASS:12691358=qemu_x86_64,tap_poo138707,worker32' 'WORKER_CLASS:12691361=qemu_x86_64,tap,worker33' 'WORKER_CLASS:12691363=qemu_x86_64,tap,worker34'
-> https://openqa.suse.de/tests/12710520 -> https://openqa.suse.de/tests/overview?build=poo138698-okurz-w32-33-34-2
Updated by okurz 12 months ago
- Related to action #138707: Re-enable worker32 for multi-machine tests in production added
Updated by acarvajal 11 months ago
Hello,
Just checked through the Aggregate SAP & HA tests that ran over the weekend and things look much improved (meaning, failures are not related to this ticket). I still need to take a look at the Incident jobs (of which there are more) and at the ticket activity from late Friday and the weekend, but wanted to drop a line regarding results.
Updated by acarvajal 11 months ago
Update from today. Issues seem to be present again. :(
- Found 3 failures in Aggregated jobs and over 10 in Incidents where SUT had been rebooted. For example https://openqa.suse.de/tests/12725464#step/check_after_reboot/13 (ran in worker33, Support Server in worker39)
- Found https://openqa.suse.de/tests/12731342#step/ha_cluster_init/7 in worker33 where connection from SUT to Support Server (running in worker29) failed.
- Found https://openqa.suse.de/tests/12731350#step/iscsi_client/15 in worker33 where connection from SUT to Support Server (running in worker35) failed.
- HAWK client in worker39 https://openqa.suse.de/tests/12725522#step/hawk_gui/29 fails to connect to node 1 running in worker34
- Cluster node running in worker40 https://openqa.suse.de/tests/12725512#step/cluster_md/32 fails to connect to other node in the cluster (running in worker33)
- HAWK client in worker37 https://openqa.suse.de/tests/12725591#step/hawk_gui/36 cannot connect to node2 in worker33 but it can connect to node1 in worker39
- Cluster node fails to reach qnetd server at 10.0.2.17 https://openqa.suse.de/tests/12724650#step/qnetd/26. Node runs in worker29 and qnetd server in worker34.
- Cluster init fails because qnetd server is unreachable https://openqa.suse.de/tests/12724701#step/ha_cluster_init/15. Node 1 in worker38, qnetd server in worker33.
- SUT cannot resolve names using the DNS provided by the Support Server: https://openqa.suse.de/tests/12724666#step/cluster_md/3. SUT in worker29, SS in worker33.
- HAWK client in worker29 fails to connect to cluster node also in worker29 https://openqa.suse.de/tests/12726431#step/hawk_gui/29, but error is name resolution and Support Server is in worker33.
- Node in worker34 cannot join cluster https://openqa.suse.de/tests/12731427#step/remove_node/11. Node 1 in worker35
As on last Friday, most of the issues seem to be either on worker33 or on worker34, so I have a strong suspicion that something is broken there.
These are the types of errors I found in the HA job groups. Will check the SAP job groups next, but since those only have 1 cluster, I don't expect to find anything different. Will add it here if I do.
Updated by livdywan 11 months ago
- Description updated (diff)
- Assignee changed from okurz to livdywan
- Priority changed from Normal to Urgent
Checking with @acarvajal to see if we can spot a commonality here. Seemingly jobs involving any of those workers can fail, and we don't have a reliable reproducer.
there are no more x86_64 qemu tests scheduled right now on OSD I re-enabled all worker instances
Updating the rollback steps accordingly. Also editing the description since it's misleading to include w31 or w32 here, neither of which can currently run MM jobs.
Updated by livdywan 11 months ago
Let's start with taking out w33 and w34 anyway https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/666
Updated by livdywan 11 months ago
- Subject changed from significant increase in multi-machine test failures on OSD since 2023-10-25, e.g. test fails in support_server/setup to significant increase in multi-machine test failures on OSD since 2023-10-25, e.g. test fails in support_server/setup size:M
- Description updated (diff)
Updated by livdywan 11 months ago
livdywan wrote in #note-16:
Let's start with taking out w33 and w34 anyway https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/666
It would seem as though things are looking good again, and it might be one of the two machines. Pondering bringing back one of the workers to confirm.
Updated by acarvajal 11 months ago
livdywan wrote in #note-18:
livdywan wrote in #note-16:
Let's start with taking out w33 and w34 anyway https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/666
It would seem as though things are looking good again, and it might be one of the two machines.
Looking at the previous 2 days of Aggregate jobs and currently blocked MU results:
- No SAP jobs failed on multi-machine-related issues. SAP failures were either NFS-related (see https://progress.opensuse.org/issues/135980) or on IPMI backends.
- 2 HA jobs failed, but I don't have enough information to tie those failures to this MM issue (see https://openqa.suse.de/tests/12738521#step/check_after_reboot/16 & https://openqa.suse.de/tests/12738710#step/register_without_ltss/70). The first one could be related, as the node may have been restarted, but the 2nd failure is surely not related to this, as it seems to be a repository issue.
Pondering bringing back one of the workers to confirm.
That would be fine from our end. Let us know when this is done so we can be on the lookout for failures.
Updated by livdywan 11 months ago
I decided to simply spawn a batch of qam_ha_priorityfencing_supportserver. Let's see what the failure rate of that will be. We need to confirm whether this is one or multiple test issues, an issue with the worker setup, or something elsewhere in the infrastructure.
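(For reference, such a batch can presumably be spawned with the same clone approach used earlier in this ticket; the invocation below is an assumed sketch, not necessarily what was actually run:)
# assumed batch-spawning sketch, modelled on the earlier per-worker clone commands
name=poo138698-batch
openqa-clone-job --repeat=10 --skip-chained-deps --parental-inheritance --within-instance https://openqa.suse.de/tests/12691358 TEST+=-$name BUILD=$name _GROUP=0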
Updated by acarvajal 11 months ago
livdywan wrote in #note-22:
I decided to simply spawn a batch of qam_ha_priorityfencing_supportserver. Let's see what the failure rate of that will be. We need to confirm wether this is one or multiple test issues, an issue with the worker setup or elsewhere in the infrastructure.
Wow, this is cool! There are 5 failures so far ... all of them with at least one job on worker34. Good idea using the Priority Fencing test, as it should run in ca. 30-40 minutes.
Could this be used as a reproducer, or is it still too big a test for such a purpose?
Updated by acarvajal 11 months ago
With worker34 enabled, failures have returned. The only thing all of these failures have in common is that at least one of the jobs ran on worker34:
HA jobs:
https://openqa.suse.de/tests/12755438#step/vg/100
https://openqa.suse.de/tests/12755437#step/filesystem#1/30
https://openqa.suse.de/tests/12755433#step/check_after_reboot/42
https://openqa.suse.de/tests/12755466#step/check_after_reboot/2
https://openqa.suse.de/tests/12755491#step/filesystem#1/32
https://openqa.suse.de/tests/12755531#step/priority_fencing_delay/7
https://openqa.suse.de/tests/12755355#step/hawk_gui/110
https://openqa.suse.de/tests/12755354#step/filesystem/14
https://openqa.suse.de/tests/12755377#step/check_cluster_integrity/6
https://openqa.suse.de/tests/12755414#step/qnetd/18
SAP:
https://openqa.suse.de/tests/12755676#step/ha_cluster_init/74
Edit the morning after: saw 2 more failures in HA jobs in Incidents job groups with the same patterns as the jobs listed above.
All of these were in Single Incidents job groups. Aggregate jobs are still running at the moment.
Updated by livdywan 11 months ago
- Copied to action #139055: Comments mentioning bugrefs as part of a sentence are treated like bug refs and taken over size:S added
Updated by livdywan 11 months ago
- Status changed from In Progress to Feedback
- Priority changed from Urgent to High
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/668 to take out worker34 again.
My assumption is this ticket will be closed if the "significant increase" has been addressed. That's what this ticket is about.
Updated by livdywan 11 months ago
- Related to action #139070: Re-enable worker34 for multi-machine tests in production added
Updated by acarvajal 11 months ago
- Due date changed from 2023-11-17 to 2023-11-11
- Priority changed from High to Urgent
livdywan wrote in #note-29:
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/669 to confirm that worker 33 is indeed fine.
Seems worker33 has an impact of its own: Aggregate jobs over the weekend were fine, but there were many failures in Single Incidents.
In HA job groups:
- https://openqa.suse.de/tests/12764299#step/ha_cluster_init/3 (support server in w33)
- https://openqa.suse.de/tests/12764304#step/hawk_gui/35 (node 2 in w33)
- https://openqa.suse.de/tests/12763512#step/check_cluster_integrity/6 (node 1 in w33)
- https://openqa.suse.de/tests/12763618#step/check_after_reboot/9 (support server in w33)
- https://openqa.suse.de/tests/12764321#step/filesystem/22 (everything but support server in w33)
- https://openqa.suse.de/tests/12764371#step/check_after_reboot/40 (SUT in w33)
- https://openqa.suse.de/tests/12764442#step/vg/100 (node 2 in w33)
- https://openqa.suse.de/tests/12764447#step/iscsi_client/15 (SUT in w33)
- https://openqa.suse.de/tests/12764451#step/iscsi_client/15 (support server in w33)
- https://openqa.suse.de/tests/12764449#step/register_without_ltss/13 (all nodes in w33)
- https://openqa.suse.de/tests/12764465#step/iscsi_client/20 (nodes and support server in w33)
- https://openqa.suse.de/tests/12763656#step/check_cluster_integrity/6 (node 1 in w33)
- https://openqa.suse.de/tests/12763645#step/hawk_gui/252 (node 2 in w33)
- https://openqa.suse.de/tests/12764400#step/check_cluster_integrity/6 (node 2 in w33)
- https://openqa.suse.de/tests/12764410#step/hawk_gui/6 (client in w33)
In SAP job groups:
- https://openqa.suse.de/tests/12764592#step/setup/58 (support server in w33)
- https://openqa.suse.de/tests/12764600#step/setup/70 (support server in w33)
While not totally thorough, I'd say this looks similar to what I saw when worker34 was enabled in #note-24 and will maintain my hypothesis that there's something wrong with these 2 workers.
Updated by livdywan 11 months ago
- Related to action #139154: Re-enable worker33 for multi-machine tests in production added
Updated by livdywan 11 months ago
- Priority changed from Urgent to High
@acarvajal Thank you for the update even during hack week. Let's see if what remains looks stable enough so we can start to focus on narrowing down the specific problems.
Updated by JERiveraMoya 11 months ago
Looks like this ticket is being used to label unrelated things, right? Please see these two examples for the Beta candidate:
https://openqa.suse.de/tests/12774019#comments
https://openqa.suse.de/tests/12775212#comments
Updated by livdywan 11 months ago
JERiveraMoya wrote in #note-36:
Looks like this ticket is labeling unrelated things, right? Please, see these two examples for Beta candidate:
https://openqa.suse.de/tests/12774019#comments
https://openqa.suse.de/tests/12775212#comments
Those are linked to bsc#1217056 and bsc#1191684 respectively. Do you mean they were linked to this ticket?
There were some jobs that were accidentally linked by way of using the wrong format with "Re-running to verify connection with poo#138698". If that's what you saw, please accept my apologies. This was only meant to be an informational comment.
Updated by okurz 11 months ago
- Due date deleted (2023-11-11)
- Status changed from Feedback to Resolved
Looking at https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=now-7d&to=now it seems we are good regarding the ratio of failed multi-machine tests. Also https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-HA-Incidents&machine=64bit&test=qam_ha_priorityfencing_supportserver&version=15-SP5#next_previous is very reliable, which covers AC1, and AC2 is covered with #138707, #139070, #139154.
Normally I would check for this ticket being used as a ticket label on o3+osd, but as o3 is currently not reachable due to https://progress.opensuse.org/issues/150815, openqa-query-for-job-label wouldn't work, so I don't bother and call the ticket resolved.
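For reference, that check is normally something along these lines (invocation assumed, the helper presumably coming from the os-autoinst "scripts" repository):
# assumed invocation: list openQA jobs on o3/osd that carry this ticket as a label
openqa-query-for-job-label poo#138698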
Updated by JERiveraMoya 11 months ago
Now that this ticket is resolved, what is the ticket to use for https://openqa.suse.de/tests/12799702? Wasn't it the same issue?
This problem seems to be a zypper issue connecting to OSD in MM.
Updated by okurz 11 months ago
- Status changed from Resolved to Workable
- Assignee deleted (livdywan)
ok, https://openqa.suse.de/tests/12799702 looks like the same original problem, reopening.
Updated by bschmidt 11 months ago
I've talked to Paolo Stivanin and we both agreed that https://openqa.suse.de/tests/12799702 does not look like a multi machine issue.
Therefore I've unlinked the poo from that job now.
Updated by livdywan 11 months ago
- Status changed from Workable to Feedback
- Assignee set to livdywan
bschmidt wrote in #note-41:
I've talked to Paolo Stivanin and we both agreed that https://openqa.suse.de/tests/12799702 does not look like a multi machine issue.
Therefore I've unlinked the poo from that job now.
Thanks!
I'm taking the ticket again, and will monitor for a bit. We have notably #136013 with regard to more general underlying issues, and #139070, #138707, #139154 and #136130 respectively concerning particular workers. Remember this ticket is not about one individual issue but rather the symptoms.
Updated by okurz 11 months ago
- Due date set to 2023-11-24
ok, discussed in the tools team meeting. livdywan, you mentioned another ticket about the specific issue apparent in the job failure of https://openqa.suse.de/tests/12799702#comments but there is only a link back to this ticket here. Please find the corresponding reference.
Updated by livdywan 11 months ago
- Status changed from Feedback to Resolved
okurz wrote in #note-43:
ok, discussed in the tools team meeting. livdywan, you mentioned another ticket about the specific issue apparent in the job failure of https://openqa.suse.de/tests/12799702#comments but there is only a link back to this ticket here. Please find the corresponding reference.
Fixed. Issue #150932 wasn't linked on all of the jobs.
Updated by livdywan 11 months ago
- Related to action #150932: [security][SP6] Failed to connect to openqa.suse.de port 80 in krb5_crypt_nfs_server added
Updated by livdywan 11 months ago
- Related to deleted (action #150932: [security][SP6] Failed to connect to openqa.suse.de port 80 in krb5_crypt_nfs_server)
Updated by openqa_review 10 months ago
- Status changed from Resolved to Feedback
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: extra_tests_webserver
https://openqa.suse.de/tests/12837409#step/php_version/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by okurz 10 months ago
- Related to action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M added