action #160652
openQA Project (public) - coordination #112862 (closed): [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens
openQA Project (public) - coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers
Secondary TAP worker class in different zones size:S
Description
Motivation
The qesapworkers were disabled due to multi-machine (MM) test issues.
As part of #152389, "tap" was disabled on the qesapworkers. Since then, #152389 was erroneously resolved without completing the rollback steps listed in #152389#Rollback-steps, so the tap classes remain disabled with a reference to the now-closed #152389. It is worth re-enabling "tap" on those machines (multiple, not one) to get a simpler, consistent configuration where multi-machine support is enabled by default rather than by exception. We have also since improved the handling of islands of multi-machine enabled workers depending on region/datacenter/location, so can we use that? No, because we would also need to teach the openQA scheduler to only schedule within one zone.
The general lack of available "tap" workers is more related to #160646, which concerns other machines and has been fixed in the meantime. This ticket is only about the qesapworkers; multiple other machines already provide "tap".
Acceptance criteria
- AC1: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls has no mention of "tap_poo152389" anymore
- AC2: We can still explicitly schedule multi-machine tests on those workers
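AC1 can be checked mechanically. The following is a minimal sketch, assuming the content of workerconf.sls has already been fetched as text. The function name find_mentions and the sample excerpt (worker names and layout) are hypothetical and not taken from the real pillar file.

```python
from __future__ import annotations


def find_mentions(text: str, needle: str = "tap_poo152389") -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line that contains needle."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]


# Hypothetical excerpt for illustration only; the real workerconf.sls differs.
SAMPLE = """\
workers:
  example-worker-1:
    global:
      WORKER_CLASS: qemu_x86_64,tap_poo152389
  example-worker-2:
    global:
      WORKER_CLASS: qemu_x86_64,tap
"""

if __name__ == "__main__":
    for number, line in find_mentions(SAMPLE):
        print(f"line {number}: {line}")
```

AC1 is fulfilled once such a check returns an empty list for the real file.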
Suggestions
- The openQA scheduler currently schedules multi-machine jobs across multiple locations, which would lead to problems, so we should not add the "tap" class to machines in PRG1. Similarly, for NUE2-based machines we removed the "tap" class completely. So we should simply remove the tap_poo152389 worker class here as well, or use "tap_secondary" and explain "tap_secondary" in the README where we describe worker classes.
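The second option above, switching to "tap_secondary", amounts to a rename within each worker's comma-separated class list. This is a minimal sketch of that rename, assuming worker classes are stored as a comma-separated string. The helper rename_worker_class is hypothetical and not part of openQA or the salt pillars.

```python
def rename_worker_class(
    worker_class: str,
    old: str = "tap_poo152389",
    new: str = "tap_secondary",
) -> str:
    """Replace one class name in a comma-separated worker class string."""
    classes = [c.strip() for c in worker_class.split(",") if c.strip()]
    renamed = [new if c == old else c for c in classes]
    # dict.fromkeys drops duplicates while preserving the original order.
    return ",".join(dict.fromkeys(renamed))


if __name__ == "__main__":
    print(rename_worker_class("qemu_x86_64,tap_poo152389"))
```

Only the exact class name is replaced, so a plain "tap" entry stays untouched and such workers stay outside the secondary class.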
Updated by okurz 7 months ago
- Copied from action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M added
Updated by okurz 7 months ago
livdywan wrote in #note-2:
Can you please add some context? What changed since? Why is it worth re-enabling the machine?
Those questions surprise me a bit, but let's try to be more explicit then. As part of #152389, "tap" was disabled on the qesapworkers. Since then, #152389 was erroneously resolved without completing the rollback steps listed in #152389#Rollback-steps, so the tap classes remain disabled with a reference to the now-closed #152389. It is worth re-enabling "tap" on those machines (multiple, not one) to get a simpler, consistent configuration where multi-machine support is enabled by default rather than by exception.
dzedro wrote in #note-3:
Is there no tap worker available now? MM jobs are waiting.
That's more related to #160646, which concerns other machines and has been fixed in the meantime. This ticket is only about the qesapworkers; multiple other machines already provide "tap".
Updated by ybonatakis 6 months ago
- Status changed from Workable to In Progress
- Assignee set to ybonatakis
Updated by ybonatakis 6 months ago
- Status changed from In Progress to Feedback
Can't find an MM job on https://openqa.suse.de/admin/workers/2409 or https://openqa.suse.de/admin/workers/2448 to clone.
Updated by okurz 6 months ago
- Copied to action #162374: Limit number of OSD PRG2 x86_64 tap multi-machine workers until stabilized added
Updated by okurz 6 months ago
- Copied to action #162455: Secondary TAP worker class instead of "tap_poo…" on closed tickets size:S added
Updated by okurz 6 months ago
- Status changed from Feedback to Resolved
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/842 merged. Good enough.