action #160652

closed

openQA Project - coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

openQA Project - coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers

Secondary TAP worker class in different zones size:S

Added by okurz 6 months ago. Updated 5 months ago.

Status: Resolved
Priority: Low
Category: Feature requests
% Done: 0%

Description

Motivation

We had qesapworkers disabled due to MM-test issues.
As part of #152389, the "tap" worker class was disabled on the qesapworkers. What changed since then is that #152389 was erroneously resolved without completing the rollback steps mentioned in #152389#Rollback-steps, so the tap classes are still disabled while referencing the now-closed #152389. It is worth re-enabling "tap" on those machines (multiple, not one) for a simpler and consistent configuration where multi-machine support is enabled by default and not an exception. Also, since then we handle islands of multi-machine-enabled workers per region/datacenter/location better, so can we use that? No, we can't, because we would also need to teach the openQA scheduler to schedule only within one zone.

That's more related to #160646 about other machines, which has been fixed meanwhile. This ticket was only about the qesapworkers in particular, while multiple other machines already provide "tap".
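The zone limitation can be illustrated with a hypothetical placement check: a multi-machine cluster should only go to a location that alone has enough eligible "tap" workers. This is a minimal sketch, not openQA code; the inventory (one entry per worker instance) and the helper name `pick_zone` are invented for illustration, assuming "tap" were enabled in both PRG1 and PRG2. As described above, the current openQA scheduler performs no such check.

```python
from collections import defaultdict

# Hypothetical inventory, one entry per worker instance: name -> (location, classes).
# The names are invented; PRG1/PRG2/NUE2 are the locations mentioned in this ticket.
WORKERS = {
    "w1": ("PRG2", {"qemu_x86_64", "tap"}),
    "w2": ("PRG2", {"qemu_x86_64", "tap"}),
    "w3": ("PRG1", {"qemu_x86_64", "tap"}),
    "w4": ("PRG1", {"qemu_x86_64", "tap"}),
    "w5": ("PRG1", {"qemu_x86_64", "tap"}),
    "w6": ("NUE2", {"qemu_x86_64"}),
}


def pick_zone(workers, required, cluster_size):
    """Return the first location (alphabetically) that alone has at least
    `cluster_size` eligible instances, or None if no single location does.

    An instance is eligible only if it provides every class in `required`.
    """
    eligible_by_zone = defaultdict(list)
    for name, (zone, classes) in workers.items():
        if required <= classes:  # subset check: all requested classes present
            eligible_by_zone[zone].append(name)
    for zone in sorted(eligible_by_zone):
        if len(eligible_by_zone[zone]) >= cluster_size:
            return zone
    return None


if __name__ == "__main__":
    tap_job = {"qemu_x86_64", "tap"}
    print(pick_zone(WORKERS, tap_job, 2))  # PRG1
    print(pick_zone(WORKERS, tap_job, 4))  # None
```

With a cluster size of 4 the sketch returns None even though five eligible instances exist in total; a scheduler that ignores locations would instead spread such a cluster across PRG1 and PRG2, which is the failure mode this ticket wants to avoid.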

Acceptance criteria

Suggestions

  • Because the openQA scheduler would currently schedule multi-machine jobs across multiple locations, which would lead to problems, we should not add the tap class to machines in PRG1. Similarly, we removed the "tap" class completely from NUE2-based machines. So we should simply remove the tap_poo152389 worker class here as well, or use "tap_secondary" and explain "tap_secondary" in the README where we describe worker classes.
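The "tap_secondary" option relies on how worker classes match: as we understand openQA scheduling, a job is only assigned to a worker that provides every class listed in the job's comma-separated WORKER_CLASS setting. A minimal sketch of that rule follows; the helper names and the example WORKER_CLASS values are hypothetical, not taken from openQA or the salt configuration.

```python
def parse_classes(value: str) -> set:
    """Split a comma-separated WORKER_CLASS value into a set of class names."""
    return {part.strip() for part in value.split(",") if part.strip()}


def can_run(job_class: str, worker_class: str) -> bool:
    """True if the worker provides every class the job requests."""
    return parse_classes(job_class) <= parse_classes(worker_class)


# Hypothetical WORKER_CLASS values, for illustration only.
MM_JOB = "qemu_x86_64,tap"
PRIMARY_WORKER = "qemu_x86_64,tap"
SECONDARY_WORKER = "qemu_x86_64,tap_secondary"

if __name__ == "__main__":
    print(can_run(MM_JOB, PRIMARY_WORKER))    # True
    print(can_run(MM_JOB, SECONDARY_WORKER))  # False
    print(can_run("qemu_x86_64,tap_secondary", SECONDARY_WORKER))  # True
```

Under this rule a machine with "tap_secondary" never picks up regular "tap" jobs, so it cannot end up in a cluster spanning zones by accident, while a job can still target it by requesting "tap_secondary" explicitly. The class name documents the exception, which is what the README entry would explain.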

Related issues: 3 (0 open, 3 closed)

Copied from openQA Project - action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M (Resolved, mkittler, 2023-12-11)

Copied to openQA Infrastructure - action #162374: Limit number of OSD PRG2 x86_64 tap multi-machine workers until stabilized (Resolved, okurz, 2024-06-17)

Copied to openQA Infrastructure - action #162455: Secondary TAP worker class instead of "tap_poo…" on closed tickets size:S (Resolved, okurz, 2024-06-18)

#1

Updated by okurz 6 months ago

  • Copied from action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M added
#2

Updated by livdywan 6 months ago

Can you please add some context? What changed since? Why is it worth re-enabling the machine?

#3

Updated by dzedro 6 months ago

Is there no tap worker available now? MM jobs are waiting.

#4

Updated by okurz 6 months ago

livdywan wrote in #note-2:

Can you please add some context? What changed since? Why is it worth re-enabling the machine?

Those questions surprise me a bit, but let's try to be more explicit then. As part of #152389, the "tap" worker class was disabled on the qesapworkers. What changed since then is that #152389 was erroneously resolved without completing the rollback steps mentioned in #152389#Rollback-steps, so the tap classes are still disabled while referencing the now-closed #152389. It is worth re-enabling "tap" on those machines (multiple, not one) for a simpler and consistent configuration where multi-machine support is enabled by default and not an exception.

dzedro wrote in #note-3:

Is there no tap worker available now? MM jobs are waiting.

That's more related to #160646 about other machines, which has been fixed meanwhile. This ticket was only about the qesapworkers in particular, while multiple other machines already provide "tap".

#5

Updated by okurz 6 months ago

  • Subject changed from significant increase in MM-test failure ratio 2023-12-11 - enable qesapworker with "tap" again to Secondary TAP worker class in different zones
  • Description updated (diff)
  • Status changed from New to Workable
#6

Updated by okurz 6 months ago

  • Subject changed from Secondary TAP worker class in different zones to Secondary TAP worker class in different zones size:S
#7

Updated by ybonatakis 5 months ago

  • Status changed from Workable to In Progress
  • Assignee set to ybonatakis
#9

Updated by ybonatakis 5 months ago

  • Status changed from In Progress to Feedback
#10

Updated by okurz 5 months ago

  • Copied to action #162374: Limit number of OSD PRG2 x86_64 tap multi-machine workers until stabilized added
#11

Updated by okurz 5 months ago

  • Copied to action #162455: Secondary TAP worker class instead of "tap_poo…" on closed tickets size:S added
#12

Updated by okurz 5 months ago

  • Status changed from Feedback to Resolved