action #153769

closed

coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers

Better handle changes in GRE tunnel configuration size:M

Added by mkittler 3 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Feature requests
Target version:
Start date:
2024-01-17
Due date:
% Done:

0%

Estimated time:

Description

Motivation

When the GRE tunnel configuration (/etc/wicked/scripts/gre_tunnel_preup.sh) is changed, either through the related salt states or through workerconf.sls in pillars, the changes are not applied automatically, unlike worker settings. This can lead to openQA test failures due to inconsistencies between workers, and potentially to incomplete routing due to STP selections.

Acceptance criteria

  • AC1: We are able to change the GRE tunnel configuration on any salt-controlled openQA worker without causing openQA test failures

Suggestions

  • Run something like ovs-appctl stp/show on all workers to see how packets are currently routed
  • In the best case our salt states handle this automatically. It would be possible to simply re-run /etc/wicked/scripts/gre_tunnel_preup.sh after it has changed.
    • Adding/removing ports will cause a temporary unavailability of the network and thus disrupt tests.
    • Stop the services, re-run the script and finally start the services again?
    • If necessary reboot the host (not sure how easy this is to trigger from salt states).
  • In the worst case we make sure the limitation is properly documented with instructions to follow (e.g. command to reboot all workers).
  • So simply try rerunning /etc/wicked/scripts/gre_tunnel_preup.sh from salt after it has changed and monitor for bad consequences
  • Monitor https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&viewPanel=24
  • If nothing bad happened then assume we are done, else try to trigger reboots
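
The "stop the services, re-run the script, start the services again" suggestion could be sketched in salt roughly as follows. This is only an illustration under assumptions: the state IDs and the service name example-openqa-worker.service are hypothetical placeholders, a file state managing /etc/wicked/scripts/gre_tunnel_preup.sh is assumed to exist elsewhere in the states, and the script is assumed to be runnable directly (if wicked normally passes arguments to it, those would have to be added).

```yaml
# Sketch only. State IDs and the service name are hypothetical placeholders.
# Assumes a file state for /etc/wicked/scripts/gre_tunnel_preup.sh exists.

stop_worker_before_gre_change:
  service.dead:
    - name: example-openqa-worker.service  # placeholder, not a real unit name
    - onchanges:
      - file: /etc/wicked/scripts/gre_tunnel_preup.sh

rerun_gre_tunnel_preup:
  cmd.run:
    # Assumption: the script works when invoked without arguments.
    - name: /etc/wicked/scripts/gre_tunnel_preup.sh
    - onchanges:
      - file: /etc/wicked/scripts/gre_tunnel_preup.sh
    - require:
      - service: stop_worker_before_gre_change

start_worker_after_gre_change:
  service.running:
    - name: example-openqa-worker.service  # same placeholder as above
    - onchanges:
      - cmd: rerun_gre_tunnel_preup
```

All three states are gated by onchanges, so nothing is stopped or re-run unless the managed script actually changed in the same salt run.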

Further details


Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M (Resolved, mkittler, 2023-12-11)

Related to openQA Project - action #154552: [ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de (Resolved, mkittler, 2024-01-30)

Actions #1

Updated by mkittler 3 months ago

  • Related to action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M added
Actions #2

Updated by okurz 3 months ago

  • Tags set to infra, salt, gre, multi-machine
  • Target version set to future
  • Parent task set to #111929
Actions #3

Updated by okurz 3 months ago

  • Target version changed from future to Ready
Actions #4

Updated by okurz 3 months ago

  • Subject changed from Better handle changes in GRE tunnel configuration to Better handle changes in GRE tunnel configuration size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #5

Updated by okurz 3 months ago

  • Related to action #154552: [ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de added
Actions #6

Updated by okurz 3 months ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz
Actions #7

Updated by okurz 3 months ago

  • Status changed from In Progress to Feedback
Actions #8

Updated by okurz 3 months ago

Now proposing in salt:

# Alternative to calling 'systemctl restart network'
# If network reinitialization is not enough we could still go with
#system.reboot:
#  module.run:
wicked ifup all:
  cmd.run:
    - onchanges:
      - file: /etc/wicked/scripts/gre_tunnel_preup.sh
Actions #9

Updated by okurz 3 months ago

  • Priority changed from Normal to Low
Actions #10

Updated by okurz 3 months ago

  • Target version changed from Ready to Tools - Next
Actions #11

Updated by okurz about 2 months ago

  • Status changed from Feedback to Resolved
  • Target version changed from Tools - Next to Ready

No more problems observed in the past days. Maybe https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1105 helped, but #155929 is also a likely candidate.
