action #157606: Prevent missing gre tunnel connections in our salt states due to misconfiguration - openQA Infrastructure (public) - openSUSE Project Management Tool

action #157606

open

openQA Project (public) - coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

openQA Project (public) - coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers

Prevent missing gre tunnel connections in our salt states due to misconfiguration

Added by okurz about 1 year ago. Updated 2 days ago.

Status:

New

Priority:

Normal

Assignee:

Category:

Feature requests

Target version:

QA (public) - future

Start date:

2024-03-19

Due date:

% Done:

Estimated time:

Tags:

infra

Description

Motivation¶

In #157534 we encountered multi-machine tests failing because a worker with the "tap" class ended up with no GRE tunnel connections to the other hosts participating in cluster tests. This was caused by my mistake of using a differing "location-" worker class, which has been fixed in the meantime. However, the GRE tunnel peer computation based on worker classes in our salt states, https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/_modules/gre_peers.py?ref_type=heads, happily put worker40 in one "cluster", a case we should handle better.

Acceptance criteria¶

AC1: /etc/wicked/scripts/gre_tunnel_preup.sh on OSD workers is ensured to contain N:N connections between all "tap"-connected workers
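
As an illustration of what "N:N" means for AC1, here is a minimal Python sketch of a full-mesh check. It assumes the configured peers per host have already been collected into a dict; the function name `missing_links`, the input shape and the host names are hypothetical and not taken from gre_peers.py or from OSD.

```python
from itertools import permutations


def missing_links(peers):
    """Return sorted (host, peer) pairs where host has no tunnel to peer.

    `peers` maps each "tap" host to the set of hosts it has a GRE tunnel to
    (hypothetical input shape, e.g. parsed from gre_tunnel_preup.sh).
    """
    hosts = set(peers)
    return sorted(
        (host, other)
        for host, other in permutations(hosts, 2)
        if other not in peers[host]
    )


# Hypothetical example data, not real OSD workers
example = {
    "hostA": {"hostB", "hostC"},
    "hostB": {"hostA"},
    "hostC": {"hostA", "hostB"},
}
print(missing_links(example))  # [('hostB', 'hostC')]
```

An empty result would mean every host is connected to every other host, which is the property AC1 asks for.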

Suggestions¶

Provide a summary when generating the files, i.e. do not rely on people checking files by hand
Issue errors or warnings in cases like a cluster with only 1 machine in it
Take a look at worker29+30+31+32 based on https://netbox.suse.de/dcim/devices/6156/device-bays/: they are all identical and in one chassis, and our workerconf in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls lists all four with worker class "tap", so they should all be inter-connected. At the time of writing, however, w29 only has connections to w32+w36 and to no others
Extend the unit tests and investigate how to improve them
- https://gitlab.suse.de/openqa/salt-states-openqa/-/commit/52bdbd8ab4537db362a55ecd93f5bc97be171bf9
- Check how salt is configured, maybe we are relying on old data that was not synced yet?
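
The first two suggestions could look roughly like the following Python sketch. It assumes that "tap" hosts are grouped by their "location-" worker class and that the worker classes are available as a simple dict of host to class list. Both the grouping rule and the data shape are simplifying assumptions, and `summarize_clusters` is a hypothetical helper, not an existing function in gre_peers.py.

```python
from collections import defaultdict


def summarize_clusters(worker_classes):
    """Group "tap" hosts by "location-" class; return (summary, warnings)."""
    clusters = defaultdict(set)
    for host, classes in worker_classes.items():
        if "tap" not in classes:
            continue  # only "tap" hosts take part in GRE tunnels
        locations = [c for c in classes if c.startswith("location-")]
        # Assumption: hosts without any location class form their own group
        for loc in locations or ["<no location>"]:
            clusters[loc].add(host)

    summary, warnings = [], []
    for loc in sorted(clusters):
        members = sorted(clusters[loc])
        summary.append(f"{loc}: {len(members)} host(s): {', '.join(members)}")
        if len(members) < 2:
            warnings.append(
                f"cluster {loc!r} has only one host ({members[0]}), no GRE peers"
            )
    return summary, warnings


# Hypothetical example data, not real OSD workers
example = {
    "hostA": ["tap", "location-x"],
    "hostB": ["tap", "location-x"],
    "hostC": ["tap", "location-y"],
    "hostD": ["qemu_x86_64", "location-x"],
}
summary, warnings = summarize_clusters(example)
print("\n".join(summary))
print("\n".join(warnings))
```

A CI job or salt run could print the summary and fail when the warnings list is non-empty, so that a typo in a "location-" class like the one from #157534 would be noticed without anyone inspecting the generated files by hand.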

Related issues 4 (0 open — 4 closed)

Updated by okurz about 1 year ago

Copied from action #157534: Multi-Machine Job fails in suseconnect_scc due to worker class misconfiguration when we introduced prg2e machines added

Updated by okurz 10 months ago

Target version changed from future to Ready

Updated by okurz 10 months ago

Related to action #162320: multi-machine test failures 2024-06-14+, auto_review:"ping with packet size 100 failed.*can be GRE tunnel setup issue":retry added

Updated by okurz 9 months ago

Target version changed from Ready to Tools - Next

Updated by okurz 8 months ago

Related to action #160826: Optimize gre_tunnel_preup.sh generation jinja template size:S added

Updated by okurz 8 months ago

Related to action #162734: Simple script detecting gre_tunnel_preup.sh with only empty remote_ip= statements during salt CI pipelines size:M added

Updated by okurz 8 months ago

Status changed from New to Blocked
Assignee set to okurz

Blocked by #162734

Updated by okurz 14 days ago

Status changed from Blocked to New
Assignee deleted (~~okurz~~)

Updated by okurz 2 days ago

Description updated (diff)
Target version changed from Tools - Next to future
