action #162734

open

openQA Project - coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

coordination #161735: [epic] Better error detection on GRE tunnel misconfiguration

Simple script detecting gre_tunnel_preup.sh with only empty remote_ip= statements during salt CI pipelines size:M

Added by okurz 26 days ago. Updated 8 days ago.

Status: Workable
Priority: Normal
Assignee: -
Category: Feature requests
Target version:
Start date: 2024-06-21
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

See #161735-4, where nicksinger explains that the salt mine sometimes seems to be empty, which causes /etc/wicked/scripts/gre_tunnel_preup.sh to end up "empty" again, i.e. containing only lines with an empty options:remote_ip= statement, e.g. for worker36 (offline at point of file generation).

Acceptance criteria

  • AC1: Every gre_tunnel_preup.sh script is ensured to have at least one valid remote_ip= statement
  • AC2: All remote_ip= statements represent relevant peers, e.g. currently online TAP worker hosts of the same architecture
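
As a starting point for AC1, the check could look roughly like the sketch below. It assumes that the remote_ip= values in gre_tunnel_preup.sh are plain IP addresses and that a statement counts as "empty" when nothing parseable follows the equals sign. The names valid_remote_ips, script_is_ok and expect_peers are hypothetical and not part of salt-states-openqa. AC2 (relevance of the peers) is not covered here.

```python
import ipaddress
import re

# Captures whatever follows "remote_ip=" up to the next whitespace (may be empty).
REMOTE_IP_RE = re.compile(r"remote_ip=(\S*)")


def valid_remote_ips(script_text):
    """Return all remote_ip= values in the text that parse as IP addresses."""
    valid = []
    for value in REMOTE_IP_RE.findall(script_text):
        candidate = value.strip("\"'")  # tolerate quoted values (assumption)
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # empty or otherwise unparseable value
        valid.append(candidate)
    return valid


def script_is_ok(script_text, expect_peers=True):
    """Return True if the script has at least one valid remote_ip= statement.

    expect_peers=False is a hypothetical switch for island clusters,
    where no peers are expected and an empty script is acceptable.
    """
    if not expect_peers:
        return True
    return len(valid_remote_ips(script_text)) > 0
```

In a pipeline step, the text could be read from /etc/wicked/scripts/gre_tunnel_preup.sh on each worker and the job failed whenever script_is_ok returns False.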

Suggestions

  • Look into https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/openvswitch.sls#L122
  • Start with sudo salt --no-color -C 'G@roles:worker' cmd.run 'test -e /etc/wicked/scripts/gre_tunnel_preup.sh && grep remote_ip /etc/wicked/scripts/gre_tunnel_preup.sh'
  • The task can be solved either by ensuring non-empty entries during generation or retroactively, as part of the CI pipeline execution in a post-deploy monitoring step: for example, a separate script could find the currently online, salt-connected workers and use that list as a filter against https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls
  • Consider the case of an island cluster where actually no peers are expected
  • Let this new script make a diff between the old and new version of gre_tunnel_preup.sh and do a sanity check on the diff (e.g. if too many lines have been removed reject the change)
  • Check if nicksinger already disabled the grain-cache and if that helped
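
The diff-based sanity check suggested above could be sketched as follows. This is only an illustration: the names removed_line_count and change_is_sane are hypothetical, and the 50% threshold is an arbitrary assumption that would need tuning. A line whose IP address disappears counts as removed, because the diff reports the old line as deleted and the new one as added.

```python
import difflib


def removed_line_count(old_text, new_text):
    """Count lines of the old version that are missing or changed in the new one."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # In ndiff output, lines only present in the old version start with "- ".
    return sum(1 for line in diff if line.startswith("- "))


def change_is_sane(old_text, new_text, max_removed_fraction=0.5):
    """Return False if too large a share of the old lines would be removed.

    max_removed_fraction is a hypothetical threshold, not a value from the ticket.
    """
    old_lines = old_text.splitlines()
    if not old_lines:
        return True  # nothing to lose, e.g. first generation of the file
    removed = removed_line_count(old_text, new_text)
    return removed / len(old_lines) <= max_removed_fraction
```

A deployment step could then keep the old gre_tunnel_preup.sh whenever change_is_sane returns False and report the rejected change in the pipeline log.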

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #162455: Secondary TAP worker class instead of "tap_poo…" on closed tickets size:S (Resolved, okurz, 2024-06-18)
