action #181991

closed

diesel.qe.nue2.suse.org does not respond over salt "Minion did not return. [Not connected]"

Added by gpathak 7 days ago. Updated 3 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Start date: 2025-05-06
Due date:
% Done: 0%
Estimated time:

Description

Observation

On OSD, running sudo salt \* test.ping returns:

diesel.qe.nue2.suse.org:
    Minion did not return. [Not connected]

Rollback steps

  • salt-key -y -a diesel.qe.nue2.suse.org

Acceptance criteria

  • AC1: diesel.qe.nue2.suse.org is responsive to salt

Suggestions

  • Try to reach diesel.qe.nue2.suse.org over ssh manually and look into /var/log/salt/minion
  • DONE: Verify that diesel.qe.nue2.suse.org is responsive on ssh
  • Try to ping the salt nodes again, i.e. retry: ssh openqa.suse.de 'sudo salt \* test.ping'
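
When retrying the ping across many nodes, it can help to filter the output down to the minions that did not answer. The following is a minimal sketch, not part of the original ticket: the function name unresponsive_minions is hypothetical, and it assumes salt's default text output as shown in the Observation above (a minion ID line ending in ":" followed by an indented result line).

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of minions whose test.ping result is
# "Minion did not return". Reads salt's default text output on stdin and
# assumes each minion ID is on an unindented line ending with ":".
unresponsive_minions() {
    awk '
        # Remember the most recent minion ID (strip the trailing colon).
        /^[^ \t].*:$/ { id = substr($0, 1, length($0) - 1); next }
        # Print that ID when its result line reports no return.
        /Minion did not return/ { print id }
    '
}

# Possible usage on OSD (assumption, not executed here):
#   sudo salt \* test.ping | unresponsive_minions
```

An empty result would suggest that all minions answered, which corresponds to AC1 for diesel.qe.nue2.suse.org.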

Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure (public) - action #181853: worker35.oqa.prg2.suse.org does not respond over salt "Minion did not return. [Not connected]" size:S, Resolved, dheidler, 2025-05-06

Actions #1

Updated by gpathak 7 days ago

  • Copied from action #181853: worker35.oqa.prg2.suse.org does not respond over salt "Minion did not return. [Not connected]" size:S added
Actions #2

Updated by nicksinger 7 days ago

  • Description updated (diff)

Machine removed from salt to mitigate urgency; see the rollback steps to enable it again.

Actions #3

Updated by nicksinger 6 days ago

  • Status changed from New to In Progress
  • Assignee set to nicksinger

@okurz mentioned broken wireguard tunnels in https://suse.slack.com/archives/C02AJ1E568M/p1746703936377089, so I checked diesel: wireguard was indeed running there, although we expect it to be disabled on that machine by now (see previous tickets about MTU problems).
A quick test of disabling wg-quick@prg2wg.service showed that most of our problems went away (websocket connection to OSD working again, salt working again, highstate applying correctly). I'm not sure whether someone or something enabled these tunnels again. I made sure they are disabled again now:

systemctl mask wg-quick@prg2wg.service
systemctl mask prg2wg-restart.service
systemctl mask prg2wg-config.path
systemctl disable wg-quick@prg2wg.service
systemctl disable prg2wg-restart.service
systemctl disable prg2wg-config.path
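
To check later whether something re-enabled these units, their state can be queried with systemctl is-enabled, which prints "masked" for a masked unit. The following is a minimal sketch under that assumption; the function name check_units_masked is hypothetical and was not part of the original comment.

```shell
#!/bin/sh
# Hypothetical helper: report every given unit whose state is not "masked".
# Returns 0 if all units are masked, 1 otherwise.
check_units_masked() {
    rc=0
    for unit in "$@"; do
        # "systemctl is-enabled" prints the unit file state (e.g. "masked").
        # It exits non-zero for masked units, so the exit status is ignored.
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        if [ "$state" != "masked" ]; then
            echo "$unit: ${state:-unknown}"
            rc=1
        fi
    done
    return $rc
}

# Possible usage on diesel (assumption, not executed here):
#   check_units_masked wg-quick@prg2wg.service prg2wg-restart.service prg2wg-config.path
```

Any output would name a unit that is no longer masked, which would be a hint that something turned the tunnels back on.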
Actions #4

Updated by nicksinger 6 days ago

  • Status changed from In Progress to Feedback

Machine looks stable so far. I will check again tomorrow whether something re-enabled these services overnight.

Actions #5

Updated by nicksinger 3 days ago

  • Status changed from Feedback to Resolved

Machine was stable over the weekend.
