action #181991
closed
diesel.qe.nue2.suse.org does not respond over salt "Minion did not return. [Not connected]"
Description
Observation
On OSD, sudo salt \* test.ping returns:
diesel.qe.nue2.suse.org:
Minion did not return. [Not connected]
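To pick out affected machines from a long test.ping run, the output can be filtered. The following is a minimal sketch, assuming salt's default text output where each minion name sits on its own unindented line ending in a colon; the function name `unresponsive_minions` is made up for illustration.

```shell
#!/bin/sh
# Read `salt \* test.ping` output on stdin and print the name of every
# minion that reported "Minion did not return".
# Assumes the default output layout: "<minion>:" followed by an indented result.
unresponsive_minions() {
    awk '
        # An unindented line ending in ":" names the minion for the next result.
        /^[^[:space:]].*:$/ { minion = substr($0, 1, length($0) - 1); next }
        # A failed result is attributed to the most recently seen minion name.
        /Minion did not return/ { print minion }
    '
}
```

Possible usage on OSD: sudo salt \* test.ping | unresponsive_minions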
Rollback steps
salt-key -y -a diesel.qe.nue2.suse.org
Acceptance criteria
- AC1: diesel.qe.nue2.suse.org is responsive to salt
Suggestions
- Try to reach diesel.qe.nue2.suse.org over ssh manually and look into /var/log/salt/minion
- DONE: Verify that diesel.qe.nue2.suse.org is responsive on ssh
- Try to ping the salt nodes, e.g.
retry ssh openqa.suse.de 'sudo salt \* test.ping'
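The `retry` in the command above is presumably a helper script available on the team's machines. Where it is not installed, a rough stand-in could look like the sketch below. The name `retry_cmd` and its argument order (attempts, delay in seconds, command) are assumptions for illustration and do not mirror the real helper's interface.

```shell
#!/bin/bash
# Hypothetical stand-in for a retry helper:
#   retry_cmd <attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are used up.
retry_cmd() {
    local attempts=$1 delay=$2 i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Wait between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

Possible usage: retry_cmd 3 10 ssh openqa.suse.de 'sudo salt \* test.ping'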
Updated by gpathak 7 days ago
- Copied from action #181853: worker35.oqa.prg2.suse.org does not respond over salt "Minion did not return. [Not connected]" size:S added
Updated by nicksinger 7 days ago
- Description updated (diff)
Machine removed from salt to mitigate urgency; see the rollback steps to enable it again.
Updated by nicksinger 6 days ago
- Status changed from New to In Progress
- Assignee set to nicksinger
@okurz mentioned broken wireguard tunnels in https://suse.slack.com/archives/C02AJ1E568M/p1746703936377089, so I checked diesel: wireguard was indeed running there, although we expect it not to run there by now (see previous tickets about MTU problems).
A quick test with wg-quick@prg2wg.service disabled showed that most of our problems went away (websocket connection to OSD working again, salt working again, highstate applying correctly). I'm not sure whether someone or something enabled these tunnels again, so I made very sure they are disabled again now:
systemctl mask wg-quick@prg2wg.service
systemctl mask prg2wg-restart.service
systemctl mask prg2wg-config.path
systemctl disable wg-quick@prg2wg.service
systemctl disable prg2wg-restart.service
systemctl disable prg2wg-config.path
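To check later whether something has re-enabled these units, their state can be queried with `systemctl is-enabled`, which prints `masked` for a masked unit. A minimal sketch follows; the function name `report_unmasked` and the `SYSTEMCTL` override variable are hypothetical helpers introduced here, not part of systemd.

```shell
#!/bin/bash
# The three units masked in the comment above.
WG_UNITS="wg-quick@prg2wg.service prg2wg-restart.service prg2wg-config.path"

# Print every unit whose state is not "masked" and return non-zero if any
# such unit is found. SYSTEMCTL is a hypothetical override that lets the
# query command be swapped out, e.g. for a dry run.
report_unmasked() {
    local rc=0 unit state
    for unit in $WG_UNITS; do
        # is-enabled exits non-zero for masked units, so ignore its status
        # and only look at the printed state.
        state=$("${SYSTEMCTL:-systemctl}" is-enabled "$unit" 2>/dev/null) || true
        if [ "$state" != "masked" ]; then
            echo "$unit: ${state:-unknown}"
            rc=1
        fi
    done
    return "$rc"
}
```

Running report_unmasked on diesel should print nothing as long as all three units stay masked.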
Updated by nicksinger 6 days ago
- Status changed from In Progress to Feedback
Machine looks stable so far. I will check again tomorrow whether something re-enables these services overnight.
Updated by nicksinger 3 days ago
- Status changed from Feedback to Resolved
machine stable over the weekend