action #181862
MTU connection issues on osiris-1 virtual machines
Status: open
Description
Observation
I have noticed strange connection issues when trying to set up a VM on osiris-1.qe.nue2.suse.org (NUE-2 machine). The initial setup of a fresh Leap image, including network connectivity, seems to work fine (I can successfully ping and curl download.opensuse.org from within the VM). However, as soon as I run any zypper command, the connection gets stuck without any output indicating what happened.
It was suggested that I run ping commands with a larger packet size, as @okurz suspected an MTU-related problem. This seems to be correct, as those ping commands also get stuck.
It seems only VMs on osiris-1 are affected.
Similar network-related MTU issues have been reported in previous SD tickets:
Steps to reproduce
- Create a fresh Leap 15.6 or Tumbleweed VM on osiris-1.
- Confirm that basic network connectivity works (ping, curl to download.opensuse.org).
- Run zypper ref and observe that it hangs without completing.
- Alternatively, run:
  ping -Mdo -s1442 download.opensuse.org
  and observe that it also hangs.
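The ping reproducer works because -M do forbids fragmentation and -s1442 sets the ICMP payload size, so each echo request is a 1470-byte IPv4 packet (1442 payload + 20-byte IPv4 header + 8-byte ICMP header). A minimal sketch of a path MTU probe built on the same idea, assuming bash and Linux iputils ping, binary-searches the largest payload that still gets a reply. The function names probe, find_max_payload and find_path_mtu are hypothetical and not part of any existing tooling.

```bash
#!/usr/bin/env bash
# Sketch: estimate the path MTU towards a host with non-fragmentable pings.
# Assumptions: Linux iputils ping, IPv4 without header options.

# 20-byte IPv4 header + 8-byte ICMP header added on top of the ping payload.
IP_ICMP_OVERHEAD=28

# probe SIZE HOST: succeed if one echo request with SIZE payload bytes and
# fragmentation prohibited (-M do) gets a reply within 2 seconds.
probe() {
    ping -M do -c 1 -W 2 -s "$1" "$2" >/dev/null 2>&1
}

# find_max_payload HOST [LOW] [HIGH]: binary search for the largest payload
# that probe accepts. LOW is assumed to work; HIGH defaults to 1472, i.e. a
# full 1500-byte IPv4 packet. Prints the payload size.
find_max_payload() {
    local host=$1 low=${2:-0} high=${3:-1472} mid
    while (( low < high )); do
        # Round up so that mid > low and the loop always makes progress.
        mid=$(( (low + high + 1) / 2 ))
        if probe "$mid" "$host"; then
            low=$mid
        else
            high=$(( mid - 1 ))
        fi
    done
    echo "$low"
}

# find_path_mtu HOST: print the largest IPv4 packet that got through unfragmented.
find_path_mtu() {
    local payload
    payload=$(find_max_payload "$1") || return
    echo $(( payload + IP_ICMP_OVERHEAD ))
}
```

Usage, from inside an affected VM after sourcing the functions: find_path_mtu download.opensuse.org. The search needs about 11 probes for the 0 to 1472 range, and each lost probe waits for the 2-second timeout. The result only describes the path to that one host, so it is an estimate rather than a definitive MTU for the VM.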
Suggestions
- DONE Is only this one VM affected?
  Try another VM on osiris -> ok1 also affected
- DONE Is the host itself affected?
  Try on osiris-1 directly -> not affected
- DONE Are only VMs on osiris-1 affected?
  Try a VM on another host (e.g. ada.qe.suse.de) -> not affected
- Consider introducing a network diagnostic hook or health check script for VM post-boot validation.
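The health check suggested above could start as small as the following sketch, assuming bash and Linux iputils ping. It distinguishes "host unreachable" from "small packets pass but large non-fragmentable ones are lost", which is the symptom seen on osiris-1. The function names, the default reference host and the default 1472-byte payload (a full 1500-byte IPv4 packet) are assumptions, not an existing hook.

```bash
#!/usr/bin/env bash
# Hypothetical post-boot MTU health check; names and defaults are assumptions.

CHECK_HOST=${CHECK_HOST:-download.opensuse.org}
# 1472 payload bytes + 28 bytes of IPv4/ICMP headers = 1500-byte packet.
CHECK_PAYLOAD=${CHECK_PAYLOAD:-1472}

# df_ping SIZE HOST: one echo request with fragmentation prohibited.
df_ping() {
    ping -M do -c 1 -W 2 -s "$1" "$2" >/dev/null 2>&1
}

# mtu_health_check [HOST] [PAYLOAD]
# Returns 0 if both a default-size (56-byte payload) and a large probe succeed,
# 1 if only the small probe succeeds (likely MTU problem), 2 if neither does.
mtu_health_check() {
    local host=${1:-$CHECK_HOST} size=${2:-$CHECK_PAYLOAD}
    if ! df_ping 56 "$host"; then
        echo "FAIL: $host unreachable even with small packets" >&2
        return 2
    fi
    if ! df_ping "$size" "$host"; then
        echo "FAIL: $host reachable, but ${size}-byte DF payload is lost (MTU problem?)" >&2
        return 1
    fi
    echo "OK: $host reachable with ${size}-byte DF payload"
}
```

For example, mtu_health_check download.opensuse.org 1442 reproduces the ticket's check; exit status 1 then points at an MTU problem rather than a general network outage.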
Full list of hypotheses, experiments and observations:
- REJECTED H1 All NUE2 QE machines have problems with MTU sizes
  - E1-1 Select any machine other than the original one, call zypper ref and observe whether it times out
    - O1-1-1 osiris itself has no problem
  - REJECTED H1.1 All NUE2 QE non-salt-controlled machines have problems with MTU sizes -> see O1.2-1
  - ACCEPTED H1.2 Only VMs on osiris have problems
    - E1.2-1 Try a VM elsewhere, e.g. qamaster
      - O1.2-1 VM on ada.qe.suse.de not affected
    - E1.2-2 Try another VM on osiris
      - O1.2-2 ok1 also affected
  - REJECTED H1.3 Only the rrichardson VM has problems
    - E1.3-1 See O1.2-2
- ACCEPTED H2 The zypper ref problem can be reproduced more easily with ping -Mdo -s1442 download.opensuse.org
  - E2-1 Try the ping; if it fails, assume it is a valid reproducer until the problem is fixed. Then verify with zypper ref again
    - O2-1-1 Confirmed reproducing an error, so assumed to be a valid reproducer
- ACCEPTED H3 The MTU size problem only appeared recently
  - E3-1 Check logs
    - O3-1-1 On ok1.qe.nue2.suse.org, also running on osiris-1, okurz found that the automatic os-update stopped after 2025-04-12, with timeouts in /var/log/zypper.log since 2025-04-12. So "last good" is 2025-04-12
- REJECTED H4 The problem started with the recent Tumbleweed 20250410, which is the last version ok1 was upgraded to
  - E4-1 Try to recreate the problem on a different version
    - O4-1 Leap 15.6 was also shown to be affected
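For E3-1, the "last good" date found in O3-1-1 could be extracted with a sketch like the following, assuming bash and standard grep, cut, sort and uniq. It assumes every line of /var/log/zypper.log starts with a YYYY-MM-DD date and that the relevant messages match the hypothetical pattern [Tt]imeout; both assumptions should be checked against the real log, and rotated logs may be compressed.

```bash
#!/usr/bin/env bash
# Sketch: find when timeouts started to appear in a zypper log.
# Assumptions: lines begin with "YYYY-MM-DD"; the pattern below is hypothetical.

TIMEOUT_PATTERN="${TIMEOUT_PATTERN:-[Tt]imeout}"

# timeouts_per_day LOGFILE: print "<count> <date>" for each day with a match.
timeouts_per_day() {
    grep -E -- "$TIMEOUT_PATTERN" "$1" | cut -d' ' -f1 | sort | uniq -c
}

# first_timeout_day LOGFILE: print the earliest date with a matching line.
first_timeout_day() {
    grep -E -- "$TIMEOUT_PATTERN" "$1" | cut -d' ' -f1 | sort | head -n 1
}
```

For example, first_timeout_day /var/log/zypper.log on ok1 would be expected to print the first day after the last successful update, if the assumptions about the log format hold.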
Workaround
Manually set the MTU size within the affected VM to a lower value, like 1360:
ip link set dev eth0 mtu 1360
This allows zypper and other network operations to proceed without hanging.
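A minimal sketch of the workaround as a re-runnable step, assuming bash, root privileges and the interface name eth0 from the command above. It only ever lowers the MTU, never raises it, and reads the resulting value back from /sys/class/net/DEV/mtu. The function names and the SYSFS_NET override (present only to ease testing) are hypothetical. A change made with ip link does not persist across reboots.

```bash
#!/usr/bin/env bash
# Sketch: apply the MTU workaround idempotently and verify the result.
# Requires root for the real "ip link set" call. Names are hypothetical.

SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# current_mtu DEV: print the MTU the kernel currently reports for DEV.
current_mtu() {
    cat "$SYSFS_NET/$1/mtu"
}

# lower_mtu DEV MTU: set MTU on DEV unless it is already at or below MTU.
lower_mtu() {
    local dev=$1 mtu=$2 cur
    # IPv4 requires every link to carry at least 68-byte datagrams.
    if (( mtu < 68 )); then
        echo "refusing MTU $mtu: below IPv4 minimum of 68" >&2
        return 1
    fi
    cur=$(current_mtu "$dev") || return
    if (( cur <= mtu )); then
        echo "$dev already at MTU $cur"
        return 0
    fi
    ip link set dev "$dev" mtu "$mtu" || return
    echo "$dev MTU changed from $cur to $(current_mtu "$dev")"
}
```

For example, lower_mtu eth0 1360 applies the value used above; running it a second time leaves the interface untouched.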