action #181862

open

MTU connection issues on osiris-1 virtual machines

Added by robert.richardson 5 days ago. Updated 3 days ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Regressions/Crashes
Start date:
Due date:
% Done:

0%

Estimated time:

Description

Observation

I have noticed strange connection issues when trying to set up a VM on osiris-1.qe.nue2.suse.org (NUE-2 machine). The initial setup of a fresh Leap image, including network connectivity, seems to work fine (I can successfully ping and curl download.opensuse.org from within the VM). However, as soon as I run any zypper command, the connection gets stuck without any output indicating what happened.

@okurz suspected that these issues might be MTU-related and suggested running ping commands with a larger packet size. This seems to be correct, as those ping commands also get stuck.

It seems only VMs on osiris-1 are affected.

Similar network-related MTU issues have been reported in previous SD tickets:

Steps to reproduce

  1. Create a fresh Leap 15.6 or Tumbleweed VM on osiris-1.

  2. Confirm that basic network connectivity works (ping, curl to download.opensuse.org).

  3. Run zypper ref and observe that it hangs without completing.

  4. Alternatively, run:

    ping -Mdo -s1442 download.opensuse.org
    

    and observe that it also hangs.
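
To narrow down where the size limit lies, the ping reproducer above could be wrapped in a small search. This is only a sketch under assumptions: the function names and the `PROBE` override are hypothetical and not part of any existing tooling, and it assumes the iputils `ping` with `-M do` (set the don't-fragment bit) and `-W` (reply timeout, so a dropped packet does not hang the shell).

```shell
#!/bin/sh
# Sketch: find the largest ICMP payload that still gets a reply with the
# don't-fragment bit set. All names here are illustrative assumptions.

# On-wire IPv4 packet size: payload + 20 bytes IP header + 8 bytes ICMP header.
packet_size() {
    echo $(( $1 + 28 ))
}

# Returns 0 if one DF ping with payload size $1 to host $2 gets a reply.
probe() {
    # -c 1: single packet, -W 2: give up after 2 s instead of hanging
    ping -M do -c 1 -W 2 -s "$1" "$2" >/dev/null 2>&1
}

# Binary search between lo (assumed to work) and hi (assumed to fail).
# PROBE can be set to another function name to replace the real ping.
find_max_payload() {
    host=$1 lo=$2 hi=$3
    while [ $(( hi - lo )) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "${PROBE:-probe}" "$mid" "$host"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$lo"
}
```

For example, `find_max_payload download.opensuse.org 1000 1472` would print the largest payload that passed, provided 1000 works and 1472 fails. `packet_size 1442` shows that the reproducer above sends a 1470-byte IP packet.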

Suggestions

  • DONE Is only this one VM affected? Try another VM on osiris
    -> ok1 also affected
  • DONE Is the host itself affected? Try on osiris-1 directly
    -> not affected
  • DONE Are only VMs on osiris-1 affected? Try another VM on another host (e.g. ada.qe.suse.de)
    -> not affected
  • Consider introducing a network diagnostic hook or health check script for VM post-boot validation.
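
A post-boot check along the lines of the last suggestion might look like the following. It is a rough sketch, not an existing hook: the function names are hypothetical, and it assumes a Linux guest with sysfs and the iputils `ping`. It sends one don't-fragment ping sized to fill the interface MTU and reports whether a reply came back.

```shell
#!/bin/sh
# Hypothetical post-boot validation: does a full-MTU packet with the
# don't-fragment bit set reach the target host?

# Largest ICMP payload that fits one IPv4 packet of the given MTU
# (20 bytes IP header + 8 bytes ICMP header are subtracted).
max_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Read the configured MTU of an interface from sysfs.
iface_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Prints OK or FAIL; returns 0 on success, 1 on no reply, 2 if the
# interface MTU cannot be read.
check_mtu() {
    dev=$1 host=$2
    mtu=$(iface_mtu "$dev") || return 2
    size=$(max_payload_for_mtu "$mtu")
    if ping -M do -c 1 -W 2 -s "$size" "$host" >/dev/null 2>&1; then
        echo "OK: $dev mtu $mtu reaches $host unfragmented"
    else
        echo "FAIL: $dev mtu $mtu, $size-byte DF ping to $host got no reply"
        return 1
    fi
}
```

Calling `check_mtu eth0 download.opensuse.org` from a boot-time unit or test setup step would surface the problem before zypper hangs; the interface name and target host are placeholders taken from this ticket.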
Hypotheses, experiments and observations
  • REJECTED H1 All NUE2 QE machines have problems with MTU sizes
    • E1-1 Select any machine other than the original one, run zypper ref
      and observe whether it times out
      • O1-1-1 osiris itself has no problem
  • REJECTED H1.1* All NUE2 QE non-salt controlled machines have problems with MTU sizes
    • -> see O1.2-1
  • ACCEPTED H1.2 Only VMs on osiris have problems
    • E1.2-1 Try a VM elsewhere e.g. qamaster
      • O1.2-1 VM on ada.qe.suse.de not affected
    • E1.2-2 Try another VM on osiris
      • O1.2-2 ok1 also affected
  • REJECTED H1.3 Only rrichardson VM has problems
    • E1.3-1 See O1.2-2
  • ACCEPTED H2 The problem of zypper ref can be more easily reproduced with ping -Mdo -s1442 download.opensuse.org
    • E2-1 Try the ping. If it fails, we can treat it as a valid
      reproducer until the problem is fixed. Then verify with zypper ref again
      • O2-1-1 The ping reproduced the error, so it is assumed to be a
        valid reproducer
  • ACCEPTED H3 The MTU size problem only appeared recently
    • E3-1 Check logs
      • O3-1-1 On ok1.qe.nue2.suse.org, which also runs on osiris-1, okurz found that the automatic os-update stopped working after 2025-04-12, with /var/log/zypper.log showing timeouts since 2025-04-12. So "last good" is 2025-04-12
  • REJECTED H4 The problem started with recent Tumbleweed 20250410 which is the last upgraded version on ok1
    • E4-1 Try to recreate the problem on a different version
      • O4-1 Leap 15.6 was also shown to be affected

Workaround

Manually set the MTU size within the affected VM to a lower value, like 1360:

ip link set dev eth0 mtu 1360

This allows zypper and other network operations to proceed without hanging.
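
Note that `ip link set` does not persist across reboots, so the workaround has to be reapplied after each boot. A small helper could do this only when needed. The sketch below is an assumption-laden illustration: `lower_mtu`, `read_mtu` and the `DRY_RUN` switch are hypothetical names, and the real command needs root.

```shell
#!/bin/sh
# Sketch: lower an interface MTU only if it is currently above the target.

# Read the configured MTU of an interface from sysfs.
read_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Usage: lower_mtu DEV TARGET [CURRENT]
# CURRENT is optional and defaults to the value read from sysfs.
# With DRY_RUN set, the command is printed instead of executed.
lower_mtu() {
    dev=$1 target=$2
    current=${3:-$(read_mtu "$dev")}
    if [ "$current" -gt "$target" ]; then
        ${DRY_RUN:+echo} ip link set dev "$dev" mtu "$target"
    fi
}
```

Running `DRY_RUN=1 lower_mtu eth0 1360` first shows what would be changed without touching the interface.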
