action #159558
closed
network unreachable on aarch64-o3
0%
Description
Observation
openQA test in scenario microos-Tumbleweed-MicroOS-Image-ContainerHost-aarch64-container-host2microosnext@aarch64 fails in
zypper_ref
network is unreachable on aarch64-o3
Test suite description
Boot from the latest published MicroOS ContainerHost image and transactional-update dup to snapshot under test. Make sure to use %BUILD% in the URL and file name to force a redownload for new builds.
Reproducible
Fails since (at least) Build 20240421
Expected result
Last good: 20240418 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by okurz 7 months ago
- Related to action #150869: Ensure multi-machine tests work on aarch64-o3 (or another but single machine only) size:M added
Updated by mkittler 7 months ago · Edited
Looks like this can be reproduced even outside of a VM, e.g. via curl 'http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot20240423/repodata/repomd.xml'. But e.g. curl 'http://www.google.de' works, so the problem is probably specific to reaching o3.
It works with HTTPS, though. Not sure why only HTTP ceased to work after running the MM setup script.
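The HTTP-versus-HTTPS comparison above can be scripted so it is easy to repeat from different hosts (worker, laptop, neighboring rack). The following is only a minimal sketch: the function names probe, classify and check_host are made up for illustration, and the timeouts are arbitrary. Note that without -L curl treats an HTTP redirect as success, so "ok" here only means a response was received.

```shell
#!/bin/sh
# Hypothetical helpers (not part of openQA): compare plain HTTP and HTTPS
# reachability of one host.

# probe URL: returns curl's exit status (0 = a response was received).
probe() {
    curl --silent --show-error --output /dev/null \
         --connect-timeout 5 --max-time 15 "$1"
}

# classify HTTP_RC HTTPS_RC: print a short diagnosis for the two exit codes.
classify() {
    if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
        echo "both-ok"
    elif [ "$1" -ne 0 ] && [ "$2" -eq 0 ]; then
        echo "http-only-broken"
    elif [ "$1" -eq 0 ] && [ "$2" -ne 0 ]; then
        echo "https-only-broken"
    else
        echo "both-broken"
    fi
}

# check_host HOST: probe both schemes and print "HOST: <diagnosis>".
check_host() {
    host=$1
    probe "http://$host/";  http_rc=$?
    probe "https://$host/"; https_rc=$?
    echo "$host: $(classify "$http_rc" "$https_rc")"
}
```

Running check_host openqa.opensuse.org and check_host www.google.de from the same machine should show whether only plain HTTP to o3 is affected.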
Not sure how this worked before. For now I disabled the worker slots on aarch64-o3 by assigning a different worker class in /etc/openqa/workers.ini.
I'm wondering whether this scenario has ever worked on aarch64-o3 since it was moved to the FC basement (because I couldn't find a passing job in that scenario that actually ran on aarch64-o3). And yes, the migration to nginx probably made things worse as well.
And considering jobs like https://openqa.opensuse.org/tests/4103335#step/gnome_window_switcher/9 the network on aarch64-o3 is definitely not completely broken (also not inside VMs).
The output of this network connectivity check also looks good, and the test only fails later when trying to access o3: https://openqa.opensuse.org/tests/4104487#step/hostname/25
So I think the problem is not that the entire network is unreachable on aarch64-o3 but only that http traffic to o3 doesn't work.
But it might still be related to the MM setup. The most recent job that still worked (and relied on repo refreshing) is from 5 days ago (before my changes): https://openqa.opensuse.org/tests/4094290#step/zypper_ref/18
It looks like there are jobs that ran after the MM setup but that could refresh repositories just fine, e.g. https://openqa.opensuse.org/tests/4102678#step/zypper_ref/15 (and it really used o3 and plain http, see https://openqa.opensuse.org/tests/4102678#step/zypper_ar/6).
Note that I also cannot reach o3 via http from my laptop (in VPN) or from backup-qam.qe.nue2.suse.org (which is in the neighboring rack). Only https works. I could reach o3 via http only from workers within the o3 network (e.g. arm1).
Looks like the NGINX config was modified around the time the issues started to come up:
Apr 24 09:33 /etc/nginx/vhosts.d/openqa.conf
Considering the login record "okurz pts/0 2a07:de40:b2bf:2 Wed Apr 24 09:33 still logged in", it was maybe okurz who changed it. EDIT: It was this change: #133358#note-14
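The correlation above (config modification time versus login time) can be repeated with standard tools. This is just a sketch: recently_modified is a hypothetical helper name, and it relies on find's -mmin option, which GNU and BSD find provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under DIR that were modified
# within the last MINUTES minutes.
recently_modified() {
    find "$1" -type f -mmin "-$2" -print
}
```

For example, recently_modified /etc/nginx 1440 lists config files touched within the last day; comparing those timestamps with the output of last shows who was logged in around that time.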
Updated by okurz 7 months ago
- Related to action #133358: Migration of o3 VM to PRG2 - Ensure IPv6 is fully working added
Updated by mkittler 7 months ago
So it is probably the firewall (https://sd.suse.com/servicedesk/customer/portal/1/SD-128488) and not the MM setup. I'm keeping the worker slots disabled for now anyway.
Updated by mkittler 7 months ago
Production jobs seem to work again, e.g. https://openqa.opensuse.org/tests/4106603#step/selinux_smoke/7. So I'm considering this resolved.