action #159558: network unreachable on aarch64-o3 - openQA Tests (public) - openSUSE Project Management Tool

Actions

Copy link

action #159558

closed

network unreachable on aarch64-o3

Added by ggardet_arm about 1 year ago. Updated about 1 year ago.

Status:

Resolved

Priority:

High

Assignee:

mkittler

Category:

Bugs in existing tests

Target version:

openQA Project (public) - Ready

Start date:

2024-04-24

Due date:

% Done:

Estimated time:

Difficulty:

Tags:

infra, reactive work

Description

Observation¶

openQA test in scenario microos-Tumbleweed-MicroOS-Image-ContainerHost-aarch64-container-host2microosnext@aarch64 fails in
zypper_ref

network is unreachable on aarch64-o3

Test suite description¶

Boot from the latest published MicroOS ContainerHost image and transactional-update dup to snapshot under test. Make sure to use %BUILD% in the URL and file name to force a redownload for new builds.

Reproducible¶

Fails since (at least) Build 20240421

Expected result¶

Last good: 20240418 (or more recent)

Further details¶

Always latest result in this scenario: latest

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Updated by okurz about 1 year ago

Related to action #150869: Ensure multi-machine tests work on aarch64-o3 (or another but single machine only) size:M added

Actions

Copy link

Updated by okurz about 1 year ago

Tags set to infra, reactive work
Assignee set to mkittler
Priority changed from Normal to Urgent
Target version set to Ready

#150869

Actions

Copy link

Updated by mkittler about 1 year ago · Edited

Looks like this can be reproduced even outside of a VM via e.g. curl 'http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot20240423/repodata/repomd.xml'. But e.g. curl 'http://www.google.de' works so probably specific to reaching o3.

It works with HTTPs, though. Not sure why only HTTP ceased to work running the MM setup script.

Not sure how this worked before. For now I disabled the worker slots on aarch64-o3 by assigning a different worker class in /etc/openqa/workers.ini.

I'm wondering whether this scenario has ever worked on aarch64-o3 since it was moved to the FC basement (because I couldn't find a passing job in that scenario that actually ran on aarch64-o3). And yes, the migration to nginx probably made things worse as well.
And considering jobs like https://openqa.opensuse.org/tests/4103335#step/gnome_window_switcher/9 the network on aarch64-o3 is definitely not completely broken (also not inside VMs).
The output of this network connectivity check looks also good - and the test only fails later trying to access o3: https://openqa.opensuse.org/tests/4104487#step/hostname/25

So I think the problem is not that the entire network is unreachable on aarch64-o3 but only that http traffic to o3 doesn't work.

But it might still be related to the MM setup. The most recent job that still worked (and relied on repo refreshing) is from 5 days ago (before my changes): https://openqa.opensuse.org/tests/4094290#step/zypper_ref/18
It looks like there are jobs that ran after the MM setup but that could refresh repositories just fine, e.g. https://openqa.opensuse.org/tests/4102678#step/zypper_ref/15 (and it really used o3 and plain http, see https://openqa.opensuse.org/tests/4102678#step/zypper_ar/6).

Note that I also cannot reach o3 via http from my laptop (in VPN) or from backup-qam.qe.nue2.suse.org (which is in the neighboring rack). Only https works. I could reach o3 via http only from workers within the o3 network (e.g. arm1).

Looks like the NGINX config was modified around the time issues starting to come up:

Apr 24 09:33 /etc/nginx/vhosts.d/openqa.conf

Considering okurz pts/0 2a07:de40:b2bf:2 Wed Apr 24 09:33 still logged in it was maybe okurz who changed it. EDIT: It was this change: #133358#note-14

Actions

Copy link