action #182498: openqaworker25 (o3) has no working IPv6 size:S - openQA Infrastructure (public) - openSUSE Project Management Tool

action #182498

open

openqaworker25 (o3) has no working IPv6 size:S

Added by nicksinger 15 days ago. Updated 2 days ago.

Status: Workable
Priority: Normal
Assignee: robert.richardson
Category: Regressions/Crashes
Target version: openQA Project (public) - Ready
Start date: 2025-05-15
Tags: infra, reactive work

Description

Observation

Richard made us aware in Slack that IPv6 is broken: https://openqa.opensuse.org/tests/5061938#step/wget_ipv6/9
At first I was misled by the preceding curl_ipv6 module into thinking that IPv6 must work on this machine, while in reality IPv6 is never explicitly tested there. Checking other tests on that machine, such as https://openqa.opensuse.org/tests/5061495, indeed reveals a broken IPv6 connection inside the SUT:

* Clear auth, redirects to port from 80 to 443
* Issue another request to this URL: 'https://doc.opensuse.org/release-notes/x86_64/openSUSE/Tumbleweed/RELEASE-NOTES.en.rtf'
* Host doc.opensuse.org:443 was resolved.
* IPv6: 2a07:de40:b27e:1204::10
* IPv4: 195.135.223.50
*   Trying [2a07:de40:b27e:1204::10]:443...
* connect to 2a07:de40:b27e:1204::10 port 443 from fec0::5054:ff:fe12:3456 port 53614 failed: Network is unreachable
*   Trying 195.135.223.50:443...
* ALPN: curl offers h2,http/1.1

openqaworker25 itself also apparently has no working IPv6 (at least currently):

openqaworker25:~ # ping6 heise.de
ping6: connect: Network is unreachable
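"Network is unreachable" is the error the kernel typically reports when no route matches the destination, which for IPv6 on a worker like this usually means there is no IPv6 default route at all. A minimal sketch of a check for that, assuming iproute2's `ip` is installed; the function names are hypothetical. Routes learned from router advertisements are normally listed with `proto ra`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the `ip -6 route show default` output
# read from stdin contains at least one default route.
has_v6_default_route() {
    grep -q '^default'
}

# Hypothetical wrapper for a live host (assumes iproute2 is installed).
check_v6_default_route() {
    if ip -6 route show default | has_v6_default_route; then
        echo "IPv6 default route present"
    else
        echo "no IPv6 default route (expect 'Network is unreachable')"
        return 1
    fi
}
```

Running `check_v6_default_route` on an affected worker should distinguish a missing default route from other causes such as DNS or firewall problems.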

Acceptance criteria

AC1: All standard SUSE owned o3 workers can reach external systems over IPv6
AC2: openQA tests still work fine

Suggestions

Set net.ipv6.conf.all.accept_ra = 2 on all o3 worker machines
Consider using the already present /etc/sysctl.d/ip_forward.conf
Compare to https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/network/accept_ra.sls#L1
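The first two suggestions could be implemented roughly as follows. This is a minimal sketch, assuming the setting should live in /etc/sysctl.d/ip_forward.conf (whose exact current contents are not shown in this ticket); the helper name `ensure_accept_ra` is hypothetical. It is idempotent, so it can be run repeatedly on every worker.

```shell
#!/bin/sh
# Sketch: make sure a sysctl.d file sets net.ipv6.conf.all.accept_ra = 2.
# Defaults to /etc/sysctl.d/ip_forward.conf as suggested in this ticket.
ensure_accept_ra() {
    conf=${1:-/etc/sysctl.d/ip_forward.conf}
    key=net.ipv6.conf.all.accept_ra
    if [ -f "$conf" ]; then
        # Drop any existing assignment of the key (spaces around '=' optional);
        # grep -v exits 1 when it selects no lines, which is not an error here.
        grep -v "^[[:space:]]*${key}[[:space:]]*=" "$conf" > "$conf.tmp" || true
        mv "$conf.tmp" "$conf"
    fi
    # Append exactly one assignment with the desired value
    echo "$key = 2" >> "$conf"
}
```

After running `ensure_accept_ra` as root, the file can be applied without a reboot via `sysctl -p /etc/sysctl.d/ip_forward.conf`.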

Updated by okurz 15 days ago

Category set to Regressions/Crashes
Priority changed from Normal to Urgent
Target version set to Ready

I suggest, as an urgency mitigation, taking the affected machines out of production.

Is this a good opportunity to switch to NetworkManager, given that wicked has been declared deprecated?

Updated by robert.richardson 14 days ago

Assignee set to robert.richardson

Updated by robert.richardson 14 days ago

Description updated
Priority changed from Urgent to High

okurz wrote in #note-1:

I suggest, as an urgency mitigation, taking the affected machines out of production.

I connected to openqaworker25 and ran sudo systemctl disable --now $(systemctl list-units | grep openqa-worker-auto-restart | cut -d . -f 1 | xargs), and also added the corresponding rollback steps to this ticket.
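A variant of this mitigation that records the disabled units, so the rollback re-enables exactly the same set, could look like the sketch below. It assumes the worker instances are units named openqa-worker-auto-restart@<N>.service, as the grep above implies; the state file path and function names are hypothetical and not the rollback steps that were in the ticket.

```shell
#!/bin/sh
# Hypothetical state file holding one unit name per line.
STATE_FILE=${STATE_FILE:-/root/disabled-openqa-workers.txt}

# Print the first column (unit name) of each non-empty line of
# `systemctl list-units --plain --no-legend` output read from stdin.
unit_names() {
    awk 'NF { print $1 }'
}

disable_workers() {
    systemctl list-units --plain --no-legend 'openqa-worker-auto-restart@*' \
        | unit_names > "$STATE_FILE"
    # Unquoted on purpose: word splitting yields one argument per unit.
    systemctl disable --now $(cat "$STATE_FILE")
}

rollback_workers() {
    systemctl enable --now $(cat "$STATE_FILE")
}
```

Recording the list first matters because stopped, disabled units may no longer appear in `systemctl list-units`, so the original pipeline cannot simply be rerun with `enable`.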

Updated by livdywan 14 days ago

A glance at next/previous reveals that this seems to affect other workers including worker20, 22-24, and 26-27 - maybe all of them?

Updated by robert.richardson 14 days ago · Edited

livdywan wrote in #note-4:

A glance at next/previous reveals that this seems to affect other workers including worker20, 22-24, and 26-27 - maybe all of them?

Yes, it's actually all workers:

rrichardson@ariel:~> for i in $hosts; do echo $i && ssh root@$i "ping6 heise.de"; done
openqaworker21
(root@openqaworker21) Password: 
ping6: connect: Network is unreachable
openqaworker22
(root@openqaworker22) Password: 
ping6: connect: Network is unreachable
openqaworker23
(root@openqaworker23) Password: 
ping6: connect: Network is unreachable
openqaworker24
(root@openqaworker24) Password: 
ping6: connect: Network is unreachable
openqaworker25
(root@openqaworker25) Password: 
ping6: connect: Network is unreachable
openqaworker26
(root@openqaworker26) Password: 
ping6: connect: Network is unreachable
openqaworker-arm21
(root@openqaworker-arm21) Password: 
ping6: connect: Network is unreachable
openqaworker-arm22
(root@openqaworker-arm22) Password: 
ping6: connect: Network is unreachable
qa-power8-3
(root@qa-power8-3) Password: 
ping6: connect: Network is unreachable
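The loop above prompts for a password on every host. A non-interactive sketch of the same check, assuming key-based root SSH access (BatchMode makes ssh fail instead of prompting) and iputils `ping` with `-6` support on the hosts; the function names and the `SSH`/`TARGET` variables are hypothetical. It separates SSH failures (ssh exits with 255) from failed pings.

```shell
#!/bin/sh
SSH=${SSH:-ssh}
TARGET=${TARGET:-heise.de}

# Ping TARGET over IPv6 from one host and print a one-line verdict.
check_v6_host() {
    if $SSH -o BatchMode=yes -o ConnectTimeout=5 "root@$1" \
        "ping -6 -c 1 -W 2 $TARGET" >/dev/null 2>&1; then
        rc=0
    else
        rc=$?
    fi
    case $rc in
        0) echo "$1: ok" ;;
        255) echo "$1: ssh error" ;;
        *) echo "$1: IPv6 ping failed (exit $rc)" ;;
    esac
    [ "$rc" -eq 0 ]
}

# Check all given hosts; return non-zero if any of them failed.
check_v6_hosts() {
    status=0
    for host in "$@"; do
        check_v6_host "$host" || status=1
    done
    return $status
}
```

Example: `check_v6_hosts openqaworker21 openqaworker22 qa-power8-3`.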

Edit:
I reverted the first mitigation attempt and removed the rollback steps from the description.

Updated by robert.richardson 14 days ago

Description updated

Updated by livdywan 14 days ago

https://github.com/os-autoinst/os-autoinst-distri-opensuse/commits/master/tests/console/wget_ipv6.pm I guess we have a fix in the test scenario? @robert.richardson Please confirm and resolve if that's the case.

Updated by robert.richardson 14 days ago

livdywan wrote in #note-7:

https://github.com/os-autoinst/os-autoinst-distri-opensuse/commits/master/tests/console/wget_ipv6.pm I guess we have a fix in the test scenario?

Yes, it looks like it was caused by 3de99d4, which has been reverted by now.

@robert.richardson Please confirm and resolve if that's the case.

o3 is having network issues at the moment. I will schedule a test run and also check manually once I can access it again.

Updated by robert.richardson 11 days ago

Status changed from New to Feedback

Waiting for o3 to become available again, most likely on Wednesday, see
https://suse.slack.com/archives/C02AET1AAAD/p1747641089303019?thread_ts=1747407687.687679&cid=C02AET1AAAD

#10

Updated by nicksinger 11 days ago

Status changed from Feedback to Blocked

This is blocked by https://suse.slack.com/archives/C02AET1AAAD/p1747641089303019?thread_ts=1747407687.687679&cid=C02AET1AAAD and not waiting for any feedback :)

#11

Updated by nicksinger 8 days ago

Created https://sd.suse.com/servicedesk/customer/portal/1/SD-188768
If any information or debugging is needed, ipmitool can be used to access the machines while ariel is down: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls?ref_type=heads#L2601-2611

#12

Updated by robert.richardson 4 days ago · Edited

I just had a quick look at the last couple of runs of https://openqa.opensuse.org/tests/5061938#next_previous. Although they no longer fail at the wget_ipv6 step, running wget -6 manually from a worker still fails:

openqaworker21:/tmp # wget -O- -6 www3.zq1.de/test.txt
Prepended http:// to 'www3.zq1.de/test.txt'
--2025-05-26 11:27:57--  http://www3.zq1.de/test.txt
Resolving www3.zq1.de (www3.zq1.de)... 2a01:4f8:221:b52:fcfd:ff:fe00:ec0c
Connecting to www3.zq1.de (www3.zq1.de)|2a01:4f8:221:b52:fcfd:ff:fe00:ec0c|:80... failed: Network is unreachable.

ping6 on the individual workers also still gives the same result as shown in #note-5.

#13

Updated by okurz 4 days ago

Status changed from Blocked to New

This is certainly no longer blocked by the referenced issue. So what is your plan?

#14

Updated by nicksinger 4 days ago

True, but it is blocked on https://sd.suse.com/servicedesk/customer/portal/1/SD-188768, which I just updated. @robert.richardson, it would be nice if you could handle further checks regarding this SD ticket.

#15

Updated by nicksinger 3 days ago

Status changed from New to Workable

nicksinger wrote in #note-14:

true, but on https://sd.suse.com/servicedesk/customer/portal/1/SD-188768 which I just updated. @robert.richardson would be nice if you could handle further checks regarding this SD ticket

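https://sd.suse.com/servicedesk/customer/portal/1/SD-188768 is resolved. We now need to set net.ipv6.conf.all.accept_ra = 2 on all hosts. A good fit might be /etc/sysctl.d/ip_forward.conf, where IPv6 forwarding is also enabled. Forwarding is the very thing that breaks the default of net.ipv6.conf.all.accept_ra = 1: with that value, router advertisements are only accepted while forwarding is disabled, whereas 2 accepts them even with forwarding enabled.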
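The interaction between forwarding and accept_ra can be written down as a small decision function, following the accept_ra values documented in the Linux kernel's ip-sysctl documentation. This is an illustrative sketch; the function names are hypothetical. The wrapper takes an optional interface name because these values also exist per interface, and which of them is effective for the workers' uplink is not established in this ticket.

```shell
#!/bin/sh
# Decide whether router advertisements (RAs) are accepted.
# Arguments: <forwarding> <accept_ra>
#   accept_ra 0 = never accept RAs
#   accept_ra 1 = accept RAs only if forwarding is disabled
#   accept_ra 2 = accept RAs even if forwarding is enabled
ra_accepted() {
    case $2 in
        0) return 1 ;;
        1) [ "$1" -eq 0 ] ;;
        2) return 0 ;;
        *) return 2 ;;  # unexpected value
    esac
}

# Hypothetical wrapper reading live values (default: the "all" entry).
check_ra() {
    iface=${1:-all}
    dir=/proc/sys/net/ipv6/conf/$iface
    fwd=$(cat "$dir/forwarding")
    ra=$(cat "$dir/accept_ra")
    if ra_accepted "$fwd" "$ra"; then
        echo "$iface: forwarding=$fwd accept_ra=$ra -> RAs accepted"
    else
        echo "$iface: forwarding=$fwd accept_ra=$ra -> RAs ignored"
    fi
}
```

The broken state on the workers corresponds to `ra_accepted 1 1` failing, while `ra_accepted 1 2` succeeds, matching the proposed fix.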

#16

Updated by livdywan 3 days ago

Subject changed from openqaworker25 (o3) has no working IPv6 to openqaworker25 (o3) has no working IPv6 size:S
Description updated

#17

Updated by okurz 2 days ago

Priority changed from High to Normal
