action #182498
openqaworker25 (o3) has no working IPv6 size:S
0%
Description
Observation
Richard made us aware in Slack that IPv6 is broken: https://openqa.opensuse.org/tests/5061938#step/wget_ipv6/9
I was first fooled by the above curl_ipv6 module into believing that IPv6 must work on this machine, while in reality it is never explicitly tested. Checking other tests on that machine, like https://openqa.opensuse.org/tests/5061495, indeed reveals a broken v6 connection inside the SUT:
* Clear auth, redirects to port from 80 to 443
* Issue another request to this URL: 'https://doc.opensuse.org/release-notes/x86_64/openSUSE/Tumbleweed/RELEASE-NOTES.en.rtf'
* Host doc.opensuse.org:443 was resolved.
* IPv6: 2a07:de40:b27e:1204::10
* IPv4: 195.135.223.50
* Trying [2a07:de40:b27e:1204::10]:443...
* connect to 2a07:de40:b27e:1204::10 port 443 from fec0::5054:ff:fe12:3456 port 53614 failed: Network is unreachable
* Trying 195.135.223.50:443...
* ALPN: curl offers h2,http/1.1
openqaworker25 apparently also has (currently?) no working v6:
openqaworker25:~ # ping6 heise.de
ping6: connect: Network is unreachable
Acceptance criteria
- AC1: All standard SUSE owned o3 workers can reach external systems over IPv6
- AC2: openQA tests still work fine
Suggestions
- Set net.ipv6.conf.all.accept_ra = 2 on all o3 worker machines
- Consider using the already present /etc/sysctl.d/ip_forward.conf
- Compare to https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/network/accept_ra.sls#L1
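A minimal sketch of the first two suggestions, assuming the setting is simply appended to the existing drop-in file. The helper name add_accept_ra is hypothetical and not part of any existing tooling; the actual rollout may well go through the salt state linked above instead.

```shell
# Append the accept_ra setting to a sysctl drop-in file unless it is already there.
# add_accept_ra is a hypothetical helper name used only for this sketch.
add_accept_ra() {
    conf_file=$1
    setting='net.ipv6.conf.all.accept_ra = 2'
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF "$setting" "$conf_file" 2>/dev/null; then
        # If the file is non-empty and lacks a trailing newline, add one first
        # so the new setting does not get glued onto the last existing line.
        if [ -s "$conf_file" ] && [ -n "$(tail -c1 "$conf_file")" ]; then
            printf '\n' >> "$conf_file"
        fi
        printf '%s\n' "$setting" >> "$conf_file"
    fi
}
```

On a worker this would be called as root with add_accept_ra /etc/sysctl.d/ip_forward.conf, followed by sysctl --system to reload all drop-in files. Calling it twice leaves only one copy of the line in the file.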
Updated by robert.richardson 14 days ago
- Description updated (diff)
- Priority changed from Urgent to High
okurz wrote in #note-1:
I suggest as urgency mitigation to take affected machines out of production.
I've connected to openqaworker25 and called
sudo systemctl disable --now $(systemctl list-units | grep openqa-worker-auto-restart | cut -d . -f 1 | xargs)
and also added the corresponding rollback steps to this ticket.
Updated by livdywan 14 days ago
A glance at next/previous reveals that this seems to affect other workers including worker20, 22-24, and 26-27 - maybe all of them?
Updated by robert.richardson 14 days ago · Edited
livdywan wrote in #note-4:
A glance at next/previous reveals that this seems to affect other workers including worker20, 22-24, and 26-27 - maybe all of them?
Yes, it's actually all workers:
rrichardson@ariel:~> for i in $hosts; do echo $i && ssh root@$i "ping6 heise.de"; done
openqaworker21
(root@openqaworker21) Password:
ping6: connect: Network is unreachable
openqaworker22
(root@openqaworker22) Password:
ping6: connect: Network is unreachable
openqaworker23
(root@openqaworker23) Password:
ping6: connect: Network is unreachable
openqaworker24
(root@openqaworker24) Password:
ping6: connect: Network is unreachable
openqaworker25
(root@openqaworker25) Password:
ping6: connect: Network is unreachable
openqaworker26
(root@openqaworker26) Password:
ping6: connect: Network is unreachable
openqaworker-arm21
(root@openqaworker-arm21) Password:
ping6: connect: Network is unreachable
openqaworker-arm22
(root@openqaworker-arm22) Password:
ping6: connect: Network is unreachable
qa-power8-3
(root@qa-power8-3) Password:
ping6: connect: Network is unreachable
Edit:
I reverted the first mitigation attempt and removed the rollback steps from the description.
Updated by livdywan 14 days ago
https://github.com/os-autoinst/os-autoinst-distri-opensuse/commits/master/tests/console/wget_ipv6.pm I guess we have a fix in the test scenario? @robert.richardson Please confirm and resolve if that's the case.
Updated by robert.richardson 14 days ago
livdywan wrote in #note-7:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/commits/master/tests/console/wget_ipv6.pm I guess we have a fix in the test scenario?
Yes, looks like it was caused by 3de99d4, which has been reverted by now.
@robert.richardson Please confirm and resolve if that's the case.
o3 is having network issues atm; I will schedule a test run and also check manually once I can access it again.
Updated by robert.richardson 11 days ago
- Status changed from New to Feedback
Waiting for o3 to be available again, which will most likely be Wednesday, see
https://suse.slack.com/archives/C02AET1AAAD/p1747641089303019?thread_ts=1747407687.687679&cid=C02AET1AAAD
Updated by nicksinger 11 days ago
- Status changed from Feedback to Blocked
This is blocked by https://suse.slack.com/archives/C02AET1AAAD/p1747641089303019?thread_ts=1747407687.687679&cid=C02AET1AAAD and not waiting for any feedback :)
Updated by nicksinger 8 days ago
Created https://sd.suse.com/servicedesk/customer/portal/1/SD-188768
If any information or debugging is needed, ipmitool can be used to access the machines while ariel is down: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls?ref_type=heads#L2601-2611
Updated by robert.richardson 4 days ago · Edited
Just had a quick look at the last couple of runs of https://openqa.opensuse.org/tests/5061938#next_previous and although they no longer fail at the wget_ipv6 step, running wget -6 manually from a worker still fails:
openqaworker21:/tmp # wget -O- -6 www3.zq1.de/test.txt
Prepended http:// to 'www3.zq1.de/test.txt'
--2025-05-26 11:27:57-- http://www3.zq1.de/test.txt
Resolving www3.zq1.de (www3.zq1.de)... 2a01:4f8:221:b52:fcfd:ff:fe00:ec0c
Connecting to www3.zq1.de (www3.zq1.de)|2a01:4f8:221:b52:fcfd:ff:fe00:ec0c|:80... failed: Network is unreachable.
Also, ping6 on the individual workers gives the same result as shown in #note-5.
Updated by nicksinger 4 days ago
true, but on https://sd.suse.com/servicedesk/customer/portal/1/SD-188768 which I just updated. @robert.richardson would be nice if you could handle further checks regarding this SD ticket
Updated by nicksinger 3 days ago
- Status changed from New to Workable
nicksinger wrote in #note-14:
true, but on https://sd.suse.com/servicedesk/customer/portal/1/SD-188768 which I just updated. @robert.richardson would be nice if you could handle further checks regarding this SD ticket
https://sd.suse.com/servicedesk/customer/portal/1/SD-188768 is resolved. We now need to set net.ipv6.conf.all.accept_ra = 2 on all hosts. A good fit might be /etc/sysctl.d/ip_forward.conf, where IPv6 forwarding is also enabled (the very thing that breaks the default of net.ipv6.conf.all.accept_ra = 1).
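The interaction between forwarding and accept_ra can be sketched as a small decision helper. This follows the rule described in the Linux kernel's ip-sysctl documentation: 0 never accepts router advertisements, 1 accepts them only while forwarding is disabled, and 2 accepts them even with forwarding enabled. The function name ra_accepted is hypothetical, and the sketch only evaluates two numbers it is given. Which interface's values are decisive on a given worker is an assumption to verify, since the kernel also keeps per-interface copies under net.ipv6.conf.<interface>.

```shell
# ra_accepted FORWARDING ACCEPT_RA
# Prints "yes" if router advertisements would be accepted for the given
# combination of sysctl values, "no" otherwise.
# ra_accepted is a hypothetical helper name used only for this sketch.
ra_accepted() {
    forwarding=$1
    accept_ra=$2
    if [ "$accept_ra" -eq 2 ]; then
        # 2: accept RAs regardless of the forwarding setting
        echo yes
    elif [ "$accept_ra" -eq 1 ] && [ "$forwarding" -eq 0 ]; then
        # 1: accept RAs only while forwarding is disabled
        echo yes
    else
        # 0, or 1 combined with forwarding enabled
        echo no
    fi
}

# Possible use on a worker (reads the current "all" values):
#   ra_accepted "$(sysctl -n net.ipv6.conf.all.forwarding)" \
#               "$(sysctl -n net.ipv6.conf.all.accept_ra)"
```

With forwarding enabled and the default accept_ra = 1 the helper prints "no", which matches the unreachable network seen on the workers. With accept_ra = 2 it prints "yes" in both forwarding states.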