action #182498

openqaworker25 (o3) has no working IPv6 size:S

Added by nicksinger 15 days ago. Updated 2 days ago.

Status: Workable
Priority: Normal
Category: Regressions/Crashes
Start date: 2025-05-15
Due date:
% Done: 0%
Estimated time:

Description

Observation

Richard made us aware in Slack that IPv6 is broken: https://openqa.opensuse.org/tests/5061938#step/wget_ipv6/9
At first the curl_ipv6 module above fooled me into thinking that IPv6 must work on this machine, while in reality it is never explicitly tested. Checking other tests on that machine, such as https://openqa.opensuse.org/tests/5061495, does indeed reveal a broken IPv6 connection inside the SUT:

* Clear auth, redirects to port from 80 to 443
* Issue another request to this URL: 'https://doc.opensuse.org/release-notes/x86_64/openSUSE/Tumbleweed/RELEASE-NOTES.en.rtf'
* Host doc.opensuse.org:443 was resolved.
* IPv6: 2a07:de40:b27e:1204::10
* IPv4: 195.135.223.50
*   Trying [2a07:de40:b27e:1204::10]:443...
* connect to 2a07:de40:b27e:1204::10 port 443 from fec0::5054:ff:fe12:3456 port 53614 failed: Network is unreachable
*   Trying 195.135.223.50:443...
* ALPN: curl offers h2,http/1.1

openqaworker25 apparently also has (currently?) no working v6:

openqaworker25:~ # ping6 heise.de
ping6: connect: Network is unreachable

Acceptance criteria

  • AC1: All standard SUSE-owned o3 workers can reach external systems over IPv6
  • AC2: openQA tests still work fine

Suggestions

Actions #1

Updated by okurz 15 days ago

  • Category set to Regressions/Crashes
  • Priority changed from Normal to Urgent
  • Target version set to Ready

I suggest as urgency mitigation to take affected machines out of production.

Is this a good opportunity to switch to NetworkManager, as wicked is declared deprecated?

Actions #2

Updated by robert.richardson 14 days ago

  • Assignee set to robert.richardson

Actions #3

Updated by robert.richardson 14 days ago

  • Description updated (diff)
  • Priority changed from Urgent to High

okurz wrote in #note-1:

I suggest as urgency mitigation to take affected machines out of production.

I connected to openqaworker25 and ran sudo systemctl disable --now $(systemctl list-units | grep openqa-worker-auto-restart | cut -d . -f 1 | xargs), and also added the corresponding rollback steps to this ticket.
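
To make the command substitution above easier to follow, here is a minimal sketch of what the inner pipeline produces. The sample unit lines are illustrative only (not captured from the worker), and the helper names sample_units and worker_units are hypothetical.

```shell
# Illustrative lines in the general shape printed by `systemctl list-units`.
# These are made-up samples, not real output from openqaworker25.
sample_units() {
  cat <<'EOF'
  openqa-worker-auto-restart@1.service loaded active running openQA Worker #1
  openqa-worker-auto-restart@2.service loaded active running openQA Worker #2
  sshd.service                         loaded active running OpenSSH Daemon
EOF
}

# Same filter as in the comment above: keep the matching lines, cut each line
# at the first dot (dropping ".service" and the rest), and let xargs join the
# remaining unit names onto one line.
worker_units() {
  grep openqa-worker-auto-restart | cut -d . -f 1 | xargs
}

sample_units | worker_units
```

The resulting space-separated list is what gets passed to systemctl disable --now, which accepts unit names without the .service suffix.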

Actions #4

Updated by livdywan 14 days ago

A glance at next/previous reveals that this seems to affect other workers including worker20, 22-24, and 26-27 - maybe all of them?

Actions #5

Updated by robert.richardson 14 days ago · Edited

livdywan wrote in #note-4:

A glance at next/previous reveals that this seems to affect other workers including worker20, 22-24, and 26-27 - maybe all of them?

Yes, it's actually all workers:

rrichardson@ariel:~> for i in $hosts; do echo $i && ssh root@$i "ping6 heise.de"; done
openqaworker21
(root@openqaworker21) Password: 
ping6: connect: Network is unreachable
openqaworker22
(root@openqaworker22) Password: 
ping6: connect: Network is unreachable
openqaworker23
(root@openqaworker23) Password: 
ping6: connect: Network is unreachable
openqaworker24
(root@openqaworker24) Password: 
ping6: connect: Network is unreachable
openqaworker25
(root@openqaworker25) Password: 
ping6: connect: Network is unreachable
openqaworker26
(root@openqaworker26) Password: 
ping6: connect: Network is unreachable
openqaworker-arm21
(root@openqaworker-arm21) Password: 
ping6: connect: Network is unreachable
openqaworker-arm22
(root@openqaworker-arm22) Password: 
ping6: connect: Network is unreachable
qa-power8-3
(root@qa-power8-3) Password: 
ping6: connect: Network is unreachable
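
For repeated checks, the loop above could be wrapped roughly as follows. This is only a sketch under assumptions: the function name check_ipv6 and the SSH_CMD override are hypothetical, BatchMode assumes key-based login is available (the run above used password prompts), and heise.de is simply the target used in this ticket.

```shell
# check_ipv6 HOST...: print "HOST ok" or "HOST failed" for each host.
# SSH_CMD is a hypothetical override so the loop can be dry-run without
# touching the network; by default plain ssh in batch mode is used.
# Note: a failed SSH connection is reported as "failed" as well, so this
# does not distinguish between "no login" and "no IPv6".
check_ipv6() {
  for host in "$@"; do
    # -6 forces IPv6, -c 1 sends a single probe, -W 5 waits at most 5 seconds.
    if ${SSH_CMD:-ssh -o BatchMode=yes} "root@$host" "ping -6 -c 1 -W 5 heise.de" >/dev/null 2>&1; then
      echo "$host ok"
    else
      echo "$host failed"
    fi
  done
}
```

It would be called as check_ipv6 $hosts, with $hosts holding the same worker list as in the loop above.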

Edit:
I reverted the first mitigation attempt and removed the rollback steps from the description.

Actions #6

Updated by robert.richardson 14 days ago

  • Description updated (diff)

Actions #7

Updated by livdywan 14 days ago

https://github.com/os-autoinst/os-autoinst-distri-opensuse/commits/master/tests/console/wget_ipv6.pm I guess we have a fix in the test scenario? @robert.richardson Please confirm and resolve if that's the case.

Actions #8

Updated by robert.richardson 14 days ago

livdywan wrote in #note-7:

https://github.com/os-autoinst/os-autoinst-distri-opensuse/commits/master/tests/console/wget_ipv6.pm I guess we have a fix in the test scenario?

Yes, it looks like it was caused by 3de99d4, which has been reverted by now.

@robert.richardson Please confirm and resolve if that's the case.

o3 is having network issues at the moment. I will schedule a test run and also check manually once I can access it again.

Actions #9

Updated by robert.richardson 11 days ago

  • Status changed from New to Feedback

Waiting for o3 to be available again, which will most likely be Wednesday, see
https://suse.slack.com/archives/C02AET1AAAD/p1747641089303019?thread_ts=1747407687.687679&cid=C02AET1AAAD

Actions #10

Updated by nicksinger 11 days ago

  • Status changed from Feedback to Blocked

Actions #11

Updated by nicksinger 8 days ago

Created https://sd.suse.com/servicedesk/customer/portal/1/SD-188768
If any information or debugging is needed, ipmitool can be used to access the machines while ariel is down: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls?ref_type=heads#L2601-2611

Actions #12

Updated by robert.richardson 4 days ago · Edited

I just had a quick look at the last couple of runs of https://openqa.opensuse.org/tests/5061938#next_previous. Although they no longer fail at the wget_ipv6 step, running wget -6 manually from a worker still fails:

openqaworker21:/tmp # wget -O- -6 www3.zq1.de/test.txt
Prepended http:// to 'www3.zq1.de/test.txt'
--2025-05-26 11:27:57--  http://www3.zq1.de/test.txt
Resolving www3.zq1.de (www3.zq1.de)... 2a01:4f8:221:b52:fcfd:ff:fe00:ec0c
Connecting to www3.zq1.de (www3.zq1.de)|2a01:4f8:221:b52:fcfd:ff:fe00:ec0c|:80... failed: Network is unreachable.

Also, ping6 on the individual workers gives the same result as shown in #note-5.

Actions #13

Updated by okurz 4 days ago

  • Status changed from Blocked to New

For sure this is not blocked anymore as referenced. So what do you plan?

Actions #14

Updated by nicksinger 4 days ago

True, but it is blocked on https://sd.suse.com/servicedesk/customer/portal/1/SD-188768, which I just updated. @robert.richardson, it would be nice if you could handle further checks regarding this SD ticket.

Actions #15

Updated by nicksinger 3 days ago

  • Status changed from New to Workable

nicksinger wrote in #note-14:

True, but it is blocked on https://sd.suse.com/servicedesk/customer/portal/1/SD-188768, which I just updated. @robert.richardson, it would be nice if you could handle further checks regarding this SD ticket.

https://sd.suse.com/servicedesk/customer/portal/1/SD-188768 is resolved. We now need to set net.ipv6.conf.all.accept_ra = 2 on all hosts. A good fit might be /etc/sysctl.d/ip_forward.conf, where IPv6 forwarding is also enabled (the very thing that breaks the default of net.ipv6.conf.all.accept_ra = 1, because with that value router advertisements are only accepted while forwarding is disabled).
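
The interplay between the two settings follows the kernel's documented accept_ra semantics: 0 never accepts router advertisements, 1 accepts them only while forwarding is disabled, and 2 accepts them even with forwarding enabled. A minimal sketch of a check for that state could look like this; the function name ra_state and its optional second argument (an alternative root directory, used for trying the logic on a fake tree) are hypothetical.

```shell
# ra_state [IFACE] [PROC_ROOT]: report whether router advertisements would be
# accepted according to the accept_ra and forwarding values.
# IFACE defaults to "all"; PROC_ROOT defaults to /proc/sys and exists only so
# the logic can be exercised against a fake directory tree.
ra_state() {
  iface=${1:-all}
  root=${2:-/proc/sys}
  accept_ra=$(cat "$root/net/ipv6/conf/$iface/accept_ra")
  forwarding=$(cat "$root/net/ipv6/conf/$iface/forwarding")
  if [ "$accept_ra" = 2 ]; then
    echo "accepted (accept_ra=2 overrides forwarding)"
  elif [ "$accept_ra" = 1 ] && [ "$forwarding" = 0 ]; then
    echo "accepted (accept_ra=1, forwarding off)"
  else
    echo "ignored (accept_ra=$accept_ra, forwarding=$forwarding)"
  fi
}
```

On a worker, ra_state without arguments reads the live values. After adding net.ipv6.conf.all.accept_ra = 2 to the sysctl.d file, sysctl --system reloads the configuration. Whether the "all" value reaches an already configured interface is not verified here, so checking the concrete interface as well (for example ra_state eth0, interface name assumed) may be worthwhile.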

Actions #16

Updated by livdywan 3 days ago

  • Subject changed from openqaworker25 (o3) has no working IPv6 to openqaworker25 (o3) has no working IPv6 size:S
  • Description updated (diff)

Actions #17

Updated by okurz 2 days ago

  • Priority changed from High to Normal