action #75055
closed
grenache-1 can't connect to webui's over IPv4 only
Description
Due to the ongoing IPv6 problems, we realized that grenache-1 workers disappear one by one if there is no working IPv6 connectivity. This currently results in many blocked jobs, since grenache-1 is our main jump host for more exotic testing environments. This ticket is mainly a tracker of what I did to make the workers appear in OSD again:
- 20.10.2020: The problem was noticed when workers failed to connect to baremetal-support.qa.suse.de
- 20.10.2020: IPv6 route was missing, created https://infra.nue.suse.com/SelfService/Display.html?id=178626
- 20.10.2020: IPv6 route was manually added with ip -6 r a fe80::1; after that the worker appeared on all webui's again
- 21.10.2020: Due to severe performance problems with the workers we decided to remove the v6 route again (details: https://progress.opensuse.org/issues/73633?issue_count=67&issue_position=1&next_issue_id=73501#note-2)
- 22.10.2020: Several reports stated that grenache-1 workers are once again unavailable. Things I did:
  - Stopped all openqa-worker instances
  - umount /var/lib/openqa/share since it was connected over v6
  - disable v6 completely on the external interface with echo 1 > /proc/sys/net/ipv6/conf/eth0/disable_ipv6
  - mount /var/lib/openqa/share && systemctl start openqa-worker@{1..40}
  - ==> workers came back on OSD. First jobs are running. Reducing priority for now :)
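The 22.10.2020 steps above could be collected into a small script, roughly as sketched below. This is only an illustration, not what was actually run: the `run` helper, the `RUN`, `IFACE` and `SHARE` variables and the function name are made up for this sketch, the stop command assumes the same instances 1..40 that are started afterwards, and `sysctl -w` is used in place of the `echo 1 > /proc/sys/...` write from the ticket (both set the same kernel parameter). By default it only prints the commands.

```shell
#!/bin/bash
# Sketch of the 22.10.2020 workaround described in the ticket.
# By default the commands are only printed; set RUN=1 (as root) to execute them.

IFACE="${IFACE:-eth0}"                    # external interface named in the ticket
SHARE="${SHARE:-/var/lib/openqa/share}"   # share that was mounted over IPv6

# Hypothetical helper: print a command, or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

disable_v6_workaround() {
    # Assumption: the instances stopped are the same 1..40 that get started below.
    run systemctl stop openqa-worker@{1..40} || return 1
    # The share was connected over IPv6, so unmount it before disabling IPv6.
    run umount "$SHARE" || return 1
    # Same effect as: echo 1 > /proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
    run sysctl -w "net.ipv6.conf.${IFACE}.disable_ipv6=1" || return 1
    # Remount (now over IPv4) and bring the workers back.
    run mount "$SHARE" || return 1
    run systemctl start openqa-worker@{1..40}
}
```

Calling `disable_v6_workaround` without `RUN=1` lists the five commands for review first. Note that the setting does not survive a reboot, which fits the temporary nature of the workaround.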
Updated by nicksinger almost 4 years ago
- Related to action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) added
Updated by nicksinger almost 4 years ago
- Description updated (diff)
- Priority changed from Urgent to Normal
Updated by okurz almost 4 years ago
- Assignee set to nicksinger
- Target version set to Ready
With that, I guess you can also set the ticket to "Blocked", waiting for EngInfra, can't you?
Updated by nicksinger almost 4 years ago
- Related to action #75031: [Worker][IPMI] Two openQA workers become offline. openQA jobs stopped running. added
Updated by nicksinger almost 4 years ago
- Status changed from Feedback to Blocked
okurz wrote:
With that, I guess you can also set the ticket to "Blocked", waiting for EngInfra, can't you?
I wanted to wait for some feedback on the performance, but blocking it is fine too. If performance issues get reported, I can set it to "Workable" again anyway.
Updated by okurz almost 4 years ago
- Status changed from Blocked to Resolved
https://infra.nue.suse.com/SelfService/Display.html?id=178626 is "Resolved", and so is #73633. I did a quick ssh malbec 'ping -c 1 -4 openqa.suse.de && ping -c 1 -6 openqa.suse.de', which was successful. So this ticket should be good as well.
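For repeated checks, the quick ping test could be wrapped in a small function like the sketch below. The function name `check_dual_stack` and the `PING` override are assumptions made for this example. Unlike the `&&` chain above, it tests both address families even if the first one fails, so it shows which family is broken.

```shell
#!/bin/bash
# Sketch: check that a host answers over IPv4 and over IPv6 separately.
# PING is a hypothetical override (e.g. for testing); it defaults to ping.
check_dual_stack() {
    local host="$1" rc=0 family
    for family in 4 6; do
        # -4 / -6 force the address family, as in the manual check.
        if "${PING:-ping}" -c 1 "-$family" "$host" >/dev/null 2>&1; then
            echo "IPv$family: ok"
        else
            echo "IPv$family: FAILED"
            rc=1
        fi
    done
    return "$rc"
}
```

On a worker host this would be called as `check_dual_stack openqa.suse.de`; the return code is non-zero if either family fails.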