action #75055

closed

grenache-1 can't connect to web UIs over IPv4 only

Added by nicksinger over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2020-10-22
Due date:
% Done: 0%

Description

Due to the ongoing IPv6 problems, we realized that grenache-1 workers disappear one by one if there is no working IPv6 connectivity. This currently results in many blocked jobs, since grenache-1 is our main jump host for more exotic testing environments. This ticket mainly tracks what I did to make the workers appear in OSD again:

  • 20.10.2020: The problem was noticed because workers could not connect to baremetal-support.qa.suse.de
  • 20.10.2020: The IPv6 route was missing; created https://infra.nue.suse.com/SelfService/Display.html?id=178626
  • 20.10.2020: The IPv6 route was added manually with ip -6 r a fe80::1; after that, the workers appeared on all web UIs again
  • 21.10.2020: Due to severe performance problems with the workers, we decided to remove the IPv6 route again (details: https://progress.opensuse.org/issues/73633?issue_count=67&issue_position=1&next_issue_id=73501#note-2)
  • 22.10.2020: Several reports stated that grenache-1 workers were once again unavailable. Things I did:
    • Stopped all openqa-worker instances
    • Unmounted /var/lib/openqa/share, since it was mounted over IPv6
    • Disabled IPv6 completely on the external interface with echo 1 > /proc/sys/net/ipv6/conf/eth0/disable_ipv6
    • Remounted the share and started the workers again with mount /var/lib/openqa/share && systemctl start openqa-worker@{1..40}
    • ==> The workers came back on OSD and the first jobs are running. Reducing priority for now :)
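The 22.10.2020 steps can be collected into a small script, sketched here under assumptions that are not part of the original report: /var/lib/openqa/share has an /etc/fstab entry (so a bare mount of the path works), all instances openqa-worker@1..40 exist, and sysctl -w is used as an equivalent of the echo into /proc/sys. The names run, recover_ipv4_only, ipv6_disabled, DRY_RUN and SYSCTL_ROOT are hypothetical helpers for illustration only; with DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Sketch of the 22.10.2020 recovery steps on grenache-1 as shell functions.
# Assumptions: /var/lib/openqa/share has an /etc/fstab entry, instances
# openqa-worker@1..40 exist, and the external interface is eth0.
# run, recover_ipv4_only, ipv6_disabled, DRY_RUN and SYSCTL_ROOT are
# hypothetical names used only for illustration.

SHARE=/var/lib/openqa/share
IFACE=eth0
WORKERS=(openqa-worker@{1..40})   # brace expansion, as in the ticket

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop the workers, unmount the share (mounted over IPv6), disable IPv6 on
# the external interface, remount the share and start the workers again.
# Each step runs only if the previous one succeeded. Needs root.
recover_ipv4_only() {
  run systemctl stop "${WORKERS[@]}" &&
    run umount "$SHARE" &&
    run sysctl -w "net.ipv6.conf.$IFACE.disable_ipv6=1" &&
    run mount "$SHARE" &&
    run systemctl start "${WORKERS[@]}"
}

# Print "yes" if IPv6 is disabled on interface $1, otherwise "no".
# SYSCTL_ROOT defaults to /proc/sys; overriding it allows testing on a fake tree.
ipv6_disabled() {
  local flag="${SYSCTL_ROOT:-/proc/sys}/net/ipv6/conf/$1/disable_ipv6"
  if [ "$(cat "$flag" 2>/dev/null)" = 1 ]; then
    echo yes
  else
    echo no
  fi
}
```

For example, DRY_RUN=1 recover_ipv4_only prints the five commands in order, and ipv6_disabled eth0 reports whether the flag took effect. Like the original echo, sysctl -w only changes the running kernel, so the setting is lost on reboot.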

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) | Resolved | nicksinger | 2020-10-20 | 2020-11-17

Related to openQA Infrastructure - action #75031: [Worker][IPMI] Two openQA workers become offline. openQA jobs stopped running. | Resolved | nicksinger | 2020-10-21