action #75055

closed

grenache-1 can't connect to the web UIs over IPv4 only

Added by nicksinger over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2020-10-22
Due date:
% Done: 0%
Estimated time:

Description

Due to the ongoing IPv6 problems, we realized that the grenache-1 workers disappear one by one when there is no working IPv6 connectivity. This currently results in many blocked jobs, since grenache-1 is our main jump host for more exotic testing environments. This ticket mainly tracks what I did to make the workers appear in OSD again:

  • 20.10.2020: The problem was noticed when workers failed to connect to baremetal-support.qa.suse.de
  • 20.10.2020: The IPv6 route was missing; created https://infra.nue.suse.com/SelfService/Display.html?id=178626
  • 20.10.2020: The IPv6 route was added manually with ip -6 r a fe80::1; after that the worker appeared on all web UIs again
  • 21.10.2020: Due to severe performance problems with the workers, we decided to remove the IPv6 route again (details: https://progress.opensuse.org/issues/73633?issue_count=67&issue_position=1&next_issue_id=73501#note-2)
  • 22.10.2020: Several reports stated that the grenache-1 workers were once again unavailable. Things I did:
    • Stopped all openqa-worker instances
    • Unmounted the share with umount /var/lib/openqa/share, since it was connected over IPv6
    • Disabled IPv6 completely on the external interface with echo 1 > /proc/sys/net/ipv6/conf/eth0/disable_ipv6
    • Remounted the share and restarted the workers with mount /var/lib/openqa/share && systemctl start openqa-worker@{1..40}
    • ==> The workers came back on OSD and the first jobs are running. Reducing priority for now :)
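
The 22.10.2020 steps could be collected into a small script so they can be repeated on another worker host. This is only a sketch under assumptions: DRY_RUN, IFACE, WORKERS, run, worker_units and force_ipv4_only are illustrative names that do not appear in the ticket, and sysctl -w is used here in place of the echo into /proc because it writes the same kernel setting. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the 22.10.2020 workaround from this ticket.
# DRY_RUN, IFACE, WORKERS, run, worker_units and force_ipv4_only are
# illustrative names, not something the ticket defines.
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them
IFACE="${IFACE:-eth0}"     # external interface, as named in the ticket
WORKERS="${WORKERS:-40}"   # the ticket started instances 1..40
SHARE=/var/lib/openqa/share

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Emit "openqa-worker@1 ... openqa-worker@N" without bash brace expansion.
worker_units() {
    i=1
    while [ "$i" -le "$WORKERS" ]; do
        printf 'openqa-worker@%s ' "$i"
        i=$((i + 1))
    done
}

# Stop workers, unmount the share, disable IPv6 on IFACE, remount, restart.
# The sysctl key maps to /proc/sys/net/ipv6/conf/$IFACE/disable_ipv6.
# $units is intentionally unquoted so it splits into one argument per unit.
force_ipv4_only() {
    units=$(worker_units)
    run systemctl stop $units &&
    run umount "$SHARE" &&
    run sysctl -w "net.ipv6.conf.$IFACE.disable_ipv6=1" &&
    run mount "$SHARE" &&
    run systemctl start $units
}
```

Calling force_ipv4_only after sourcing the script prints the five commands for review; setting DRY_RUN=0 and running it as root would execute them. As in the original steps, mount with only the mount point relies on an existing /etc/fstab entry for the share.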

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) (Resolved, nicksinger, 2020-10-20, 2020-11-17)

Related to openQA Infrastructure - action #75031: [Worker][IPMI] Two openQA workers become offline. openQA jobs stopped running. (Resolved, nicksinger, 2020-10-21)

Actions #1

Updated by nicksinger over 3 years ago

  • Description updated (diff)
Actions #2

Updated by nicksinger over 3 years ago

  • Related to action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) added
Actions #3

Updated by nicksinger over 3 years ago

  • Description updated (diff)
  • Priority changed from Urgent to Normal
Actions #4

Updated by okurz over 3 years ago

  • Assignee set to nicksinger
  • Target version set to Ready

With that, I guess you can also set the ticket to "Blocked", waiting for EngInfra, right?

Actions #5

Updated by nicksinger over 3 years ago

  • Related to action #75031: [Worker][IPMI] Two openQA workers become offline. openQA jobs stopped running. added
Actions #6

Updated by nicksinger over 3 years ago

  • Status changed from Feedback to Blocked

okurz wrote:

With that, I guess you can also set the ticket to "Blocked", waiting for EngInfra, right?

I wanted to wait for some feedback on the performance, but blocking it is fine too. If performance issues get reported, I can set it to "Workable" again anyway.

Actions #7

Updated by okurz over 3 years ago

  • Status changed from Blocked to Resolved

https://infra.nue.suse.com/SelfService/Display.html?id=178626 is "Resolved", as is #73633. I ran a quick ssh malbec 'ping -c 1 -4 openqa.suse.de && ping -c 1 -6 openqa.suse.de', which was successful. This should be good as well.
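
The same dual-stack check could be wrapped in a small helper that reports each address family separately, which shows which family is broken instead of stopping at the first failure. This is a sketch only: check_dual_stack and the PING override are illustrative names, and it assumes a ping that accepts -4 and -6, as used in the command above.

```shell
#!/bin/sh
# Sketch of a per-family reachability check.
# check_dual_stack and PING are illustrative names, not from the ticket.
PING="${PING:-ping}"   # override to substitute another ping-compatible command

# Ping the host once over IPv4 and once over IPv6.
# Prints one line per family and returns non-zero if either family fails.
check_dual_stack() {
    host="$1"
    rc=0
    for family in 4 6; do
        if $PING -c 1 "-$family" "$host" >/dev/null 2>&1; then
            echo "IPv$family ok: $host"
        else
            echo "IPv$family FAILED: $host"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, running check_dual_stack openqa.suse.de on a worker host would print one status line for IPv4 and one for IPv6, and the exit status could be used in a monitoring check.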
