action #75055

Updated by nicksinger about 4 years ago

Due to the ongoing IPv6 problems we realized that grenache-1 workers disappear one by one when there is no working IPv6 connectivity. This currently results in many blocked jobs, since grenache-1 is our main jump host for more exotic testing environments. This ticket is mainly a tracker of what I did to make the workers appear in OSD again:

 * 20.10.2020: problem was noticed because workers could not connect to baremetal-support.qa.suse.de
 * 20.10.2020: IPv6 route was missing, created https://infra.nue.suse.com/SelfService/Display.html?id=178626
 * 20.10.2020: IPv6 route was manually added with `ip -6 r a fe80::1`; after that the worker appeared on all web UIs again
 * 21.10.2020: Due to severe performance problems with the workers we decided to remove the v6 route again (details: https://progress.opensuse.org/issues/73633?issue_count=67&issue_position=1&next_issue_id=73501#note-2) 
 * 22.10.2020: Several reports stated that grenache-1 workers are once again unavailable. Things I tried:
   * Stopped all openqa-worker instances 
   * `umount /var/lib/openqa/share` since it was connected over v6
   * ~~disable v6 completely on the external interface with `echo 1 > /proc/sys/net/ipv6/conf/eth0/disable_ipv6`~~ -> worker remained offline
   * `mount /var/lib/openqa/share && systemctl start openqa-worker@{1..40}`
   * Added default route to fe80::1 manually again -> worker instantly came back online on OSD
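
The manual route fix from the steps above could be scripted roughly as follows. This is only a sketch under assumptions, not what was actually run on grenache-1: the helper names (`run`, `has_default_v6_route`, `ensure_default_v6_route`) are hypothetical, the interface name `eth0` is taken from the `disable_ipv6` path above, and the long-form `ip -6 route add default via ... dev ...` command is an assumed expansion of the abbreviated command noted on 20.10.2020. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: make sure an IPv6 default route exists.
# Commands are printed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
GATEWAY="${GATEWAY:-fe80::1}"   # gateway address mentioned in the ticket
IFACE="${IFACE:-eth0}"          # assumption: interface from the disable_ipv6 path

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Reads `ip -6 route show` style output on stdin;
# succeeds if a line describing a default route is present.
has_default_v6_route() {
    grep -q '^default '
}

# Reads the IPv6 routing table on stdin and adds the default route if it is missing.
ensure_default_v6_route() {
    if has_default_v6_route; then
        echo "IPv6 default route present"
    else
        run ip -6 route add default via "$GATEWAY" dev "$IFACE"
    fi
}
```

On a live host this would be used as `ip -6 route show | ensure_default_v6_route`, with `DRY_RUN=0` set only once the printed command looks right for that machine.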

Since we enabled v6 once again, and given what we saw yesterday, we can expect slow uploads from grenache-1 for now, but at least it can do work at all.
