action #80128

closed

openqaworker-arm-2 fails to download from openqa

Added by coolo over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: -
Target version:
Start date: 2020-11-21
Due date:
% Done: 0%
Estimated time:

Description

https://openqa.suse.de/tests/5047755

Downloading does not work even with wget. I stopped the workers and the Minion.


Related issues: 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) (Status: Resolved, Assignee: nicksinger, Start date: 2020-10-20, Due date: 2020-11-17)

#1

Updated by okurz over 3 years ago

  • Related to action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) added
#2

Updated by okurz over 3 years ago

ping -6 openqa.suse.de does not work, although it works on most other machines.
sysctl net.ipv6.conf.eth1.accept_ra is OK as well, but ip r does not show a default route.
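Assuming the default IPv6 route on these hosts is learned from router advertisements (SLAAC), accept_ra is only half of the picture: the kernel ignores advertisements on an interface where accept_ra is 1 and forwarding is enabled, and only accept_ra = 2 overrides that. The following is a minimal bash sketch, not a tool used in this ticket; the function name ra_status and the CONF_ROOT override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether an interface accepts IPv6 router advertisements.
# CONF_ROOT is a hypothetical override (e.g. for testing against a copy of
# the tree); it defaults to the real kernel location.
CONF_ROOT="${CONF_ROOT:-/proc/sys/net/ipv6/conf}"

ra_status() {
    local iface="$1"
    local dir="$CONF_ROOT/$iface"
    [ -d "$dir" ] || { echo "$iface: no such interface"; return 2; }
    local accept_ra forwarding
    accept_ra=$(cat "$dir/accept_ra")
    forwarding=$(cat "$dir/forwarding")
    # accept_ra: 0 = ignore RAs, 1 = accept only while forwarding is off,
    # 2 = accept even when forwarding is on
    if [ "$accept_ra" -eq 2 ] || { [ "$accept_ra" -eq 1 ] && [ "$forwarding" -eq 0 ]; }; then
        echo "$iface: router advertisements accepted"
        return 0
    fi
    echo "$iface: router advertisements ignored (accept_ra=$accept_ra, forwarding=$forwarding)"
    return 1
}
```

Usage: ra_status eth1 prints one line and returns non-zero if advertisements on eth1 would be ignored.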

#3

Updated by okurz over 3 years ago

  • Status changed from New to In Progress
  • Assignee set to okurz
  • Target version set to Ready

ip r does not show a default IPv6 route because it only lists IPv4 routes; ip -6 r does show one. After I ran systemctl stop firewalld, ping -c 1 -6 openqa.suse.de worked.
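Because plain ip r only covers IPv4, the IPv6 default route has to be checked with ip -6 route. A minimal sketch that filters that output for default entries; default_v6_routes is a hypothetical helper, and multipath routes (whose nexthops appear on continuation lines) are not handled.

```shell
#!/usr/bin/env bash
# Sketch: read `ip -6 route` output on stdin, print gateway and device of
# every single-path default route, and succeed only if at least one exists.
default_v6_routes() {
    awk '
        $1 == "default" {
            via = ""; dev = ""
            for (i = 2; i < NF; i++) {
                if ($i == "via") via = $(i + 1)
                if ($i == "dev") dev = $(i + 1)
            }
            print "gateway " via " on " dev
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage: ip -6 route | default_v6_routes || echo "no default IPv6 route"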

I could not find anything in the logs either, even after configuring firewalld (in /etc/firewalld) to log all dropped packets. Later, ping did not work even with firewalld disabled. I ran systemctl restart network and my SSH connection never recovered. Over IPMI SOL I can ping the machine's own IPv6 address on openqaworker-arm-2, but not OSD:

# ping -c 1 -6 2620:113:80c0:8080:10:160:0:227
PING 2620:113:80c0:8080:10:160:0:227(2620:113:80c0:8080:10:160:0:227) 56 data bytes
64 bytes from 2620:113:80c0:8080:10:160:0:227: icmp_seq=1 ttl=64 time=0.061 ms

--- 2620:113:80c0:8080:10:160:0:227 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.061/0.061/0.061/0.000 ms
e246:~ # ping -c 1 -6 openqa.suse.de
PING openqa.suse.de(openqa.suse.de (2620:113:80c0:8080:10:160:0:207)) 56 data bytes


This at least resolves the correct address for openqa.suse.de, but no reply arrives. After some minutes the system no longer even reacted properly over SOL, so I triggered a power reset.

EDIT: Same symptoms after the reboot, then I ran sudo systemctl disable --now openqa-worker.target openqa-worker-cacheservice openqa-worker-cacheservice-minion.service

salt -l error \* cmd.run 'dig openqa.suse.de AAAA' is fine on all machines.

salt -l error \* cmd.run 'ping -c 1 -4 openqa.suse.de ; ping -c 1 -6 openqa.suse.de' fails on QA-Power8-4-kvm.qa.suse.de, openqaworker-arm-1.suse.de and openqaworker-arm-2.suse.de.

I have applied the following workaround:

salt -l error -L 'QA-Power8-4-kvm.qa.suse.de,openqaworker-arm-1.suse.de,openqaworker-arm-2.suse.de' cmd.run 'echo net.ipv6.conf.all.disable_ipv6 = 1 > /etc/sysctl.d/poo73633_poo80128_debugging.conf && sysctl --load /etc/sysctl.d/poo73633_poo80128_debugging.conf && systemctl restart openqa-worker@\* openqa-worker-cacheservice openqa-worker-cacheservice-minion.service os-autoinst-openvswitch.service && systemctl mask --now postfix'
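The sysctl part of this workaround, and the later revert, can be written as a reversible pair of functions. This is a sketch only: the function names and the SYSCTL_DIR and DRY_RUN overrides are hypothetical, and the service restarts and postfix masking from the command above are left out.

```shell
#!/usr/bin/env bash
# Sketch of the sysctl part of the poo#73633/poo#80128 workaround.
# SYSCTL_DIR and DRY_RUN are hypothetical overrides; the real sysctl calls
# need root, so DRY_RUN=1 only prints them.
SYSCTL_DIR="${SYSCTL_DIR:-/etc/sysctl.d}"
WORKAROUND_FILE="poo73633_poo80128_debugging.conf"

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "would run: $*"; else "$@"; fi
}

disable_ipv6_workaround() {
    local file="$SYSCTL_DIR/$WORKAROUND_FILE"
    # persist the setting, then apply it immediately
    echo 'net.ipv6.conf.all.disable_ipv6 = 1' > "$file"
    run sysctl --load "$file"
}

revert_ipv6_workaround() {
    # drop the persistent file so the setting is not re-applied at boot
    rm -f "$SYSCTL_DIR/$WORKAROUND_FILE"
    # re-enable IPv6 at runtime
    run sysctl -w net.ipv6.conf.all.disable_ipv6=0
}
```

Keeping the setting in a dedicated, clearly named file makes the revert a matter of deleting that one file, which is what eventually happened in #6.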
#4

Updated by okurz over 3 years ago

  • Status changed from In Progress to Feedback
#5

Updated by okurz over 3 years ago

  • Status changed from Feedback to Workable
  • Assignee deleted (okurz)
  • Priority changed from Normal to Low
  • Target version changed from Ready to future

This seems to have worked. So far the machines seem to be fine with IPv4 only. I don't know why IPv6 does not work, but I guess we can live with that for the time being.

#6

Updated by nicksinger over 3 years ago

  • Status changed from Workable to Resolved
  • Assignee set to nicksinger

While working on grenache-1 I realized that there were some leftovers in sysctl on QA-Power8-4-kvm.qa.suse.de that resulted in IPv6 being disabled on only some interfaces (IIRC "lo" was one of them). This caused the strange errors you discovered in #3, where name resolution and everything else works but pings get stuck.
I removed your workaround file now and made sure all disable_ipv6 entries are set to 0 (on QA-Power8-4-kvm.qa.suse.de, openqaworker-arm-1.suse.de and openqaworker-arm-2.suse.de). To validate, I ran the following command on OSD:

openqa:~ # salt -l error --no-color -C 'G@roles:worker' cmd.run 'curl -s -6 openqa.suse.de | grep changelog'
openqaworker2.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
openqaworker5.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
openqaworker6.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
openqaworker8.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
QA-Power8-4-kvm.qa.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
openqaworker9.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
QA-Power8-5-kvm.qa.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
openqaworker13.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
openqaworker10.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
openqaworker-arm-2.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
openqaworker-arm-1.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
grenache-1.qa.suse.de:
                          <a href="/changelog">4.6.1607440298.36d0dfbf9</a>
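Reading such output by eye gets error-prone with many minions. A hedged sketch that checks the salt output for a changelog link from every minion and a single common version; check_changelog_output is a hypothetical name, and the parsing assumes the --no-color layout shown above (a "hostname:" line followed by indented output).

```shell
#!/usr/bin/env bash
# Sketch: read the salt output above on stdin, report minions without a
# changelog link and count minions per version; succeed only if every
# minion answered and all report the same version.
check_changelog_output() {
    awk '
        /^[^ \t].*:$/ {                 # unindented "hostname:" header line
            if (host != "" && !seen) { print "no answer: " host; bad = 1 }
            host = substr($0, 1, length($0) - 1); seen = 0; next
        }
        /href="\/changelog"/ {          # extract the version from the link text
            v = $0
            sub(/.*">/, "", v); sub(/<\/a>.*/, "", v)
            versions[v]++; seen = 1
        }
        END {
            if (host != "" && !seen) { print "no answer: " host; bad = 1 }
            n = 0
            for (v in versions) { print "version " v ": " versions[v] " minion(s)"; n++ }
            exit ((bad || n != 1) ? 1 : 0)
        }
    '
}
```

Usage: salt -l error --no-color -C 'G@roles:worker' cmd.run 'curl -s -6 openqa.suse.de | grep changelog' | check_changelog_output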

Please reopen if the problem still persists.
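The root cause described above, IPv6 disabled on only some interfaces by leftover settings, can be spotted with a scan like the following. It is a sketch, not the check that was actually run: find_disabled_ipv6 and the CONF_ROOT and SYSCTL_DIRS overrides are hypothetical, the default directories are some of the standard sysctl.d(5) locations, and keys written with "/" instead of "." are not matched.

```shell
#!/usr/bin/env bash
# Sketch: list interfaces where IPv6 is disabled at runtime, plus sysctl
# configuration lines that would disable it again at boot.
# CONF_ROOT and SYSCTL_DIRS are hypothetical overrides.
CONF_ROOT="${CONF_ROOT:-/proc/sys/net/ipv6/conf}"
SYSCTL_DIRS="${SYSCTL_DIRS:-/etc/sysctl.conf /etc/sysctl.d /run/sysctl.d /usr/lib/sysctl.d}"

find_disabled_ipv6() {
    local rc=0 f iface
    for f in "$CONF_ROOT"/*/disable_ipv6; do
        [ -e "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        if [ "$(cat "$f")" != 0 ]; then
            echo "runtime: IPv6 disabled on $iface"
            rc=1
        fi
    done
    # SYSCTL_DIRS is intentionally unquoted: it is a space-separated list.
    # grep prints "file:line" for every persistent disable_ipv6 = 1 entry.
    if grep -rHE '^[[:space:]]*net\.ipv6\.conf\..+\.disable_ipv6[[:space:]]*=[[:space:]]*1' $SYSCTL_DIRS 2>/dev/null; then
        rc=1
    fi
    return "$rc"
}
```

Checking both the runtime values and the configuration files matters here: a fixed runtime value alone would not survive a reboot if a leftover file disables IPv6 on "lo" again.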

#7

Updated by okurz over 3 years ago

  • Target version changed from future to Ready