action #109241
openQA Tests - action #107062: Multiple failures due to network issues
Prefer to use domain names rather than IPv4 in salt pillars size:M
Description
Motivation
See #108845#note-33
There are variables called WORKER_HOSTNAME and SUT_IP, and we often set them to IPv4 addresses even though we already use domain names in multiple cases. I think this is misleading and we should rename them (but keep the old names as deprecated fallback configuration option names).
Acceptance criteria
- AC1: WORKER_HOSTNAME and SUT_IP in salt pillars never rely on bare IP addresses
Suggestions
- Go through the remaining IP addresses and check that everything still works after replacing them with hostnames
- Ensure that no arbitrary IP addresses are used in salt pillars
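A minimal sketch of how the audit suggested above could be automated, assuming the settings appear as `KEY = value` lines (as in workers.ini) or `KEY: value` lines (as in a YAML pillar). The function names, the regular expression, and the list of audited keys are illustrative assumptions, not part of the salt states or pillars repositories.

```python
import ipaddress
import re

# Hypothetical list of settings to audit; adjust to the keys actually in use.
KEYS = ("WORKER_HOSTNAME", "SUT_IP", "VIRSH_GUEST")

# Matches "KEY = value" (workers.ini style) as well as "KEY: value" (YAML style).
SETTING = re.compile(r"^\s*(?P<key>[A-Z_]+)\s*[:=]\s*(?P<value>\S+)\s*$")


def is_bare_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def find_bare_ips(text, keys=KEYS):
    """Return (line_number, key, value) for audited settings that hold an IP."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = SETTING.match(line)
        if not match or match["key"] not in keys:
            continue
        value = match["value"].strip("'\"")  # tolerate quoted YAML scalars
        if is_bare_ip(value):
            hits.append((number, match["key"], value))
    return hits
```

Running `find_bare_ips` over the text of each pillar file and failing on a non-empty result would be one way to check AC1 continuously; a proper YAML parser would be more robust than the line-based matching used here.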
Related issues
History
#1
Updated by okurz 10 months ago
- Copied from action #108845: Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms size:M added
#9
Updated by okurz 3 months ago
I have now added the salt key for worker13.oqa.suse.de, one of the recently migrated workers. Now
sudo salt --no-color --state-output=changes '*' grains.get fqdn
yields
storage.qa.suse.de: storage
openqaworker2.suse.de: openqaworker2.suse.de
openqaworker9.suse.de: openqaworker9.suse.de
openqaworker8.suse.de: openqaworker8.suse.de
openqaworker3.suse.de: openqaworker3.suse.de
openqaworker14.qa.suse.cz: openqaworker14.qa.suse.cz
qamasternue.qa.suse.de: qamasternue.qa.suse.de
QA-Power8-5-kvm.qa.suse.de: QA-Power8-5-kvm.qa.suse.de
baremetal-support: baremetal-support.qa.suse.de
tumblesle: tumblesle.qa.suse.de
openqaworker6.suse.de: openqaworker6.suse.de
jenkins.qa.suse.de: jenkins
openqaworker5.suse.de: openqaworker5.suse.de
openqa.suse.de: openqa.suse.de
schort-server: schort-server.qa.suse.de
openqa-monitor.qa.suse.de: monitor.qa.suse.de
malbec.arch.suse.de: malbec.arch.suse.de
backup.qa.suse.de: backup-vm
worker13.oqa.suse.de: worker13
QA-Power8-4-kvm.qa.suse.de: QA-Power8-4-kvm.qa.suse.de
grenache-1.qa.suse.de: grenache-1.qa.suse.de
openqaworker-arm-1.suse.de: openqaworker-arm-1.suse.de
openqaworker-arm-2.suse.de: openqaworker-arm-2.suse.de
openqaworker-arm-3.suse.de: openqaworker-arm-3.suse.de
So the FQDN of worker13, and of some other hosts, is not really "full" yet. https://stackoverflow.com/a/54750233 recommends avoiding the FQDN, and in nearly all cases our salt ID is already equivalent to the FQDN, so the salt ID might be the better choice.
sudo salt --no-color --state-output=changes -C 'G@roles:worker' grains.get fqdn
returns a good match for all workers except openqaworker2, which currently has no working DNS entry, and worker13, where hostname -f does not return a match. Also,
sudo salt --no-color --state-output=changes -C 'G@roles:worker' cmd.run 'nslookup $(hostname)'
shows that the workers can fully resolve themselves, except for worker13, the problematic one mentioned above.
Hosts within the new domain .oqa.suse.de. should search for matches within that domain so that nslookup $(hostname) works, e.g. nslookup worker13 should work. I assume that salt relies on that to return a proper match for grains.fqdn.
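The comparison between salt ID and FQDN grain described in this comment can be sketched as follows. This is only an illustration, assuming the `grains.get fqdn` result is available as one `minion_id: fqdn` pair per line (I believe salt's `--out=txt` outputter produces roughly this shape, but treat that as an assumption). The function names `parse_grains` and `suspicious_fqdns` are hypothetical.

```python
def parse_grains(output):
    """Parse 'minion_id: fqdn' lines into a dict, skipping malformed lines."""
    grains = {}
    for line in output.splitlines():
        minion, separator, fqdn = line.partition(":")
        if separator and fqdn.strip():
            grains[minion.strip()] = fqdn.strip()
    return grains


def suspicious_fqdns(grains):
    """Map each minion ID to the reason its fqdn grain looks wrong."""
    findings = {}
    for minion, fqdn in grains.items():
        if "." not in fqdn:
            # A name without any dot cannot be fully qualified.
            findings[minion] = "no domain part"
        elif fqdn != minion:
            findings[minion] = "differs from salt ID"
    return findings
```

Applied to the output quoted in this comment, worker13.oqa.suse.de would be reported with "no domain part", while openqa-monitor.qa.suse.de would be reported as differing from its salt ID.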
#11
Updated by okurz 3 months ago
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/766 merged. https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/451 for related cleanup in pillars and also using FQDN for SUT_IP and VIRSH_GUEST.
#13
Updated by okurz 3 months ago
- Related to action #120079: test fails in ibft as it is expecting an IPv4 address where a hostname is provided after #109241 added
#14
Updated by okurz 3 months ago
There are some problems on s390x kvm tests, e.g. can't find login prompt in https://openqa.suse.de/tests/9900527 or unable to connect to VNC like in https://openqa.suse.de/tests/9900606#step/installation/26 . I removed worker2 from salt and did:
for i in $(sed -n 's/VIRSH_GUEST=//p' /etc/openqa/workers.ini | sort | uniq | grep 's390kvm'); do echo $i && sed -i "s@$i@$(host $i | sed -n 's/^.*has address //p')@" /etc/openqa/workers.ini; done
and retriggered some tests, let's see, e.g. https://openqa.suse.de/tests/9901355
EDIT: Tests passed the installation and continue fine so the hostname is apparently a problem.
On worker2 I changed only instance 32 back: SUT_IP uses the hostname again while VIRSH_GUEST stays on the IPv4 address. I also added a special worker class:
openqa-clone-job --within-instance https://openqa.suse.de/tests/9901355 _GROUP=0 BUILD= TEST=qam-gnome-test-okurz-poo109241 WORKER_CLASS=debug_okurz_poo109241
Created job #9901761: sle-12-SP4-Server-DVD-Updates-s390x-Build20221107-1-qam-gnome@s390x-kvm-sle12 -> https://openqa.suse.de/t9901761
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/455
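For reference, the effect of the sed loop in this comment can be sketched in Python. This is a rough equivalent rather than the command that was actually run: it takes the configuration as a string, uses an injectable resolver (defaulting to `socket.gethostbyname`) instead of `host`, and replaces every whole-name occurrence of a guest hostname, whereas the sed expression replaces the first match per line. The function name `pin_guests_to_ipv4` is hypothetical.

```python
import re
import socket


def pin_guests_to_ipv4(config, resolve=socket.gethostbyname, pattern="s390kvm"):
    """Replace hostnames of matching VIRSH_GUEST entries with their IPv4 address."""
    # Collect the distinct guest names, like `sed -n ... | sort | uniq | grep`.
    guests = sorted(set(re.findall(r"^VIRSH_GUEST=(\S+)$", config, flags=re.M)))
    for guest in guests:
        if pattern not in guest:
            continue
        address = resolve(guest)
        # Only replace whole names so that one guest name that is a prefix
        # of another is not rewritten by accident.
        whole_name = r"(?<![\w.-])" + re.escape(guest) + r"(?![\w.-])"
        config = re.sub(whole_name, lambda _match: address, config)
    return config
```

Because the replacement is not limited to the VIRSH_GUEST line, a SUT_IP that uses the same hostname is rewritten too, which mirrors what the sed expression does to /etc/openqa/workers.ini.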
#15
Updated by okurz 3 months ago
- Copied to action #120169: Make s390x kvm workers also use FQDN instead of IPv4 in salt pillars for VIRSH_GUEST added
#17
Updated by okurz 3 months ago
- Copied to action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow added
#18
Updated by waynechen55 17 days ago
okurz Could you please have a look at this issue? Thanks
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/481