action #109241
closed
openQA Tests (public) - action #107062: Multiple failures due to network issues
Prefer to use domain names rather than IPv4 in salt pillars size:M
Added by okurz over 2 years ago. Updated almost 2 years ago.
0%
Description
Motivation
See #108845#note-33
There are variables called WORKER_HOSTNAME and SUT_IP, and we often set them to IPv4 addresses even though we already use domain names in multiple cases. I think this is misleading and we should rename the variables, keeping the old names as deprecated fallback configuration options.
Acceptance criteria
- AC1: WORKER_HOSTNAME and SUT_IP in salt pillars never rely on bare IP addresses
Suggestions
- Go through the remaining IP addresses and check that everything still works after replacing them with hostnames
- Ensure that no arbitrary IP addresses are used in salt pillars
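The check suggested above could be sketched as a small shell helper that lists WORKER_HOSTNAME and SUT_IP settings whose value is a bare IPv4 address. The function name find_bare_ipv4 is hypothetical, and the sketch assumes settings are written as "KEY: value" or "KEY = value" on a single line; the real pillar layout may differ.

```shell
# Hypothetical helper, not part of the salt repositories.
# Prints "file:line:content" for every WORKER_HOSTNAME or SUT_IP setting
# whose value is a bare IPv4 address.
# Returns a non-zero status when nothing matches (grep's convention).
# Usage: find_bare_ipv4 FILE...
find_bare_ipv4() {
    grep -HnE '(WORKER_HOSTNAME|SUT_IP)[[:space:]]*[:=][[:space:]]*"?[0-9]{1,3}(\.[0-9]{1,3}){3}"?[[:space:]]*$' "$@"
}
```

Run against a checkout of the pillars, any line it prints is a candidate for replacement by a hostname. Values that merely start with digits but are not four dotted groups are not reported.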
Updated by okurz over 2 years ago
- Copied from action #108845: Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms size:M added
Updated by okurz over 2 years ago
- Status changed from In Progress to New
- Assignee deleted (okurz)
- Priority changed from Normal to Low
- Target version changed from Ready to future
Updated by okurz about 2 years ago
- Related to action #119443: Conduct the migration of SUSE openQA systems from Nbg SRV1 to new security zones size:M added
Updated by okurz about 2 years ago
- Priority changed from Low to Urgent
- Target version changed from future to Ready
This could be quite useful if we get this done before we continue with #119443
Updated by livdywan about 2 years ago
- Subject changed from Prefer to use domain names rather than IPv4 in salt pillars to Prefer to use domain names rather than IPv4 in salt pillars size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz about 2 years ago
- Status changed from Workable to In Progress
- Assignee set to okurz
Updated by okurz about 2 years ago
I have now added the salt key for worker13.oqa.suse.de, one of the recently migrated workers. With that,
sudo salt --no-color --state-output=changes '*' grains.get fqdn
yields
storage.qa.suse.de:
storage
openqaworker2.suse.de:
openqaworker2.suse.de
openqaworker9.suse.de:
openqaworker9.suse.de
openqaworker8.suse.de:
openqaworker8.suse.de
openqaworker3.suse.de:
openqaworker3.suse.de
openqaworker14.qa.suse.cz:
openqaworker14.qa.suse.cz
qamasternue.qa.suse.de:
qamasternue.qa.suse.de
QA-Power8-5-kvm.qa.suse.de:
QA-Power8-5-kvm.qa.suse.de
baremetal-support:
baremetal-support.qa.suse.de
tumblesle:
tumblesle.qa.suse.de
openqaworker6.suse.de:
openqaworker6.suse.de
jenkins.qa.suse.de:
jenkins
openqaworker5.suse.de:
openqaworker5.suse.de
openqa.suse.de:
openqa.suse.de
schort-server:
schort-server.qa.suse.de
openqa-monitor.qa.suse.de:
monitor.qa.suse.de
malbec.arch.suse.de:
malbec.arch.suse.de
backup.qa.suse.de:
backup-vm
worker13.oqa.suse.de:
worker13
QA-Power8-4-kvm.qa.suse.de:
QA-Power8-4-kvm.qa.suse.de
grenache-1.qa.suse.de:
grenache-1.qa.suse.de
openqaworker-arm-1.suse.de:
openqaworker-arm-1.suse.de
openqaworker-arm-2.suse.de:
openqaworker-arm-2.suse.de
openqaworker-arm-3.suse.de:
openqaworker-arm-3.suse.de
So the FQDN of worker13, and of some other hosts, is not really "full" yet. https://stackoverflow.com/a/54750233 recommends avoiding FQDN, and in nearly all cases our salt ID is already equivalent to the FQDN, so the salt ID might be a better choice.
sudo salt --no-color --state-output=changes -C 'G@roles:worker' grains.get fqdn
returns a good match for all workers except openqaworker2, which currently has no working DNS entry, and worker13, where hostname -f does not return a match. Also,
sudo salt --no-color --state-output=changes -C 'G@roles:worker' cmd.run 'nslookup $(hostname)'
shows that all workers can fully resolve themselves except worker13, the problematic one mentioned above.
Hosts within the new domain .oqa.suse.de should search for matches within that domain so that nslookup $(hostname) works, e.g. nslookup worker13 should succeed. I assume that salt relies on this to return a proper value for grains.fqdn
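To spot the hosts whose fqdn grain differs from the salt ID without reading the whole listing, the comparison could be sketched as a filter over the grains.get output. The function name fqdn_mismatches is hypothetical, and the sketch assumes the usual salt text output where each minion ID line ends in a colon and is followed by one value line.

```shell
# Hypothetical helper: read `salt ... grains.get fqdn` output on stdin and
# print "ID -> fqdn" for every minion whose fqdn grain differs from its ID.
fqdn_mismatches() {
    awk '
        /:[[:space:]]*$/ {            # a line ending in ":" is a minion ID
            id = $0
            sub(/:[[:space:]]*$/, "", id)
            next
        }
        NF && id != "" {              # the next non-empty line is the grain value
            value = $1
            if (value != id) print id " -> " value
            id = ""
        }
    '
}
```

It can be fed directly from the command used above:

```shell
sudo salt --no-color --state-output=changes '*' grains.get fqdn | fqdn_mismatches
```

For the listing in this comment it would report entries such as worker13.oqa.suse.de -> worker13, while hosts like openqa.suse.de, whose ID and grain agree, are omitted.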
Updated by okurz about 2 years ago
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/766 has been merged. https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/451 covers related cleanup in the pillars and also switches SUT_IP and VIRSH_GUEST to FQDNs.
Updated by okurz about 2 years ago
- Due date set to 2022-11-18
- Status changed from In Progress to Feedback
Updated by okurz about 2 years ago
- Related to action #120079: test fails in ibft as it is expecting an IPv4 address where a hostname is provided after #109241 added
Updated by okurz about 2 years ago
There are some problems with s390x kvm tests, e.g. the login prompt cannot be found in https://openqa.suse.de/tests/9900527 and the VNC connection fails in https://openqa.suse.de/tests/9900606#step/installation/26. I removed worker2 from salt and ran:
for i in $(sed -n 's/VIRSH_GUEST=//p' /etc/openqa/workers.ini | sort | uniq | grep 's390kvm'); do echo $i && sed -i "s@$i@$(host $i | sed -n 's/^.*has address //p')@" /etc/openqa/workers.ini; done
and retriggered some tests to see the effect, e.g. https://openqa.suse.de/tests/9901355
EDIT: Tests passed the installation and continue fine, so the hostname is apparently a problem.
On worker2 I changed SUT_IP back to the hostname on instance 32 only, keeping VIRSH_GUEST on the IPv4 address, and added a special worker class:
openqa-clone-job --within-instance https://openqa.suse.de/tests/9901355 _GROUP=0 BUILD= TEST=qam-gnome-test-okurz-poo109241 WORKER_CLASS=debug_okurz_poo109241
Created job #9901761: sle-12-SP4-Server-DVD-Updates-s390x-Build20221107-1-qam-gnome@s390x-kvm-sle12 -> https://openqa.suse.de/t9901761
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/455
Updated by okurz about 2 years ago
- Copied to action #120169: Make s390x kvm workers also use FQDN instead of IPv4 in salt pillars for VIRSH_GUEST added
Updated by okurz about 2 years ago
- Due date deleted (2022-11-18)
- Status changed from Feedback to Resolved
Tests are good again. The aforementioned problem only concerns VIRSH_GUEST, see #120169 for that. WORKER_HOSTNAME and SUT_IP are now fully using FQDN.
Updated by okurz about 2 years ago
- Copied to action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow added
Updated by waynechen55 almost 2 years ago
@okurz Could you please have a look at this issue? Thanks
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/481