action #109241

closed

openQA Tests - action #107062: Multiple failures due to network issues

Prefer to use domain names rather than IPv4 in salt pillars size:M

Added by okurz about 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

See #108845#note-33

There are variables called WORKER_HOSTNAME and SUT_IP, and we often set them to IPv4 addresses even though in multiple cases we already use domain names. I think this is misleading, and we should rename them (but keep the old names as deprecated fallback configuration options).

Acceptance criteria

  • AC1: WORKER_HOSTNAME and SUT_IP in salt pillars never rely on bare IP addresses

Suggestions

  • Go through the remaining IPs and check that they work after replacing them with hostnames
  • Ensure no arbitrary IPs are used in salt pillars
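To find the remaining bare IPv4 addresses, a grep over the pillar files can help. This is a minimal sketch, assuming the pillars store these settings as YAML-style "KEY: value" lines; the function name find_bare_ipv4 is hypothetical, and the pattern only catches unquoted or double-quoted values.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print file:line for every WORKER_HOSTNAME or
# SUT_IP entry whose whole value is a bare IPv4 address.
# Assumes YAML-style "KEY: value" lines; adjust if the pillar layout differs.
find_bare_ipv4() {
  grep -HnE '^[[:space:]]*(WORKER_HOSTNAME|SUT_IP)[[:space:]]*:[[:space:]]*"?[0-9]{1,3}(\.[0-9]{1,3}){3}"?[[:space:]]*$' "$@"
}
```

grep exits with 0 when it finds a match, so an AC1 check inverts the result, e.g. "! find_bare_ipv4 path/to/pillar.sls". Values that are FQDNs such as worker13.oqa.suse.de are not reported because the pattern is anchored at the end of the line.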

Related issues 5 (1 open, 4 closed)

Related to QA - action #119443: Conduct the migration of SUSE openQA systems from Nbg SRV1 to new security zones size:M (Resolved, okurz, 2022-11-17)

Related to qe-yam - action #120079: test fails in ibft as it is expecting an IPv4 address where a hostname is provided after #109241 (Resolved, geor, 2022-11-08)

Copied from openQA Infrastructure - action #108845: Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms size:M (Resolved, nicksinger, 2022-03-24)

Copied to openQA Infrastructure - action #120169: Make s390x kvm workers also use FQDN instead of IPv4 in salt pillars for VIRSH_GUEST (New, 2022-11-09)

Copied to openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow (Resolved, mkittler, 2022-11-10)

Actions #1

Updated by okurz about 2 years ago

  • Copied from action #108845: Network performance problems, DNS, DHCP, within SUSE QA network auto_review:"(Error connecting to VNC server.*qa.suse.*Connection timed out|ipmitool.*qa.suse.*Unable to establish)":retry but also other symptoms size:M added
Actions #2

Updated by okurz about 2 years ago

  • Status changed from In Progress to New
  • Assignee deleted (okurz)
  • Priority changed from Normal to Low
  • Target version changed from Ready to future
Actions #3

Updated by okurz about 2 years ago

  • Description updated (diff)
Actions #4

Updated by okurz over 1 year ago

  • Related to action #119443: Conduct the migration of SUSE openQA systems from Nbg SRV1 to new security zones size:M added
Actions #5

Updated by okurz over 1 year ago

  • Priority changed from Low to Urgent
  • Target version changed from future to Ready

It would be quite useful to get this done before we continue with #119443.

Actions #6

Updated by okurz over 1 year ago

  • Description updated (diff)
Actions #7

Updated by livdywan over 1 year ago

  • Subject changed from Prefer to use domain names rather than IPv4 in salt pillars to Prefer to use domain names rather than IPv4 in salt pillars size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #8

Updated by okurz over 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz
Actions #9

Updated by okurz over 1 year ago

I have now added the salt key for worker13.oqa.suse.de, one of the recently migrated workers. Now sudo salt --no-color --state-output=changes '*' grains.get fqdn yields

storage.qa.suse.de:
    storage
openqaworker2.suse.de:
    openqaworker2.suse.de
openqaworker9.suse.de:
    openqaworker9.suse.de
openqaworker8.suse.de:
    openqaworker8.suse.de
openqaworker3.suse.de:
    openqaworker3.suse.de
openqaworker14.qa.suse.cz:
    openqaworker14.qa.suse.cz
qamasternue.qa.suse.de:
    qamasternue.qa.suse.de
QA-Power8-5-kvm.qa.suse.de:
    QA-Power8-5-kvm.qa.suse.de
baremetal-support:
    baremetal-support.qa.suse.de
tumblesle:
    tumblesle.qa.suse.de
openqaworker6.suse.de:
    openqaworker6.suse.de
jenkins.qa.suse.de:
    jenkins
openqaworker5.suse.de:
    openqaworker5.suse.de
openqa.suse.de:
    openqa.suse.de
schort-server:
    schort-server.qa.suse.de
openqa-monitor.qa.suse.de:
    monitor.qa.suse.de
malbec.arch.suse.de:
    malbec.arch.suse.de
backup.qa.suse.de:
    backup-vm
worker13.oqa.suse.de:
    worker13
QA-Power8-4-kvm.qa.suse.de:
    QA-Power8-4-kvm.qa.suse.de
grenache-1.qa.suse.de:
    grenache-1.qa.suse.de
openqaworker-arm-1.suse.de:
    openqaworker-arm-1.suse.de
openqaworker-arm-2.suse.de:
    openqaworker-arm-2.suse.de
openqaworker-arm-3.suse.de:
    openqaworker-arm-3.suse.de

So the FQDN of worker13, and of some other hosts, is not really fully qualified yet. https://stackoverflow.com/a/54750233 recommends avoiding FQDN, and in nearly all cases our salt ID is already equivalent to the FQDN, so the salt ID might be a better choice. sudo salt --no-color --state-output=changes -C 'G@roles:worker' grains.get fqdn returns a good match for all workers except openqaworker2, which currently has no working DNS entry, and worker13, where hostname -f does not return a match. Also, sudo salt --no-color --state-output=changes -C 'G@roles:worker' cmd.run 'nslookup $(hostname)' shows that all workers can fully resolve themselves except worker13, the problematic one mentioned above.
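Spotting the entries that are not fully qualified in output like the one above can be automated. A minimal sketch, assuming salt prints each minion ID on an unindented line ending in ":" followed by the grain value on an indented line, as in the listing above; list_short_fqdn_grains is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch (hypothetical helper): filter the output of
#   salt --no-color '*' grains.get fqdn
# and print "<minion ID> -> <value>" for every fqdn grain value that
# contains no dot, i.e. is not fully qualified.
list_short_fqdn_grains() {
  awk '
    # Unindented line ending in ":" holds the minion ID.
    /^[^ \t].*:$/ { id = substr($0, 1, length($0) - 1); next }
    # Indented line holds the grain value for the preceding minion.
    /^[ \t]+[^ \t]/ {
      if (id != "" && index($1, ".") == 0) print id " -> " $1
      id = ""
    }
  '
}
```

Usage: sudo salt --no-color '*' grains.get fqdn | list_short_fqdn_grains. Applied to the listing above, it would report storage.qa.suse.de, jenkins.qa.suse.de, backup.qa.suse.de and worker13.oqa.suse.de.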

Hosts within the new domain oqa.suse.de should search for matches within that domain so that nslookup $(hostname) works, e.g. nslookup worker13 should succeed. I assume that salt relies on this to return a proper value for grains.fqdn.
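Whether a host searches oqa.suse.de can be checked before trying nslookup worker13. A minimal sketch, assuming the resolver reads a resolv.conf-style file with "search" or "domain" lines; has_search_domain is a hypothetical helper. On these hosts the file may be generated (e.g. from DHCP), so a missing entry would likely have to be fixed in the DHCP/DNS setup rather than in the file itself.

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed if the resolv.conf-style file $1
# lists domain $2 (with or without a trailing dot) in a "search" or
# "domain" line, which lets a short name like "worker13" be expanded.
has_search_domain() {
  conf=$1 domain=$2
  awk -v d="$domain" '
    $1 == "search" || $1 == "domain" {
      for (i = 2; i <= NF; i++) if ($i == d || $i == d ".") found = 1
    }
    END { exit !found }
  ' "$conf"
}
```

Usage: has_search_domain /etc/resolv.conf oqa.suse.de || echo "oqa.suse.de not in search list".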

Actions #11

Updated by okurz over 1 year ago

Actions #12

Updated by okurz over 1 year ago

  • Due date set to 2022-11-18
  • Status changed from In Progress to Feedback
Actions #13

Updated by okurz over 1 year ago

  • Related to action #120079: test fails in ibft as it is expecting an IPv4 address where a hostname is provided after #109241 added
Actions #14

Updated by okurz over 1 year ago

There are some problems with s390x kvm tests, e.g. the login prompt cannot be found in https://openqa.suse.de/tests/9900527 or the VNC connection fails as in https://openqa.suse.de/tests/9900606#step/installation/26 . I removed worker2 from salt and ran:

for i in $(sed -n 's/VIRSH_GUEST=//p' /etc/openqa/workers.ini | sort | uniq | grep 's390kvm'); do echo $i && sed -i "s@$i@$(host $i | sed -n 's/^.*has address //p')@" /etc/openqa/workers.ini; done
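The loop above substitutes each s390kvm hostname wherever it first appears on any line of workers.ini, so it would also rewrite a SUT_IP value containing the same hostname. A sketch of a narrower variant that only touches VIRSH_GUEST lines follows; it assumes lines of the form "VIRSH_GUEST=<host>" or "VIRSH_GUEST = <host>", uses glibc's getent ahostsv4 instead of host for the lookup, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: pin s390kvm VIRSH_GUEST values in workers.ini to IPv4
# while leaving SUT_IP and all other keys untouched.

# Print the first IPv4 address of host $1 (glibc getent, swapped in for host).
resolve_ipv4() {
  getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# Rewrite VIRSH_GUEST lines of the given ini file in place (GNU sed -i).
pin_virsh_guest_ipv4() {
  ini=$1
  for host in $(sed -n 's/^[[:space:]]*VIRSH_GUEST[[:space:]]*=[[:space:]]*//p' "$ini" | grep 's390kvm' | sort -u); do
    ip=$(resolve_ipv4 "$host")
    if [ -z "$ip" ]; then
      echo "cannot resolve $host, skipping" >&2
      continue
    fi
    echo "$host -> $ip"
    # Escape dots so sed matches the hostname literally.
    pat=$(printf '%s' "$host" | sed 's/[.]/\\./g')
    # The address restricts the substitution to VIRSH_GUEST lines only.
    sed -i "/^[[:space:]]*VIRSH_GUEST[[:space:]]*=/s@$pat@$ip@" "$ini"
  done
}
```

Usage: pin_virsh_guest_ipv4 /etc/openqa/workers.ini. This keeps SUT_IP on the hostname, which matches the combination tested on instance 32 below.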

and retriggered some tests, let's see, e.g. https://openqa.suse.de/tests/9901355

EDIT: Tests passed the installation and continue fine, so the hostname is apparently the problem.

For instance 32 on worker2 only, I changed SUT_IP back to the hostname, keeping VIRSH_GUEST on the IPv4 address, and added a special worker class:
openqa-clone-job --within-instance https://openqa.suse.de/tests/9901355 _GROUP=0 BUILD= TEST=qam-gnome-test-okurz-poo109241 WORKER_CLASS=debug_okurz_poo109241
Created job #9901761: sle-12-SP4-Server-DVD-Updates-s390x-Build20221107-1-qam-gnome@s390x-kvm-sle12 -> https://openqa.suse.de/t9901761

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/455

Actions #15

Updated by okurz over 1 year ago

  • Copied to action #120169: Make s390x kvm workers also use FQDN instead of IPv4 in salt pillars for VIRSH_GUEST added
Actions #16

Updated by okurz over 1 year ago

  • Due date deleted (2022-11-18)
  • Status changed from Feedback to Resolved

Tests are good again. The aforementioned problem only concerns VIRSH_GUEST; see #120169 for that. WORKER_HOSTNAME and SUT_IP now use FQDNs throughout.

Actions #17

Updated by okurz over 1 year ago

  • Copied to action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow added
Actions #18

Updated by waynechen55 over 1 year ago

@okurz Could you please have a look at this issue? Thanks
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/481
