action #33700

[slenkins][qam] tcpd test fails in 2_tcpdmatch - hostnamectl/dns issue in slenkins tcpd testsuite

Added by thehejik over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Target version: -
Start date: 2018-03-23
Due date:
% Done: 100%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-12-SP3-Server-DVD-Updates-x86_64-slenkins-twopence-tcpd-control@64bit fails in slenkins_control

coolo: thehejik: https://openqa.suse.de/tests/1566506#step/2_tcpdmatch/1 - this looks like a problem with the openvswitch network. it started 2 weeks ago - but is not consistent. can you throw theories at the problem please? :)
thehejik: coolo: yes, vsvecova already reported it. maybe it has something to do with https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4537 - and mkravec told me that we shouldn't set an fqdn hostname with hostnamectl, just the hostname without the domain, so we need to investigate
coolo: thehejik: checking the salt commits - we did the openvswitch config 9 days before the problem started
thehejik: coolo: hopefully it's not openvswitch related this time
coolo: thehejik: as our DNS setup was fixed, we should just revert these hacks
mkravec: coolo: I will do it

Reproducible

Fails since (at least) Build 20180323-1

Expected result

Last good: 20180321-3 (or more recent)

Further details

Always latest result in this scenario: latest

History

#1 Updated by mkravec over 3 years ago

DNS workaround disabled: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4688

We have seen a similar random issue in CaaSP lately (etcd does not start for some reason).

#2 Updated by pcervinka over 3 years ago

Maybe a similar failure in kdc-init: https://openqa.suse.de/tests/1601875 ?

#3 Updated by okurz over 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: slenkins-twopence-krb5-control
https://openqa.suse.de/tests/1640966

#4 Updated by okurz over 3 years ago

  • Subject changed from tcpd test fails in 2_tcpdmatch - hostnamectl/dns issue in slenkins tcpd testsuite to [slenkins] tcpd test fails in 2_tcpdmatch - hostnamectl/dns issue in slenkins tcpd testsuite

thehejik do you plan to work on this yourself or what are your expectations?

#5 Updated by okurz over 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: slenkins-twopence-tcpd-control
https://openqa.suse.de/tests/1685527

#6 Updated by coolo over 3 years ago

  • Subject changed from [slenkins] tcpd test fails in 2_tcpdmatch - hostnamectl/dns issue in slenkins tcpd testsuite to [slenkins][qam] tcpd test fails in 2_tcpdmatch - hostnamectl/dns issue in slenkins tcpd testsuite

#7 Updated by okurz over 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: slenkins-twopence-krb5-control
https://openqa.suse.de/tests/1734779

#8 Updated by okurz over 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: slenkins-twopence-tcpd-control
https://openqa.suse.de/tests/1764603

#9 Updated by coolo over 3 years ago

  • Assignee set to thehejik
  • Priority changed from Normal to High

Ludwig helped me understand what is going on. The normal flow is:

  • server node boots
  • server node disables wicked
  • server node sets hostname as 'server'
  • support server starts named
  • support server creates 'dns' lock
  • server node restarts network
  • server node queries dns as hostname 'server'
  • support server will resolve the Server IP as 'server' from then on

In the failing case, however, the server node boots after the support server
has already set up named (a classic race). The dns server then resolves the
Server IP as 'susetest', and the tests fail.
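
To make the ordering problem concrete, here is a small sketch that replays both sequences. It is only a toy model, not the real test code: the event names and the function resolve_after are made up for illustration, and it assumes that 'susetest' is the name the node carries before it is renamed and that the first name the dns server sees for the node is the one that sticks.

```python
DEFAULT_HOSTNAME = "susetest"  # assumed: name of the node before it is renamed


def resolve_after(events):
    """Replay an ordered list of events and return the name the toy dns
    server ends up storing for the server node (None if never registered)."""
    hostname = DEFAULT_HOSTNAME
    named_running = False
    network_up = False
    registered = None

    for event in events:
        if event == "boot":
            network_up = True          # a freshly booted node has its network up
        elif event == "disable_network":
            network_up = False
        elif event == "set_hostname":
            hostname = "server"
        elif event == "named_start":
            named_running = True
        elif event == "restart_network":
            network_up = True

        # Toy rule (assumption): the first name seen while named is running
        # and the node's network is up is the one that gets stored.
        if named_running and network_up and registered is None:
            registered = hostname

    return registered


normal = ["boot", "disable_network", "set_hostname", "named_start", "restart_network"]
race = ["named_start", "boot", "disable_network", "set_hostname", "restart_network"]
```

With these two sequences, resolve_after(normal) returns 'server', while resolve_after(race) returns 'susetest', because named is already up at the moment the node boots.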

-> The fix we discussed is to create a barrier within the support server that
waits until all slenkins nodes have disabled their network, and only then
starts the dhcp/dns server.
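
A minimal sketch of that barrier idea, using plain Python threads rather than the openQA APIs. Everything here is hypothetical: the class and function names, the IP addresses and the 'client' node are stand-ins, and the Event only plays the role of the 'dns' lock mentioned above. The point is just that the support server is one more party on the barrier, so it cannot start dhcp/dns before every node has checked in.

```python
import threading


class SupportServer:
    """Toy stand-in for the support server (hypothetical, not the real code)."""

    def __init__(self, node_count):
        # One party per node plus the support server itself.
        self.barrier = threading.Barrier(node_count + 1)
        self.dns_ready = threading.Event()   # plays the role of the 'dns' lock
        self.records = {}                    # ip -> hostname
        self._records_lock = threading.Lock()

    def run(self):
        # Wait until every node has disabled its network and set its hostname.
        self.barrier.wait(timeout=5)
        # Only now "start" dhcp/dns and signal the nodes.
        self.dns_ready.set()

    def register(self, ip, hostname):
        with self._records_lock:
            self.records[ip] = hostname


def node(support, ip, hostname):
    current = "susetest"                 # assumed name before the rename
    # The network is disabled at this point, so nothing can be registered yet.
    current = hostname                   # node sets its real hostname
    support.barrier.wait(timeout=5)      # tell the support server we are ready
    support.dns_ready.wait(timeout=5)    # wait for the 'dns' signal
    support.register(ip, current)        # restart network, register with dns


def run_scenario(nodes):
    """nodes maps a (made-up) IP address to the hostname that node should get."""
    support = SupportServer(len(nodes))
    threads = [threading.Thread(target=support.run)]
    threads += [
        threading.Thread(target=node, args=(support, ip, name))
        for ip, name in nodes.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return support.records
```

Calling run_scenario({"10.0.2.101": "server", "10.0.2.102": "client"}) returns the same mapping regardless of thread start order, since no node can register before the barrier has released and the dns signal is set.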

#10 Updated by thehejik over 3 years ago

A possible fix was created in https://progress.opensuse.org/issues/37258

#11 Updated by thehejik about 3 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100
