action #116770

closed

[qe-core][functional]test fails in prepare_test_data, vm can't get dhcp4 ip addr

Added by rfan1 over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version:
Start date: 2022-09-19
Due date:
% Done: 0%
Estimated time:
Difficulty:
Sprint: QE-Core: October Sprint (Sep 28 - Oct 26)

Description

Observation

openQA test in scenario sle-15-SP5-Online-x86_64-extra_tests_textmode_mod_desktop@svirt-xen-hvm fails in
prepare_test_data

Test suite description

Maintainer: dheidler. Extra tests about CLI software in desktop applications module

Reproducible

Fails since (at least) Build 21.1 (current job)

Expected result

Last good: 19.1 (or more recent)

Further details

Always latest result in this scenario: latest

Actions #1

Updated by rfan1 over 1 year ago

No DHCPv4 address is assigned to the VM, see https://openqa.suse.de/tests/9527550#step/boot_to_desktop/4

The MAC address can be found in https://openqa.suse.de/tests/9527550/logfile?filename=autoinst-log.txt.

Is there a problem with the DHCP server? Have we used up the IPv4 addresses, or is the IP address already in use by another VM?

Could an expert take a look at this issue?

Actions #2

Updated by rfan1 over 1 year ago

  • Subject changed from [qe-core][qem]test fails in prepare_test_data, vm can't get dhcp4 ip addr to [qe-core][functional]test fails in prepare_test_data, vm can't get dhcp4 ip addr
Actions #3

Updated by rfan1 over 1 year ago

From https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls

  9:
    WORKER_CLASS: svirt-xen
    VIRSH_HOSTNAME: openqaw5-xen.qa.suse.de
    VIRSH_GUEST: openqaw5-xen.qa.suse.de
    VIRSH_PASSWORD: nots3cr3t
    VIRSH_INSTANCE: 1

No MAC address is defined here.

Actions #4

Updated by rfan1 over 1 year ago

Thanks to @szarate for the help, I found a clue:

[ 17.180395][ C0] IPv6: eth0: IPv6 duplicate address 2620:113:80c0:80a0:10:xx:xx:xx used by 00:16:3e:xx:xx:xx detected!

The MAC addresses are generated randomly:

sub genmac {
    my @mac = split(/:/, shift);
    my $len = scalar(@mac);
    for (my $i = 0; $i < (6 - $len); $i++) {
        push @mac, (sprintf("%02X", int(rand(254))));
    }
    return lc(join(':', @mac));
}


    elsif ($vmm_family eq 'xen') {
        $ifacecfg{type} = 'bridge';
        $ifacecfg{source} = {bridge => 'br0'};
        $ifacecfg{virtualport} = {type => 'openvswitch'};
        $ifacecfg{mac} = {address => genmac('00:16:3e')};
        $iface_model = 'netfront';
    }

So I don't think there is a MAC conflict. I will try disabling IPv6 in the published qcow2 image and see whether that helps.
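
As a rough check of that reasoning, here is a small Python sketch (not part of os-autoinst) that estimates how likely it is for at least two VMs to draw the same MAC when genmac fills the three missing octets with int(rand(254)), i.e. 254 possible values per octet. The function name and the VM counts are assumptions chosen for illustration only.

```python
def mac_collision_probability(num_vms: int,
                              random_octets: int = 3,
                              values_per_octet: int = 254) -> float:
    """Birthday-problem estimate: chance that at least two of num_vms
    end up with the same randomly generated MAC suffix."""
    space = values_per_octet ** random_octets  # number of distinct suffixes
    if num_vms > space:
        return 1.0  # pigeonhole: a collision is certain
    p_unique = 1.0
    for k in range(num_vms):
        # The (k+1)-th VM must avoid the k suffixes already taken.
        p_unique *= (space - k) / space
    return 1.0 - p_unique


if __name__ == "__main__":
    for n in (10, 100, 1000):  # hypothetical numbers of concurrent VMs
        print(n, f"{mac_collision_probability(n):.6f}")
```

With around a hundred VMs the estimate is on the order of 0.03%, which fits the conclusion that a random MAC collision is an unlikely cause.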

Actions #6

Updated by rfan1 over 1 year ago

More findings:

https://openqa.suse.de/tests/9527548#step/boot_to_desktop/4
https://openqa.suse.de/tests/9555476#step/boot_to_desktop/4
https://openqa.suse.de/tests/9545686#step/boot_to_desktop/4

Different jobs try to get the same DHCP IP address even though they use different MAC addresses (but they use the same qcow2 image).

Next plan:
1. Investigate the qcow2 image further to see whether it contains any hardcoded network settings
2. Ask the Tools team for help debugging the DHCP server

Actions #7

Updated by szarate over 1 year ago

rfan1 wrote:

More findings:

https://openqa.suse.de/tests/9527548#step/boot_to_desktop/4
https://openqa.suse.de/tests/9555476#step/boot_to_desktop/4
https://openqa.suse.de/tests/9545686#step/boot_to_desktop/4

Different jobs try to get the same DHCP IP address even though they use different MAC addresses (but they use the same qcow2 image).

Next plan:
1. Investigate the qcow2 image further to see whether it contains any hardcoded network settings

This is the problem: https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/7e418a3e23dda82f732b9a2253dc0c9d4a076e39/tests/shutdown/cleanup_before_shutdown.pm#L42

  • Dominik says that the net rules might not be the problem. One idea is to also use VIRSH_INSTANCE when generating the MAC address, so that we have enough entropy for the IPv6 address.
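
One possible reading of that idea, sketched in Python rather than in the Perl of os-autoinst: put the instance number into one octet and fill the rest randomly. The function name, the choice of which octet carries the instance, and the 0..255 limit are all assumptions for illustration, not what any merged fix does.

```python
import random
from typing import Optional


def genmac_with_instance(prefix: str, instance: int,
                         rng: Optional[random.Random] = None) -> str:
    """Extend a MAC prefix to six octets: the first added octet encodes the
    (hypothetical) VIRSH_INSTANCE value, the remaining octets are random."""
    rng = rng or random.Random()
    octets = prefix.lower().split(":")
    if not 1 <= len(octets) <= 5:
        raise ValueError("prefix must contain between 1 and 5 octets")
    if not 0 <= instance <= 0xFF:
        raise ValueError("instance must fit into a single octet")
    octets.append(f"{instance:02x}")
    while len(octets) < 6:
        octets.append(f"{rng.randrange(256):02x}")
    return ":".join(octets)


if __name__ == "__main__":
    # '00:16:3e' is the Xen prefix passed to genmac in the snippet quoted earlier.
    print(genmac_with_instance("00:16:3e", 1))
```

Two jobs running on different instances would then differ in at least the fourth octet, even if the random octets happened to match.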
Actions #8

Updated by szarate over 1 year ago

  • Sprint set to QE-Core: September Sprint (Aug 31 - Sep 28)
  • Tags set to bugbusters
  • Status changed from New to Workable
  • Target version set to QE-Core: Ready
Actions #9

Updated by rfan1 over 1 year ago

I get the same IP address in the job that publishes the qcow2 image:
https://openqa.suse.de/tests/9527547#step/first_boot/1

I unpacked the initrd file from the published qcow2 image and can see a hardcoded MAC address there:

# lsinitrd --unpack initrd-5.14.21-150400.24.18-default 
# cat ./etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /usr/lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.

# net device ()
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:16:3e:68:xx:xx", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
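
To check an unpacked initrd or image for rules like this one, a short Python sketch can list which interface names are pinned to which MAC address. It only looks at the ATTR{address} and NAME keys shown in the file above; the function name is hypothetical and the parsing is deliberately simplistic.

```python
import re
from typing import Dict

# Match the ATTR{address}=="..." and NAME="..." keys of a persistent-net rule.
RULE_MAC = re.compile(r'ATTR\{address\}=="([^"]+)"')
RULE_NAME = re.compile(r'\bNAME="([^"]+)"')


def pinned_interfaces(rules_text: str) -> Dict[str, str]:
    """Return {interface name: MAC} for every rule that pins a name to a MAC."""
    pinned = {}
    for line in rules_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        mac = RULE_MAC.search(line)
        name = RULE_NAME.search(line)
        if mac and name:
            pinned[name.group(1)] = mac.group(1).lower()
    return pinned


if __name__ == "__main__":
    # Path as unpacked by lsinitrd in the commands above.
    with open("./etc/udev/rules.d/70-persistent-net.rules") as fh:
        print(pinned_interfaces(fh.read()))
```

An empty result would mean that no interface name in the image is tied to a specific MAC address.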
Actions #11

Updated by szarate over 1 year ago

  • Blocks action #116773: [qe-core][functional][sle15sp5]test fails in redis added
Actions #12

Updated by szarate over 1 year ago

  • Status changed from Workable to Blocked
Actions #13

Updated by szarate over 1 year ago

  • Sprint changed from QE-Core: September Sprint (Aug 31 - Sep 28) to QE-Core: October Sprint (Sep 28 - Oct 26)
Actions #14

Updated by szarate over 1 year ago

  • Blocks deleted (action #116773: [qe-core][functional][sle15sp5]test fails in redis)
Actions #15

Updated by szarate over 1 year ago

  • Status changed from Blocked to Workable

Richard already provided a workaround: https://suse.slack.com/archives/C02CSAZLAR4/p1664531900908269?thread_ts=1664529692.179449&cid=C02CSAZLAR4

    # Workaround for the network issue: blank wicked's stored DUID and DHCP
    # lease files, then restart the network.
    enter_cmd("cd /var/lib/wicked");
    my @files = qw(duid.xml lease-eth0-dhcp-ipv4.xml lease-eth0-dhcp-ipv6.xml);
    # "@files" interpolates to the space-separated file names; \$i stays a shell variable.
    enter_cmd("for i in @files; do echo '' > /var/lib/wicked/\$i; done");
    enter_cmd("cat @files");
    assert_script_run('systemctl restart network');
    assert_script_run('systemctl status network');
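
To try the same cleanup outside openQA, for example on a mounted copy of the image, the file handling can be sketched in Python. The file names are taken from the workaround above; the function name is hypothetical, and unlike the shell loop this sketch skips files that do not exist instead of creating them.

```python
from pathlib import Path
from typing import Iterable, List

# State files named in the workaround above.
WICKED_STATE_FILES = ("duid.xml",
                      "lease-eth0-dhcp-ipv4.xml",
                      "lease-eth0-dhcp-ipv6.xml")


def blank_wicked_state(state_dir: str,
                       names: Iterable[str] = WICKED_STATE_FILES) -> List[Path]:
    """Overwrite each existing state file with a single newline,
    which is what `echo '' > file` writes. Returns the files touched."""
    blanked = []
    for name in names:
        path = Path(state_dir) / name
        if path.is_file():
            path.write_text("\n")
            blanked.append(path)
    return blanked


if __name__ == "__main__":
    # Needs write access to the directory; restart the network afterwards.
    for touched in blank_wicked_state("/var/lib/wicked"):
        print("blanked", touched)
```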
Actions #16

Updated by VANASTASIADIS over 1 year ago

  • Assignee set to VANASTASIADIS
Actions #17

Updated by zluo over 1 year ago

@VANASTASIADIS just fyi: it works here with the workaround: https://openqa.suse.de/tests/9709748#step/rsync/43

Actions #18

Updated by rfan1 over 1 year ago

Please correct me if I am wrong, Pavel is working on this:

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15685

Actions #19

Updated by pdostal over 1 year ago

yes

Actions #20

Updated by VANASTASIADIS over 1 year ago

  • Status changed from Workable to Resolved
  • Assignee changed from VANASTASIADIS to pdostal

Resolving (PR is merged) and changing the assignee to Pavel, since he is the one who tackled it.
