action #63853

[tools] broken /etc/sysconfig/network/ifcfg-br1

Added by dzedro about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Low
Assignee:
Category: Infrastructure
Target version: -
Start date: 2020-02-26
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP1-Server-DVD-Updates-x86_64-qam_wicked_advanced_ref@64bit fails in before_test

Test suite description

I fixed openqaworker7. This is not the first time I have had to fix /etc/sysconfig/network/ifcfg-br1
because it had fewer tap devices (OVS_BRIDGE_PORT_DEVICE) defined than the default.
I don't think anybody changed it intentionally, so the question is: how did it happen?

openqaworker7:~ # ps aux|grep 'script/worker'|sort -k14
root     10978  0.0  0.0   7432   940 pts/0    S+   02:36   0:00 grep --color=auto script/worker
_openqa+  2685  0.1  0.1 367584 290516 ?       Ss   Feb25   1:20 /usr/bin/perl /usr/share/openqa/script/worker --instance 1
_openqa+  2618  0.1  0.1 438096 360824 ?       Ss   Feb25   1:33 /usr/bin/perl /usr/share/openqa/script/worker --instance 10
_openqa+ 10743  1.7  0.1 438096 355896 ?       S    02:35   0:00 /usr/bin/perl /usr/share/openqa/script/worker --instance 10
_openqa+  2803  0.1  0.1 437160 359636 ?       Ss   Feb25   1:30 /usr/bin/perl /usr/share/openqa/script/worker --instance 11
_openqa+  2575  0.1  0.1 350580 272724 ?       Ss   Feb25   1:24 /usr/bin/perl /usr/share/openqa/script/worker --instance 12
_openqa+  2715  0.1  0.1 374076 295288 ?       Ss   Feb25   1:16 /usr/bin/perl /usr/share/openqa/script/worker --instance 13
_openqa+  2881  0.0  0.0 283828 206080 ?       Ss   Feb25   0:57 /usr/bin/perl /usr/share/openqa/script/worker --instance 14
_openqa+  2596  0.1  0.1 379412 302764 ?       Ss   Feb25   1:21 /usr/bin/perl /usr/share/openqa/script/worker --instance 15
_openqa+ 10975  6.6  0.1 379412 297652 ?       S    02:36   0:00 /usr/bin/perl /usr/share/openqa/script/worker --instance 15
_openqa+  2896  0.1  0.1 433612 357056 ?       Ss   Feb25   1:18 /usr/bin/perl /usr/share/openqa/script/worker --instance 16
_openqa+  2668  0.1  0.2 719260 641760 ?       Ss   Feb25   2:11 /usr/bin/perl /usr/share/openqa/script/worker --instance 17
_openqa+  2654  0.0  0.0 281684 204220 ?       Ss   Feb25   0:59 /usr/bin/perl /usr/share/openqa/script/worker --instance 18
_openqa+  2752  0.1  0.1 420904 344356 ?       Ss   Feb25   1:24 /usr/bin/perl /usr/share/openqa/script/worker --instance 19
_openqa+  4686  5.0  0.1 422652 341640 ?       R    02:23   0:39 /usr/bin/perl /usr/share/openqa/script/worker --instance 19
_openqa+  2843  0.1  0.0 314876 237220 ?       Ss   Feb25   1:32 /usr/bin/perl /usr/share/openqa/script/worker --instance 2
_openqa+  2822  0.1  0.1 361568 283616 ?       Ss   Feb25   1:24 /usr/bin/perl /usr/share/openqa/script/worker --instance 20
_openqa+  2700  0.1  0.0 335336 257848 ?       Ss   Feb25   1:17 /usr/bin/perl /usr/share/openqa/script/worker --instance 3
_openqa+  2731  0.2  0.3 1012536 935108 ?      Ss   Feb25   2:32 /usr/bin/perl /usr/share/openqa/script/worker --instance 4
_openqa+  2637  0.1  0.1 406248 329272 ?       Ss   Feb25   1:13 /usr/bin/perl /usr/share/openqa/script/worker --instance 5
_openqa+ 10747  1.7  0.1 406248 324376 ?       S    02:35   0:00 /usr/bin/perl /usr/share/openqa/script/worker --instance 5
_openqa+  2556  0.0  0.0 242432 165292 ?       Ss   Feb25   0:53 /usr/bin/perl /usr/share/openqa/script/worker --instance 6
_openqa+  2863  0.1  0.1 348188 271252 ?       Ss   Feb25   1:07 /usr/bin/perl /usr/share/openqa/script/worker --instance 7
_openqa+ 10926  2.7  0.1 348188 266284 ?       S    02:36   0:00 /usr/bin/perl /usr/share/openqa/script/worker --instance 7
_openqa+  2767  0.1  0.1 349836 273284 ?       Ss   Feb25   1:19 /usr/bin/perl /usr/share/openqa/script/worker --instance 8
_openqa+  9821  1.5  0.1 349836 268444 ?       S    02:34   0:01 /usr/bin/perl /usr/share/openqa/script/worker --instance 8
_openqa+  2782  0.1  0.0 260460 183656 ?       Ss   Feb25   1:12 /usr/bin/perl /usr/share/openqa/script/worker --instance 9
openqaworker7:~ #

https://openqa.suse.de/tests/3923836/file/autoinst-log.txt

[2020-02-26T02:36:28.129 CET] [debug] Failed to run dbus command 'unset_vlan' with arguments 'tap17 7' : 'tap17' is not connected to bridge 'br1'
[2020-02-26T02:36:28.139 CET] [debug] Failed to run dbus command 'unset_vlan' with arguments 'tap81 7' : 'tap81' is not connected to bridge 'br1'

openqaworker7:~ # cat /etc/sysconfig/network/ifcfg-br1
BOOTPROTO='static'
IPADDR='10.0.2.2/15'
STARTMODE='auto'
OVS_BRIDGE='yes'
OVS_BRIDGE_PORT_DEVICE_0='tap0'
OVS_BRIDGE_PORT_DEVICE_64='tap64'
OVS_BRIDGE_PORT_DEVICE_128='tap128'
OVS_BRIDGE_PORT_DEVICE_1='tap1'
OVS_BRIDGE_PORT_DEVICE_65='tap65'
OVS_BRIDGE_PORT_DEVICE_129='tap129'
OVS_BRIDGE_PORT_DEVICE_2='tap2'
OVS_BRIDGE_PORT_DEVICE_66='tap66'
OVS_BRIDGE_PORT_DEVICE_130='tap130'
OVS_BRIDGE_PORT_DEVICE_3='tap3'
OVS_BRIDGE_PORT_DEVICE_67='tap67'
OVS_BRIDGE_PORT_DEVICE_131='tap131'
OVS_BRIDGE_PORT_DEVICE_4='tap4'
OVS_BRIDGE_PORT_DEVICE_68='tap68'
OVS_BRIDGE_PORT_DEVICE_132='tap132'
OVS_BRIDGE_PORT_DEVICE_5='tap5'
OVS_BRIDGE_PORT_DEVICE_69='tap69'
OVS_BRIDGE_PORT_DEVICE_133='tap133'
OVS_BRIDGE_PORT_DEVICE_6='tap6'
OVS_BRIDGE_PORT_DEVICE_70='tap70'
OVS_BRIDGE_PORT_DEVICE_134='tap134'
OVS_BRIDGE_PORT_DEVICE_7='tap7'
OVS_BRIDGE_PORT_DEVICE_71='tap71'
OVS_BRIDGE_PORT_DEVICE_135='tap135'
OVS_BRIDGE_PORT_DEVICE_8='tap8'
OVS_BRIDGE_PORT_DEVICE_72='tap72'
OVS_BRIDGE_PORT_DEVICE_136='tap136'
OVS_BRIDGE_PORT_DEVICE_9='tap9'
OVS_BRIDGE_PORT_DEVICE_73='tap73'
OVS_BRIDGE_PORT_DEVICE_137='tap137'
PRE_UP_SCRIPT="wicked:gre_tunnel_preup.sh"

openqaworker7:~ # ovs-ofctl show br1
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000baf98cde6e43
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(gre1): addr:16:73:42:32:c6:2d
     config:     0
     state:      STP_FORWARD
     speed: 0 Mbps now, 0 Mbps max
 2(gre2): addr:36:33:81:28:86:1c
     config:     0
     state:      STP_FORWARD
     speed: 0 Mbps now, 0 Mbps max
 3(gre3): addr:32:61:fa:75:1e:c6
     config:     0
     state:      STP_BLOCK
     speed: 0 Mbps now, 0 Mbps max
 4(gre4): addr:16:8b:1e:c7:4b:25
     config:     0
     state:      STP_FORWARD
     speed: 0 Mbps now, 0 Mbps max
 5(gre5): addr:e6:9d:46:a2:18:a6
     config:     0
     state:      STP_BLOCK
     speed: 0 Mbps now, 0 Mbps max
 6(gre6): addr:76:e1:e4:1b:b1:70
     config:     0
     state:      STP_FORWARD
     speed: 0 Mbps now, 0 Mbps max
 7(gre7): addr:26:47:35:2d:4c:3c
     config:     0
     state:      STP_FORWARD
     speed: 0 Mbps now, 0 Mbps max
 8(gre8): addr:3a:e3:97:6c:02:83
     config:     0
     state:      STP_BLOCK
     speed: 0 Mbps now, 0 Mbps max
 9(gre9): addr:62:b2:24:f9:16:7f
     config:     0
     state:      STP_BLOCK
     speed: 0 Mbps now, 0 Mbps max
 10(gre10): addr:7a:86:8d:7e:58:ea
     config:     0
     state:      STP_FORWARD
     speed: 0 Mbps now, 0 Mbps max
 11(gre11): addr:c2:48:06:09:0e:11
     config:     0
     state:      STP_BLOCK
     speed: 0 Mbps now, 0 Mbps max
 12(tap0): addr:f2:02:1e:ec:2e:46
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 13(tap1): addr:a6:db:bb:f6:3c:c1
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 14(tap128): addr:4a:b7:e6:93:53:0c
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 15(tap129): addr:2e:4f:fb:97:e2:cd
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 16(tap130): addr:92:27:36:bb:dd:f2
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 17(tap131): addr:d2:b3:4f:03:85:f9
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 18(tap132): addr:9a:f2:06:72:3c:fb
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 19(tap133): addr:26:d3:35:63:45:2e
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 20(tap134): addr:c6:11:93:de:ac:07
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 21(tap135): addr:ba:8c:b2:35:d6:47
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 22(tap136): addr:2a:87:c4:f1:1d:5a
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 23(tap137): addr:62:d6:c6:0c:5f:f9
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 24(tap2): addr:4e:52:09:f5:1b:4b
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 25(tap3): addr:b6:83:e7:51:6f:ec
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 26(tap4): addr:f2:cf:c3:b0:a8:e3
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 27(tap5): addr:76:81:39:c6:53:73
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 28(tap6): addr:c6:29:eb:59:be:c6
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 29(tap64): addr:4a:19:3a:eb:1b:44
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 30(tap65): addr:b2:a0:a7:a4:32:86
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 31(tap66): addr:6e:44:37:bf:13:0b
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 32(tap67): addr:26:f6:1c:ff:f2:d9
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 33(tap68): addr:0a:83:78:77:22:2f
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 34(tap69): addr:e2:a9:81:77:c7:25
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 35(tap7): addr:92:26:c6:ba:2f:1f
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 36(tap70): addr:8a:ed:17:61:f7:7a
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 37(tap71): addr:9e:a5:88:eb:fc:6b
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 38(tap72): addr:26:1a:a5:f5:f3:c0
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 39(tap73): addr:12:8a:76:e1:ee:4c
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 40(tap8): addr:72:e5:e1:d0:cc:6b
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 41(tap9): addr:fe:bf:22:e7:8c:a9
     config:     0
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 LOCAL(br1): addr:ba:f9:8c:de:6e:43
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
openqaworker7:~ #

Reproducible

Fails since (at least) Build 20200225-2 (current job)

Expected result

Last good: 20200225-1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Infrastructure - action #60962: Enable multi-machine capability for all configured workers (New, 2019-12-12)

Related to openQA Project - action #64129: Set `$0` for upload process to something more explicit (was: Duplicate worker instances competing) (New, 2020-03-03)

Copied to openQA Infrastructure - action #63874: ensure openqa worker instances are disabled and stopped when "numofworkers" is reduced in salt pillars, e.g. causing non-obvious multi-machine failures (Resolved, 2020-02-26)

History

#1 Updated by okurz about 1 year ago

Keep in mind that the file is managed in salt: https://gitlab.suse.de/openqa/salt-states-openqa/blob/master/openqa/openvswitch.sls#L53 . As you never committed to that file, I assume you made local changes, which will certainly be lost when salt overwrites the file again.

#2 Updated by dzedro about 1 year ago

Then salt is broken: /etc/sysconfig/network/ifcfg-br1 was NOT OK and salt was fine with it. Maybe salt broke it.

#3 Updated by dzedro about 1 year ago

Looks like salt is breaking it.
It is OK until the next reboot, as the tap devices are now created.

# cat /etc/sysconfig/network/ifcfg-br1
BOOTPROTO='static'
IPADDR='10.0.2.2/15'
STARTMODE='auto'
OVS_BRIDGE='yes'
OVS_BRIDGE_PORT_DEVICE_0='tap0'
OVS_BRIDGE_PORT_DEVICE_64='tap64'
OVS_BRIDGE_PORT_DEVICE_128='tap128'
OVS_BRIDGE_PORT_DEVICE_1='tap1'
OVS_BRIDGE_PORT_DEVICE_65='tap65'
OVS_BRIDGE_PORT_DEVICE_129='tap129'
OVS_BRIDGE_PORT_DEVICE_2='tap2'
OVS_BRIDGE_PORT_DEVICE_66='tap66'
OVS_BRIDGE_PORT_DEVICE_130='tap130'
OVS_BRIDGE_PORT_DEVICE_3='tap3'
OVS_BRIDGE_PORT_DEVICE_67='tap67'
OVS_BRIDGE_PORT_DEVICE_131='tap131'
OVS_BRIDGE_PORT_DEVICE_4='tap4'
OVS_BRIDGE_PORT_DEVICE_68='tap68'
OVS_BRIDGE_PORT_DEVICE_132='tap132'
OVS_BRIDGE_PORT_DEVICE_5='tap5'
OVS_BRIDGE_PORT_DEVICE_69='tap69'
OVS_BRIDGE_PORT_DEVICE_133='tap133'
OVS_BRIDGE_PORT_DEVICE_6='tap6'
OVS_BRIDGE_PORT_DEVICE_70='tap70'
OVS_BRIDGE_PORT_DEVICE_134='tap134'
OVS_BRIDGE_PORT_DEVICE_7='tap7'
OVS_BRIDGE_PORT_DEVICE_71='tap71'
OVS_BRIDGE_PORT_DEVICE_135='tap135'
OVS_BRIDGE_PORT_DEVICE_8='tap8'
OVS_BRIDGE_PORT_DEVICE_72='tap72'
OVS_BRIDGE_PORT_DEVICE_136='tap136'
OVS_BRIDGE_PORT_DEVICE_9='tap9'
OVS_BRIDGE_PORT_DEVICE_73='tap73'
OVS_BRIDGE_PORT_DEVICE_137='tap137'
PRE_UP_SCRIPT="wicked:gre_tunnel_preup.sh"
openqaworker7:~ #
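
To check a listing like the one above against a worker count, a small parser can help. This is a minimal Python sketch and not part of the salt states. The function names are made up for illustration, and the mapping of worker instance i to tap(i-1), tap(i-1+64) and tap(i-1+128) is an assumption read off the port list in this ticket.

```python
import re
from typing import Iterable, List

# Matches lines such as OVS_BRIDGE_PORT_DEVICE_64='tap64'
PORT_LINE = re.compile(r"^OVS_BRIDGE_PORT_DEVICE_(\d+)='tap(\d+)'$")


def configured_taps(ifcfg_text: str) -> List[int]:
    """Return the sorted tap numbers listed as OVS bridge ports."""
    taps = []
    for line in ifcfg_text.splitlines():
        match = PORT_LINE.match(line.strip())
        if match:
            taps.append(int(match.group(2)))
    return sorted(taps)


def covered_instances(taps: Iterable[int], offsets=(0, 64, 128)) -> List[int]:
    """Return worker instances (1-based) that have all of their tap devices.

    Assumption: instance i needs tap(i-1+offset) for every offset.
    """
    present = set(taps)
    return [
        instance
        for instance in range(1, 65)  # base tap numbers 0..63
        if all(instance - 1 + offset in present for offset in offsets)
    ]
```

Under that assumption, the file shown above covers instances 1 to 10 only, so any higher-numbered worker instance would have no tap devices on br1.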

#4 Updated by okurz about 1 year ago

  • Status changed from New to In Progress
  • Assignee set to okurz

The problem I see is that tap devices are created for 10 worker instances but 20 openQA worker instances are running. What I did now is disable the extra instances with systemctl disable --now openqa-worker@{11..20}.

#5 Updated by okurz about 1 year ago

  • Related to action #60962: Enable multi-machine capability for all configured workers added

#6 Updated by okurz about 1 year ago

  • Copied to action #63874: ensure openqa worker instances are disabled and stopped when "numofworkers" is reduced in salt pillars, e.g. causing non-obvious multi-machine failures added

#7 Updated by okurz about 1 year ago

  • Status changed from In Progress to Resolved

I am pretty confident that #63853#note-4 solves this problem.
I also cross-checked the other workers to confirm that "numofworkers" equals the number of currently active worker instance systemd services:

salt -l error --no-color -C  'G@roles:worker' --state-output=changes cmd.run "grep '# numofworkers' /etc/openqa/workers.ini; sudo systemctl is-active openqa-worker@\* | wc -l"
QA-Power8-4-kvm.qa.suse.de:
    # numofworkers: 8
    8
malbec.arch.suse.de:
    # numofworkers: 4
    4
openqaworker5.suse.de:
    # numofworkers: 22
    22
openqaworker9.suse.de:
    # numofworkers: 24
    24
QA-Power8-5-kvm.qa.suse.de:
    # numofworkers: 8
    8
openqaworker7.suse.de:
    # numofworkers: 10
    10
powerqaworker-qam-1:
    # numofworkers: 8
    8
grenache-1.qa.suse.de:
    # numofworkers: 28
    28
openqaworker6.suse.de:
    # numofworkers: 20
    20
openqaworker2.suse.de:
    # numofworkers: 28
    28
openqaworker10.suse.de:
    # numofworkers: 10
    10
openqaworker3.suse.de:
    # numofworkers: 16
    16
openqaworker13.suse.de:
    # numofworkers: 16
    16
openqaworker-arm-1.suse.de:
    # numofworkers: 4
    4
openqaworker-arm-3.suse.de:
    # numofworkers: 4
    4
openqaworker-arm-2.suse.de:
    # numofworkers: 30
    30
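
Output in this shape can also be compared mechanically instead of by eye. The following Python sketch assumes exactly the layout shown above (a host line ending in a colon, then an indented "# numofworkers: N" line, then an indented count). The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_worker_counts(salt_output: str) -> Dict[str, Tuple[int, int]]:
    """Map host -> (configured numofworkers, active worker services)."""
    counts: Dict[str, Tuple[int, int]] = {}
    host: Optional[str] = None
    configured: Optional[int] = None
    for raw in salt_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not raw.startswith(" ") and line.endswith(":"):
            # Unindented "hostname:" line starts a new host section
            host, configured = line[:-1], None
        elif line.startswith("# numofworkers:"):
            configured = int(line.split(":", 1)[1])
        elif line.isdigit() and host is not None and configured is not None:
            counts[host] = (configured, int(line))
    return counts


def mismatched_hosts(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """Return only hosts where configured and active counts differ."""
    return {host: pair for host, pair in counts.items() if pair[0] != pair[1]}
```

An empty result from `mismatched_hosts` corresponds to the situation shown in the listing, where every host has as many active services as configured workers.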

The problem that reducing the number of worker instances doesn't properly disable/stop the actual worker services can be followed up in #63874 . It would be nice if we could automatically detect when the same issue reappears, e.g. by matching a string (regex) "Failed to run dbus command 'unset_vlan' with arguments 'tap.*is not connected to bridge" in autoinst-log.txt. That is the story of #45011 and is theoretically possible with gitlab.suse.de/openqa/auto-review/
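
As an illustration of the regex idea, here is a minimal Python sketch. It is not how auto-review is implemented. The pattern is built only from the two log lines quoted in this ticket, so the exact message wording beyond those samples is an assumption, and the function name is hypothetical.

```python
import re
from typing import List

# Built from the autoinst-log.txt lines quoted in this ticket
UNSET_VLAN_FAILURE = re.compile(
    r"Failed to run dbus command 'unset_vlan' with arguments "
    r"'(?P<tap>tap\d+) \d+' : '(?P=tap)' is not connected to bridge "
    r"'(?P<bridge>[^']+)'"
)


def disconnected_taps(log_text: str) -> List[str]:
    """Return the tap devices reported as not connected to a bridge."""
    return [match.group("tap") for match in UNSET_VLAN_FAILURE.finditer(log_text)]
```

A non-empty result for a job log would be a hint that a worker instance is running without tap devices on the bridge.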

#8 Updated by dzedro about 1 year ago

I don't know what worker count has to do with error message & wrong /etc/sysconfig/network/ifcfg-br1, but whatever ...

Failed to run dbus command 'unset_vlan' with arguments 'tap17 7' : 'tap17' is not connected to bridge 'br1'

#9 Updated by okurz about 1 year ago

dzedro wrote:

I don't know what worker count has to do with error message & wrong /etc/sysconfig/network/ifcfg-br1, but whatever ...

The file /etc/sysconfig/network/ifcfg-br1 is managed by https://gitlab.suse.de/openqa/salt-states-openqa/blob/master/openqa/openvswitch.sls#L53 based on the "worker count", meaning that three tap devices are configured for every openQA worker instance. What happened is that systemd services were started for worker instances above the expected "worker count", so these higher-numbered worker instances did not have corresponding tap devices.

Example:

  • 2 worker instances are planned to be supported
  • /etc/sysconfig/network/ifcfg-br1 has 2*3 TAP devices configured, e.g. 0+1, 64+65, 128+129
  • now – by mistake – someone starts openqa-worker@3 but there are no corresponding tap devices -> "Failed to run dbus command 'unset_vlan'"

Unless we find a better way to dynamically configure tap devices per worker instance, we will have this caveat.
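
The relationship between worker count and tap devices can be sketched in Python as follows. This is an illustration only, not the salt template itself. It assumes instance i (1-based) uses tap(i-1), tap(i-1+64) and tap(i-1+128), which is inferred from the ifcfg-br1 listing in this ticket, and all function names are hypothetical.

```python
from typing import Iterable, List

TAP_OFFSETS = (0, 64, 128)  # assumed: three tap devices per worker instance


def taps_for_instance(instance: int) -> List[str]:
    """Return the tap device names a given worker instance would use."""
    if instance < 1:
        raise ValueError("worker instances are numbered from 1")
    return [f"tap{instance - 1 + offset}" for offset in TAP_OFFSETS]


def ifcfg_port_lines(numofworkers: int) -> List[str]:
    """Return the OVS_BRIDGE_PORT_DEVICE lines for the given worker count."""
    lines = []
    for instance in range(1, numofworkers + 1):
        for tap in taps_for_instance(instance):
            number = tap[len("tap"):]
            lines.append(f"OVS_BRIDGE_PORT_DEVICE_{number}='{tap}'")
    return lines


def instances_without_taps(running: Iterable[int], numofworkers: int) -> List[int]:
    """Return running instances that are above the configured worker count."""
    return sorted(i for i in running if i > numofworkers)
```

With this mapping, instance 18 would use tap17 and tap81, the two devices named in the 'unset_vlan' failures on openqaworker7, where only 10 instances had tap devices configured.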

#10 Updated by dzedro about 1 year ago

  • Status changed from Resolved to In Progress

The workers are still broken. I just noticed a failed multi-machine job caused by network problems,
checked openqaworker9 and found duplicate worker instances.
https://openqa.suse.de/tests/3947509#step/kernel_multipath/78

It is not the same issue as before with 'tap17' is not connected to bridge 'br1',
but obviously something is broken with multi-machine tests and workers.
For some reason openQA is starting duplicate workers.

openqaworker9:~ # ps aux|grep script/worker|sort -k 14
root     28987  0.0  0.0   7432   900 pts/0    S+   14:09   0:00 grep --color=auto script/worker
_openqa+ 13085  0.0  0.3 1033476 955524 ?      Ss   Feb26   6:54 /usr/bin/perl /usr/share/openqa/script/worker --instance 1
_openqa+ 11843  0.1  0.5 1657844 1580096 ?     Ss   Feb26   9:19 /usr/bin/perl /usr/share/openqa/script/worker --instance 10
_openqa+ 28834 12.6  0.5 1657844 1574928 ?     S    14:09   0:00 /usr/bin/perl /usr/share/openqa/script/worker --instance 10
_openqa+ 11606  0.0  0.5 1420776 1343884 ?     Ss   Feb26   8:23 /usr/bin/perl /usr/share/openqa/script/worker --instance 11
_openqa+ 11599  0.0  0.4 1161148 1084192 ?     Ss   Feb26   7:44 /usr/bin/perl /usr/share/openqa/script/worker --instance 12
_openqa+ 11826  0.0  0.3 897248 819676 ?       Ss   Feb26   6:43 /usr/bin/perl /usr/share/openqa/script/worker --instance 13
_openqa+ 11778  0.0  0.4 1214116 1137132 ?     Ss   Feb26   7:42 /usr/bin/perl /usr/share/openqa/script/worker --instance 14
_openqa+ 11788  0.0  0.3 1129312 1051860 ?     Ss   Feb26   8:00 /usr/bin/perl /usr/share/openqa/script/worker --instance 15
_openqa+ 11590  0.0  0.3 995012 917436 ?       Ss   Feb26   6:39 /usr/bin/perl /usr/share/openqa/script/worker --instance 16
_openqa+ 11743  0.1  0.4 1301780 1224648 ?     Ss   Feb26   9:16 /usr/bin/perl /usr/share/openqa/script/worker --instance 17
_openqa+ 11592  0.0  0.4 1348996 1271716 ?     Ss   Feb26   8:01 /usr/bin/perl /usr/share/openqa/script/worker --instance 18
_openqa+ 11879  0.0  0.3 918048 841612 ?       Ss   Feb26   6:31 /usr/bin/perl /usr/share/openqa/script/worker --instance 19
_openqa+ 24941  9.9  0.3 924308 843260 ?       S    14:05   0:23 /usr/bin/perl /usr/share/openqa/script/worker --instance 19
_openqa+ 11632  0.0  0.3 1010856 933932 ?      Ss   Feb26   7:11 /usr/bin/perl /usr/share/openqa/script/worker --instance 2
_openqa+ 11869  0.0  0.4 1325176 1247172 ?     Ss   Feb26   8:07 /usr/bin/perl /usr/share/openqa/script/worker --instance 20
_openqa+ 11806  0.0  0.4 1134360 1057500 ?     Ss   Feb26   8:13 /usr/bin/perl /usr/share/openqa/script/worker --instance 21
_openqa+ 11596  0.0  0.4 1269196 1192176 ?     Ss   Feb26   7:16 /usr/bin/perl /usr/share/openqa/script/worker --instance 22
_openqa+ 11721  0.0  0.3 966140 888208 ?       Ss   Feb26   6:41 /usr/bin/perl /usr/share/openqa/script/worker --instance 23
_openqa+ 11684  0.0  0.3 1089036 1012660 ?     Ss   Feb26   7:47 /usr/bin/perl /usr/share/openqa/script/worker --instance 24
_openqa+ 11851  0.0  0.5 1411180 1333456 ?     Ss   Feb26   9:00 /usr/bin/perl /usr/share/openqa/script/worker --instance 3
_openqa+ 28532 15.0  0.5 1411180 1328408 ?     R    14:08   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 3
_openqa+ 11792  0.0  0.3 1074228 996668 ?      Ss   Feb26   7:18 /usr/bin/perl /usr/share/openqa/script/worker --instance 4
_openqa+ 11828  0.0  0.3 1052260 975172 ?      Ss   Feb26   7:18 /usr/bin/perl /usr/share/openqa/script/worker --instance 5
_openqa+ 11588  0.0  0.4 1266608 1190040 ?     Ss   Feb26   8:18 /usr/bin/perl /usr/share/openqa/script/worker --instance 6
_openqa+ 28967 22.6  0.4 1266608 1185056 ?     S    14:09   0:00 /usr/bin/perl /usr/share/openqa/script/worker --instance 6
_openqa+ 11600  0.0  0.4 1332444 1255108 ?     Ss   Feb26   8:26 /usr/bin/perl /usr/share/openqa/script/worker --instance 7
_openqa+ 11858  0.1  0.4 1268384 1192088 ?     Ss   Feb26   9:36 /usr/bin/perl /usr/share/openqa/script/worker --instance 8
_openqa+ 11594  0.1  0.4 1341872 1265396 ?     Ss   Feb26   9:46 /usr/bin/perl /usr/share/openqa/script/worker --instance 9
openqaworker9:~ #

A duplicate worker ID shows up for a short while, then it disappears ...

openqaworker9:~ # ps aux|grep script/worker|sort -k 14
root       686  0.0  0.0   7432   860 pts/0    S+   14:50   0:00 grep --color=auto script/worker
_openqa+ 34654  0.1  0.0 144212 67664 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 1
_openqa+ 34671  0.2  0.0 147876 71360 ?        Ss   14:17   0:05 /usr/bin/perl /usr/share/openqa/script/worker --instance 10
_openqa+ 34673  0.1  0.0 140624 64372 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 11
_openqa+ 34675  0.1  0.0 143488 67192 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 12
_openqa+ 34677  0.1  0.0 140048 63508 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 13
_openqa+ 34679  0.1  0.0 140648 64240 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 14
_openqa+ 34681  0.1  0.0 150248 73464 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 15
_openqa+ 34683  0.1  0.0 140228 63792 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 16
_openqa+ 34685  0.1  0.0 140136 63720 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 17
_openqa+ 34687  0.2  0.0 139448 62932 ?        Ss   14:17   0:04 /usr/bin/perl /usr/share/openqa/script/worker --instance 18
_openqa+ 34689  0.1  0.0 141308 64792 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 19
_openqa+ 34655  0.1  0.0 140684 64008 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 2
_openqa+ 34691  0.1  0.0 151504 74892 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 20
_openqa+ 34693  0.2  0.0 150484 73760 ?        Ss   14:17   0:04 /usr/bin/perl /usr/share/openqa/script/worker --instance 21
_openqa+ 34695  0.1  0.0 141784 65352 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 22
_openqa+ 34697  0.1  0.0 139808 63136 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 23
_openqa+ 34699  0.1  0.0 140192 63952 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 24
_openqa+ 34658  0.1  0.0 141232 64520 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 3
_openqa+ 34659  0.1  0.0 147692 70908 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 4
_openqa+ 64175  5.3  0.0 147692 65972 ?        S    14:48   0:05 /usr/bin/perl /usr/share/openqa/script/worker --instance 4
_openqa+ 34661  0.1  0.0 144952 68336 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 5
_openqa+ 34663  0.1  0.0 145864 69012 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 6
_openqa+ 34665  0.2  0.0 155316 78516 ?        Ss   14:17   0:04 /usr/bin/perl /usr/share/openqa/script/worker --instance 7
_openqa+ 34667  0.1  0.0 140620 64156 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 8
_openqa+ 34669  0.1  0.0 141608 65320 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 9
openqaworker9:~ #

and sometimes the duplicate instance stays e.g. instance 3

openqaworker8:~ # ps aux|grep script/worker|sort -k 14
root     23870  0.0  0.0   7432   916 pts/0    S+   15:03   0:00 grep --color=auto script/worker
_openqa+ 48263  0.1  0.0 144068 67684 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 1
_openqa+ 48281  0.1  0.0 143820 67008 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 10
_openqa+ 48283  0.1  0.0 150288 73464 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 11
_openqa+ 48285  0.1  0.0 147820 70872 ?        Ss   14:17   0:04 /usr/bin/perl /usr/share/openqa/script/worker --instance 12
_openqa+ 48287  0.1  0.0 141776 65268 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 13
_openqa+ 48289  0.1  0.0 149852 73180 ?        Ss   14:17   0:04 /usr/bin/perl /usr/share/openqa/script/worker --instance 14
_openqa+ 48291  0.2  0.0 172740 95636 ?        Ss   14:17   0:05 /usr/bin/perl /usr/share/openqa/script/worker --instance 15
_openqa+ 48293  0.2  0.0 143972 67256 ?        Ss   14:17   0:06 /usr/bin/perl /usr/share/openqa/script/worker --instance 16
_openqa+ 48295  0.1  0.0 150480 73880 ?        Ss   14:17   0:04 /usr/bin/perl /usr/share/openqa/script/worker --instance 17
_openqa+ 48297  0.1  0.0 140040 63788 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 18
_openqa+ 48299  0.1  0.0 157544 80872 ?        Ss   14:17   0:04 /usr/bin/perl /usr/share/openqa/script/worker --instance 19
_openqa+ 48266  0.1  0.0 151932 75244 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 2
_openqa+ 48301  0.1  0.0 140608 63780 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 20
_openqa+ 48303  0.1  0.0 155180 78344 ?        Ss   14:17   0:04 /usr/bin/perl /usr/share/openqa/script/worker --instance 21
_openqa+ 48305  0.1  0.0 141968 65568 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 22
_openqa+ 48307  0.1  0.0 154248 77476 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 23
_openqa+ 48309  0.1  0.0 142600 65904 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 24
_openqa+ 22008  6.7  0.0 144624 62944 ?        S    15:01   0:07 /usr/bin/perl /usr/share/openqa/script/worker --instance 3
_openqa+ 48267  0.1  0.0 144624 67788 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 3
_openqa+ 48269  0.1  0.0 144224 67824 ?        Ss   14:17   0:02 /usr/bin/perl /usr/share/openqa/script/worker --instance 4
_openqa+ 48271  0.1  0.0 141572 65004 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 5
_openqa+ 48273  0.1  0.0 150872 74080 ?        Ss   14:17   0:04 /usr/bin/perl /usr/share/openqa/script/worker --instance 6
_openqa+ 48275  0.1  0.0 143452 66876 ?        Ss   14:17   0:04 /usr/bin/perl /usr/share/openqa/script/worker --instance 7
_openqa+ 23714  0.8  0.0 148080 66364 ?        S    15:03   0:00 /usr/bin/perl /usr/share/openqa/script/worker --instance 8
_openqa+ 48277  0.1  0.0 148080 71196 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 8
_openqa+ 48279  0.1  0.0 150288 73460 ?        Ss   14:17   0:03 /usr/bin/perl /usr/share/openqa/script/worker --instance 9

#11 Updated by okurz about 1 year ago

  • Subject changed from [tools] broken /etc/sysconfig/network/ifcfg-br1 to [tools] broken /etc/sysconfig/network/ifcfg-br1 / duplicate worker instances?

Cool. The instances you mentioned seem to be gone, but running salt -l error --no-color -C 'G@roles:worker' --state-output=changes cmd.run "ps aux|grep script/worker|sort -k 14" from osd reveals more, e.g. on openqaworker8.

systemctl status openqa-worker@1 reveals that the duplicate "instance 1" is tracked within the same systemd service, at least:

# systemctl status openqa-worker@1
● openqa-worker@1.service - openQA Worker #1
   Loaded: loaded (/usr/lib/systemd/system/openqa-worker@.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/openqa-worker@.service.d
           └─override.conf
   Active: active (running) since Tue 2020-03-03 14:17:58 CET; 1h 22min ago
  Process: 48262 ExecStartPre=/usr/bin/install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/1 (code=exited, status=0/SUCCESS)
 Main PID: 48263 (worker)
    Tasks: 2
   CGroup: /openqa.slice/openqa-worker.slice/openqa-worker@1.service
           ├─48263 /usr/bin/perl /usr/share/openqa/script/worker --instance 1
           └─55122 /usr/bin/perl /usr/share/openqa/script/worker --instance 1

Mar 03 15:40:45 openqaworker8 worker[48263]: [info] [pid:55122] SLES-12-SP4-x86_64-mru-install-minimal-with-addons-Build:13917:adcli-Server-DVD-Incidents-64bit.qcow2: Processing chunk 291>
…

Something has started the new process, probably some salt deployment, but the old process seems not to have been stopped properly.

#12 Updated by okurz about 1 year ago

  • Status changed from In Progress to Resolved

I have separated the "duplicate worker instances" issue into #64129 after realizing that it is very likely not the same issue, and also because we seem to see the same problem on o3. If I asked anyone else to help with this, they would probably be confused by the "ifcfg" story first. I hope that's OK with you as well.

#13 Updated by okurz about 1 year ago

  • Subject changed from [tools] broken /etc/sysconfig/network/ifcfg-br1 / duplicate worker instances? to [tools] broken /etc/sysconfig/network/ifcfg-br1

#14 Updated by okurz about 1 year ago

  • Related to action #64129: Set `$0` for upload process to something more explicit (was: Duplicate worker instances competing) added

#15 Updated by okurz about 1 year ago

dzedro, I have updated #64129 . In short: there are no "duplicate worker instances"; the problem regarding MM tests and network must be something different. Unfortunately the tests, e.g. https://openqa.suse.de/tests/3947509 , are a bit confusing because of the error messages they include, e.g. "Repository 'qa-head' not found by its alias, number, or URI.". If you find examples of tests without error messages in the "good" case and with more network-related problems in the "bad" case, maybe we can help you more. Otherwise I am afraid we cannot be of much help with multi-machine tests, which no one from the QA tools team has much experience with.
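
To tell a worker's main process from a process it spawned, the `ps aux` listings in this ticket can be grouped per instance. The Python sketch below relies on one observation from those listings: the long-running process of each instance has the session-leader flag (STAT "Ss"), while the short-lived second process shows "S" or "R". Treating that flag as the marker of the main process is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_worker_processes(ps_output: str) -> Dict[int, List[dict]]:
    """Group `ps aux` lines of openQA workers by their --instance number."""
    groups: Dict[int, List[dict]] = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split()
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
        if len(fields) < 11 or "--instance" not in fields:
            continue
        position = fields.index("--instance")
        if position + 1 >= len(fields):
            continue
        instance = int(fields[position + 1])
        groups[instance].append(
            {"pid": int(fields[1]), "stat": fields[7], "leader": "s" in fields[7]}
        )
    return dict(groups)


def instances_with_children(groups: Dict[int, List[dict]]) -> List[int]:
    """Return instances that currently show more than one process."""
    return sorted(i for i, procs in groups.items() if len(procs) > 1)
```

An instance that shows one session leader plus one other process matches a worker with a subprocess, rather than two competing workers.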

#16 Updated by dzedro about 1 year ago

okurz sorry, my bad, didn't know worker is creating subprocess ...

#17 Updated by okurz about 1 year ago

dzedro wrote:

okurz sorry, my bad, didn't know worker is creating subprocess ...

No problem. It's good you challenged us ;) We also found a point for improvement with #64129.
