action #69694

closed

openqa-worker systemd services running in osd which should not be enabled at all and have no tap-device configured auto_review:"backend died:.*tap.*is not connected to bridge.*br1":retry

Added by okurz over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
Start date:
2020-08-07
Due date:
2020-09-01
% Done:

0%

Estimated time:

Description

Observation

https://openqa.suse.de/tests/4535923 is incomplete with the proper reason "backend died: 'tap14' is not connected to bridge 'br1' at /usr/lib/os-autoinst/backend/qemu.pm line 149.", as introduced in #66376, but the worker instance openqaworker3:15 should not be running at all.

Workaround

Restart the job until it ends up on a properly configured worker instance
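
The restart can also be triggered from a shell. This is only a minimal sketch, assuming openqa-cli is installed and API credentials for the host are configured. The helper name restart_job is hypothetical, and the OPENQA_CLI variable is introduced here only so the command can be swapped out for a dry run:

```shell
# Hypothetical helper: restart an openQA job through the REST API.
# Assumes openqa-cli is installed and an API key/secret is configured.
restart_job() {
    host="$1"
    job_id="$2"
    # OPENQA_CLI may be overridden, e.g. with "echo", to print the call
    # instead of sending it.
    ${OPENQA_CLI:-openqa-cli} api --host "$host" -X POST "jobs/$job_id/restart"
}
```

For the job from the observation this would be called as restart_job https://openqa.suse.de 4535923. The restarted job can still land on a misconfigured instance, so the call may have to be repeated.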


Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #66376: MM tests fail in obscure way when tap device is not present (Resolved, okurz, 2020-05-04)

Related to openQA Project - coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasons (Resolved, okurz, 2020-04-01, 2020-09-30)

Actions #1

Updated by okurz over 3 years ago

  • Project changed from openQA Project to openQA Infrastructure
  • Status changed from New to In Progress
  • Assignee set to okurz
  • Target version set to Ready
Actions #2

Updated by okurz over 3 years ago

  • Related to action #66376: MM tests fail in obscure way when tap device is not present added
Actions #3

Updated by okurz over 3 years ago

Following up on #65118#note-13, running sudo salt -l error --state-output=changes -C 'G@roles:worker' cmd.run 'curl -s https://w3.suse.de/~okurz/check_num_openqa_workers | sh -' reports that openqaworker3 again has superfluous worker instances.

> for i in {1..50}; do echo -e "$i:\t$(sudo systemctl is-enabled openqa-worker@$i)"; done
1:  enabled
2:  enabled
3:  enabled
4:  enabled
5:  enabled
6:  enabled
7:  enabled
8:  enabled
9:  enabled
10: enabled
11: enabled
12: enabled
13: enabled
14: disabled
15: enabled
16: enabled

I did sudo systemctl disable --now openqa-worker@{15,16} and now:

> for i in {1..50}; do echo -e "$i:\t$(sudo systemctl is-enabled openqa-worker@$i)"; done
1:  enabled
2:  enabled
3:  enabled
4:  enabled
5:  enabled
6:  enabled
7:  enabled
8:  enabled
9:  enabled
10: enabled
11: enabled
12: enabled
13: enabled
14: disabled
15: enabled-runtime
16: enabled-runtime

Triggered a reboot.
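
The listings above only show the enablement state. To tie that back to the tap error, a check along the following lines could flag instances that are enabled but have no tap device. This is a sketch under assumptions, not the script used above: it assumes worker instance N uses tap device tap(N-1), which matches openqaworker3:15 and 'tap14' in the observation. The function names and the SYSFS_NET variable are hypothetical.

```shell
# Assumption: worker instance N uses tap device tap(N-1), which matches
# openqaworker3:15 and 'tap14' in the observation of this ticket.
tap_for_instance() {
    echo "tap$(( $1 - 1 ))"
}

# Print every instance in the range FIRST..LAST that systemd reports as
# "enabled" or "enabled-runtime" while its tap device does not exist.
# SYSFS_NET exists only so the lookup can be pointed at a test directory.
report_instances_without_tap() {
    i="$1"
    last="$2"
    while [ "$i" -le "$last" ]; do
        state=$(systemctl is-enabled "openqa-worker@$i" 2>/dev/null)
        tap=$(tap_for_instance "$i")
        case "$state" in
            enabled|enabled-runtime)
                if [ ! -e "${SYSFS_NET:-/sys/class/net}/$tap" ]; then
                    printf '%s\t%s\t%s missing\n' "$i" "$state" "$tap"
                fi
                ;;
        esac
        i=$(( i + 1 ))
    done
}
```

On a worker host this would be called as report_instances_without_tap 1 50. It only checks that the tap device exists, not that it is connected to br1.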

Actions #4

Updated by okurz over 3 years ago

  • Related to coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasons added
Actions #5

Updated by okurz over 3 years ago

  • Due date set to 2020-09-01
  • Status changed from In Progress to Feedback

The same checks no longer report any workers with this problem. I can check again after my vacation.

Actions #6

Updated by okurz over 3 years ago

  • Status changed from Feedback to Resolved

The same check as in #69694#note-3 reported no problem, so this is considered fixed.
