action #62198

failed services os-autoinst-openvswitch.service on powerqaworker-qam-1 and malbec

Added by okurz about 1 month ago. Updated about 1 month ago.

Status: Resolved
Start date: 16/01/2020
Priority: Urgent
Due date:
Assignee: okurz
% Done: 0%
Category: -
Target version: openQA Project - Done
Duration:

Description

Observation

$ systemctl status os-autoinst-openvswitch.service
● os-autoinst-openvswitch.service - os-autoinst openvswitch helper
   Loaded: loaded (/usr/lib/systemd/system/os-autoinst-openvswitch.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/os-autoinst-openvswitch.service.d
           └─override.conf
   Active: failed (Result: exit-code) since Tue 2020-01-14 22:01:12 CET; 1 day 17h ago
 Main PID: 97843 (code=exited, status=2)

Jan 14 22:00:42 powerqaworker-qam-1 systemd[1]: Starting os-autoinst openvswitch helper...
Jan 14 22:01:12 powerqaworker-qam-1 sh[22569]: br1             no-device
Jan 14 22:01:12 powerqaworker-qam-1 systemd[1]: os-autoinst-openvswitch.service: Control process exited, code=exited status=157
Jan 14 22:01:12 powerqaworker-qam-1 systemd[1]: Failed to start os-autoinst openvswitch helper.
Jan 14 22:01:12 powerqaworker-qam-1 systemd[1]: os-autoinst-openvswitch.service: Unit entered failed state.
Jan 14 22:01:12 powerqaworker-qam-1 systemd[1]: os-autoinst-openvswitch.service: Failed with result 'exit-code'.
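Since the same unit failed on two hosts, a small helper can make the check repeatable. The following is only a sketch, assuming passwordless ssh access to the workers; the function name `check_failed_units` is made up for illustration and is not part of openQA or os-autoinst.

```shell
#!/bin/sh
# Sketch: report the state of one systemd unit on several hosts.
# Assumption: the hosts are reachable via ssh without a password prompt.
unit=os-autoinst-openvswitch.service

# check_failed_units is a hypothetical helper name, not an existing tool.
check_failed_units() {
    for host in "$@"; do
        # "systemctl is-failed" prints the unit state (e.g. "failed" or
        # "active") and exits 0 only when the unit is in the failed state.
        state=$(ssh "$host" systemctl is-failed "$unit")
        echo "$host: $state"
    done
}
```

Calling `check_failed_units powerqaworker-qam-1 malbec` would print one line per host. For the full log of a failing unit, `journalctl -u os-autoinst-openvswitch.service` on the affected host shows more than the excerpt from `systemctl status`.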

Related issues

Copied to openQA Infrastructure - action #62228: powerqaworker-qam-1 is down Resolved 16/01/2020

History

#1 Updated by okurz about 1 month ago

Restart on malbec helped. Trying with powerqaworker-qam-1 as well. I have disabled alerts on https://stats.openqa-monitor.qa.suse.de/alerting/list?state=all and should re-enable them after checking again once the machine has rebooted.

EDIT: The system has not come up yet. I triggered a reboot over https://fsp1-powerqaworker-qam.qa.suse.de , with no effect.

EDIT: I don't understand the FSP that well. Using "Power On/Off System" I tried again, explicitly powering the system off, waiting, and then powering it on again.

Still not up. nsinger also had no luck getting the machine back over the HMC. Can't connect to that machine at https://powerhmc1.arch.suse.de/dashboard/#resources/systems/aeaf3ef1-7d73-305c-9d9a-41d45801cf47/logical-partitions either. To quote the HMC: "Das System ist Versionsabweichung" (roughly: "The system is version mismatch").

So I removed the machine from salt for now to be able to apply the salt state again; see https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/159645 for a failure.

sudo salt-key -y -d powerqaworker-qam-1
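To confirm that the key is really gone before re-applying the salt state, the key list can be checked for the minion id. This is a minimal sketch; `minion_listed` is a hypothetical helper name, and it assumes the list contains one minion id per line, as the piped output of `salt-key -L` normally does.

```shell
#!/bin/sh
# Sketch: succeed if the given minion id appears as a whole line in the
# key list read from stdin. minion_listed is a made-up helper name.
minion_listed() {
    # -x matches whole lines only, -F treats the id as a fixed string,
    # -q suppresses output so only the exit status is used.
    grep -qxF -- "$1"
}
```

On the salt master this could be used as `sudo salt-key -L | minion_listed powerqaworker-qam-1 && echo "still known to salt"`.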

#2 Updated by okurz about 1 month ago

#3 Updated by okurz about 1 month ago

  • Status changed from In Progress to Resolved
  • Target version changed from Current Sprint to Done

Cloned the "machine is down" issue into #62228. We can close this one, as the service on malbec is fine and, as long as powerqaworker-qam-1 is down, the service there also "does not fail" :)
