action #128786

closed

worker instances on rebel (o3 s390x worker) were not running, services disabled, except for rebel:5

Added by okurz over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Start date:
2023-05-05
Due date:
% Done:

0%

Estimated time:
Tags:

Description

Observation

While trying to verify https://github.com/os-autoinst/os-autoinst/pull/2311 I found that the last s390x jobs ran on rebel:1 or rebel:4 about 22h ago, i.e. on 2023-05-04, but openqa-worker-auto-restart@1 was not running and was disabled.

Acceptance criteria

  • AC1: s390x worker instances are automatically started
Actions #1

Updated by okurz over 1 year ago

  • Due date set to 2023-05-19
  • Status changed from In Progress to Feedback

In the system journal I found that the automatic update service had failed because vendor changes were not allowed. I changed that setting so that it is consistent on all machines; it was also missing on openqaworker19, openqaworker20 and qa-power8-3:

for i in aarch64 openqaworker4 openqaworker7 openqaworker19 openqaworker20 qa-power8-3 rebel; do echo $i && ssh root@$i "sed -i 's/\(solver.dupAllowVendorChange = \)false/\1true/' /etc/zypp/zypp.conf" ; done
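To illustrate what that substitution changes, here is a minimal sketch that applies the same sed expression to a temporary sample file instead of the real /etc/zypp/zypp.conf. The helper name allow_vendor_change and the sample file content are assumptions made for illustration only. Note that the expression only flips the value from false to true; it does not add the setting if it is absent and does not uncomment a commented-out line.

```shell
# Hypothetical helper: apply the substitution from the comment above
# to the file given as the first argument (GNU sed assumed for -i).
allow_vendor_change() {
    sed -i 's/\(solver.dupAllowVendorChange = \)false/\1true/' "$1"
}

# Try it on a throwaway sample file, not on the real configuration.
tmp=$(mktemp)
printf '%s\n' '[main]' 'solver.dupAllowVendorChange = false' > "$tmp"
allow_vendor_change "$tmp"
grep 'dupAllowVendorChange' "$tmp"
rm -f "$tmp"
```

Running grep for the setting on each host after the loop would be one way to confirm that the change took effect everywhere.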

After that, zypper -n dup ran fine on rebel. I then ran:

systemctl enable --now openqa-worker-auto-restart@{1..6}
2023-05-05 10:48:28 systemctl enable --now openqa-reload-worker-auto-restart@{1..6}.path
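A possible way to check the result against AC1 is to query each instance with systemctl is-enabled and systemctl is-active. The following is only a sketch: the function name worker_status and the SYSTEMCTL override variable are hypothetical and not part of openQA; the unit name is the one used in the commands above.

```shell
# Sketch: report enablement and run state for the given worker instances.
# SYSTEMCTL is a hypothetical override so the loop can be tried on a
# machine without systemd; by default the real systemctl is used.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

worker_status() {
    for i in "$@"; do
        unit="openqa-worker-auto-restart@$i"
        # Both subcommands print the state and return non-zero for
        # disabled/inactive units, so the exit status is ignored here.
        enabled=$("$SYSTEMCTL" is-enabled "$unit" 2>/dev/null || true)
        active=$("$SYSTEMCTL" is-active "$unit" 2>/dev/null || true)
        echo "$unit enabled=${enabled:-unknown} active=${active:-unknown}"
    done
}
```

On rebel this could be called as worker_status 1 2 3 4 5 6; any instance not reported as both enabled and active would point to the problem described in the observation.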

Jobs started fine afterwards, e.g. https://openqa.opensuse.org/tests/3265981

Monitoring …

Actions #2

Updated by okurz over 1 year ago

  • Due date deleted (2023-05-19)
  • Status changed from Feedback to Resolved

https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=20230508&groupid=34 looks fine, and the host rebel has now been up for 4 days without an automatic reboot.
