action #103554

closed

o3 s390x worker instances 102+103 down whereas 101+104 are up

Added by okurz almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2021-12-06
Due date: 2021-12-20
% Done: 0%
Estimated time:
Description

Observation

As discussed in https://matrix.to/#/!ilXMcHXPOjTZeauZcg:libera.chat/$0ET_hLkIi4dP9a-MiTNhBOfvpht1_cQWLGasoZgobjI I found that openqaworker1_container:102 and openqaworker1_container:103 were offline whereas :101+:104 are up.

Acceptance criteria

  • AC1: All four s390x VMs linux144-linux147 can be used within o3
  • AC2: The corresponding openQA worker instances keep running persistently, including across nightly worker host reboots
Actions #1

Updated by okurz almost 3 years ago

  • Description updated (diff)

So nsinger wasn't sure what the expected state is. I guess openqaworker1_container:101..104 should be online, right?
nsinger or someone else might have disabled the services on purpose and no longer remember doing so. So a simple systemctl start openqaworker1_container102 should be the next step. That did not work because the service is autogenerated by podman and references a btrfs container file hash that is no longer valid.
I then ran for i in 102 103; do podman container rm openqaworker1_container_$i; done and followed https://progress.opensuse.org/projects/openqav3/wiki/#o3-s390-workers. Both additional instances now show up in https://openqa.opensuse.org/admin/workers and are ready to take jobs. Let's await automatic reboots.
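For reference, the removal step could be sketched roughly as below. This is only an illustration, assuming the stale containers are named openqaworker1_container_102 and openqaworker1_container_103. The helper names container_name and remove_stale_containers and the DRY_RUN switch are hypothetical and not part of the existing setup. By default the sketch only prints the podman commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: build the container name for a worker instance number.
# Assumes the naming scheme openqaworker1_container_<instance>.
container_name() {
    printf 'openqaworker1_container_%s\n' "$1"
}

# Hypothetical helper: remove the stale containers for the given instances.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed;
# set DRY_RUN=0 to actually call podman.
remove_stale_containers() {
    for i in "$@"; do
        name=$(container_name "$i")
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "podman container rm $name"
        else
            podman container rm "$name"
        fi
    done
}

remove_stale_containers 102 103
```

After the removal, the containers still need to be recreated as described in the wiki section linked above; that step is deliberately not reproduced here.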

Actions #2

Updated by okurz almost 3 years ago

  • Due date set to 2021-12-20
  • Status changed from New to Feedback
Actions #3

Updated by okurz almost 3 years ago

  • Status changed from Feedback to Resolved

openqaworker1 rebooted multiple times and was changed, but all four worker instances are alive and happy (I assume so; at least they did not complain).
