action #116782

closed

o3 s390 workers are offline

Added by kraih over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2022-09-19
Due date: 2022-10-04
% Done: 0%
Estimated time:

Description

Observation

The o3 workers openqaworker1_container:101/102/103/104 went offline 2 days ago (graceful disconnect). Since then, the journal has looked like this:

Sep 19 14:36:27 openqaworker1 systemd[1]: container-openqaworker1_container_101.service: Scheduled restart job, restart counter is at 40055.
Sep 19 14:36:27 openqaworker1 systemd[1]: Stopped Podman container-openqaworker1_container_101.service.
Sep 19 14:36:27 openqaworker1 systemd[1]: Starting Podman container-openqaworker1_container_101.service...
Sep 19 14:36:29 openqaworker1 podman[3032]: time="2022-09-19T14:36:29+02:00" level=warning msg="Path \"/etc/SUSEConnect\" from \"/etc/containers/mounts.conf\" doesn't exist, skipping"
Sep 19 14:36:29 openqaworker1 podman[3032]: time="2022-09-19T14:36:29+02:00" level=warning msg="Path \"/etc/zypp/credentials.d/SCCcredentials\" from \"/etc/containers/mounts.conf\" doesn't exist, skipping"
Sep 19 14:36:30 openqaworker1 podman[3032]: 2022-09-19 14:36:30.048687627 +0200 CEST m=+2.267947307 container init 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opencontainers.image>
Sep 19 14:36:30 openqaworker1 podman[3032]: 2022-09-19 14:36:30.298700925 +0200 CEST m=+2.517960607 container start 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opensuse.base.versi>
Sep 19 14:36:30 openqaworker1 podman[3032]: openqaworker1_container_101
Sep 19 14:36:30 openqaworker1 podman[3665]: 2022-09-19 14:36:30.324363704 +0200 CEST m=+0.053321696 container died 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101)
Sep 19 14:36:30 openqaworker1 systemd[1]: Started Podman container-openqaworker1_container_101.service.
Sep 19 14:36:31 openqaworker1 podman[3665]: 2022-09-19 14:36:31.278853821 +0200 CEST m=+1.007811860 container cleanup 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opensuse.base.cre>
Sep 19 14:36:31 openqaworker1 systemd[1]: container-openqaworker1_container_101.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 14:36:31 openqaworker1 podman[3997]: 2022-09-19 14:36:31.638557925 +0200 CEST m=+0.261362636 container cleanup 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opensuse.base.ver>
Sep 19 14:36:31 openqaworker1 podman[3997]: openqaworker1_container_101
Sep 19 14:36:31 openqaworker1 systemd[1]: container-openqaworker1_container_101.service: Failed with result 'exit-code'.
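
The excerpt above shows a restart loop: systemd schedules a restart, the container starts, dies almost immediately, and the unit fails with 'exit-code' (restart counter at 40055). The following is a minimal Python sketch for summarizing such a loop from journal text. The function names `summarize_restarts` and `crash_looping` and the threshold of 5 are hypothetical choices for illustration, not part of openQA, podman, or systemd; the regular expressions only assume the two systemd message formats visible in the excerpt.

```python
import re

# Matches systemd's "Scheduled restart job" message as seen in the excerpt
# and captures the unit name and the restart counter.
RESTART_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+\.service): Scheduled restart job, "
    r"restart counter is at (?P<count>\d+)\."
)
# Matches systemd's "Failed with result '...'" message for a unit.
FAILED_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+\.service): "
    r"Failed with result '(?P<result>[^']+)'\."
)


def _entry(summary, unit):
    """Return the per-unit record, creating it on first use."""
    return summary.setdefault(
        unit, {"restart_counter": 0, "failures": 0, "last_result": None}
    )


def summarize_restarts(journal_lines):
    """Collect the highest restart counter and failure count per unit."""
    summary = {}
    for line in journal_lines:
        match = RESTART_RE.search(line)
        if match:
            entry = _entry(summary, match["unit"])
            entry["restart_counter"] = max(
                entry["restart_counter"], int(match["count"])
            )
            continue
        match = FAILED_RE.search(line)
        if match:
            entry = _entry(summary, match["unit"])
            entry["failures"] += 1
            entry["last_result"] = match["result"]
    return summary


def crash_looping(summary, threshold=5):
    """List units whose restart counter reached the (assumed) threshold."""
    return sorted(
        unit
        for unit, entry in summary.items()
        if entry["restart_counter"] >= threshold
    )


# Three lines copied from the journal excerpt in this ticket.
sample = [
    "Sep 19 14:36:27 openqaworker1 systemd[1]: "
    "container-openqaworker1_container_101.service: "
    "Scheduled restart job, restart counter is at 40055.",
    "Sep 19 14:36:31 openqaworker1 systemd[1]: "
    "container-openqaworker1_container_101.service: "
    "Main process exited, code=exited, status=1/FAILURE",
    "Sep 19 14:36:31 openqaworker1 systemd[1]: "
    "container-openqaworker1_container_101.service: "
    "Failed with result 'exit-code'.",
]

if __name__ == "__main__":
    print(crash_looping(summarize_restarts(sample)))
```

Fed with the full journal of the host, a summary like this would make it easier to see whether all four container units (101 to 104) are stuck in the same loop or only some of them.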

Acceptance criteria

  • AC1: The workers are online again

Suggestions


Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #97751: replacement setup for o3 s390x openQA workers size:M (Resolved, dheidler, 2021-09-17)

Related to openQA Project - action #119713: Leap tests are failing because of failed log file uploading in multiple tests on s390x size:M (Resolved, okurz, 2022-11-01)
