action #116782
Updated by okurz over 2 years ago
## Observation

The O3 workers `openqaworker1_container:101/102/103/104` went offline 2 days ago (graceful disconnect). Since then the journal has looked like this:

```
Sep 19 14:36:27 openqaworker1 systemd[1]: container-openqaworker1_container_101.service: Scheduled restart job, restart counter is at 40055.
Sep 19 14:36:27 openqaworker1 systemd[1]: Stopped Podman container-openqaworker1_container_101.service.
Sep 19 14:36:27 openqaworker1 systemd[1]: Starting Podman container-openqaworker1_container_101.service...
Sep 19 14:36:29 openqaworker1 podman[3032]: time="2022-09-19T14:36:29+02:00" level=warning msg="Path \"/etc/SUSEConnect\" from \"/etc/containers/mounts.conf\" doesn't exist, skipping"
Sep 19 14:36:29 openqaworker1 podman[3032]: time="2022-09-19T14:36:29+02:00" level=warning msg="Path \"/etc/zypp/credentials.d/SCCcredentials\" from \"/etc/containers/mounts.conf\" doesn't exist, skipping"
Sep 19 14:36:30 openqaworker1 podman[3032]: 2022-09-19 14:36:30.048687627 +0200 CEST m=+2.267947307 container init 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opencontainers.image>
Sep 19 14:36:30 openqaworker1 podman[3032]: 2022-09-19 14:36:30.298700925 +0200 CEST m=+2.517960607 container start 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opensuse.base.versi>
Sep 19 14:36:30 openqaworker1 podman[3032]: openqaworker1_container_101
Sep 19 14:36:30 openqaworker1 podman[3665]: 2022-09-19 14:36:30.324363704 +0200 CEST m=+0.053321696 container died 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101)
Sep 19 14:36:30 openqaworker1 systemd[1]: Started Podman container-openqaworker1_container_101.service.
Sep 19 14:36:31 openqaworker1 podman[3665]: 2022-09-19 14:36:31.278853821 +0200 CEST m=+1.007811860 container cleanup 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opensuse.base.cre>
Sep 19 14:36:31 openqaworker1 systemd[1]: container-openqaworker1_container_101.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 14:36:31 openqaworker1 podman[3997]: 2022-09-19 14:36:31.638557925 +0200 CEST m=+0.261362636 container cleanup 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opensuse.base.ver>
Sep 19 14:36:31 openqaworker1 podman[3997]: openqaworker1_container_101
Sep 19 14:36:31 openqaworker1 systemd[1]: container-openqaworker1_container_101.service: Failed with result 'exit-code'.
```

## Acceptance criteria

* **AC1:** The workers are online again

## Suggestions

* ~~Speak with dheidler, who set these workers up~~ Read history from setup ticket #97751 and read instructions from https://progress.opensuse.org/projects/openqav3/wiki/Wiki#o3-s390-workers
* Run a 15.4 or Tumbleweed version
* Update our upgrade instructions for workers accordingly to include that step for next time
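A longer journal dump can confirm the crash loop if you extract two things. The first is the systemd restart counter. The second is the gap between podman's `container start` and `container died` events. In the excerpt in the Observation, that gap is roughly 26 ms, which means the worker exits almost immediately after starting.

Below is a minimal Python sketch. It assumes the journal was exported as plain text in exactly the format shown, for example via `journalctl -u container-openqaworker1_container_101.service`. The helpers `parse_events` and `lifetime_seconds` are hypothetical and are not part of openQA or podman. Other podman versions may log events in a different format.

```python
import re
from datetime import datetime

# Assumed format, copied from the excerpt in this ticket, e.g.
# "2022-09-19 14:36:30.298700925 +0200 CEST m=+2.517960607 container start <64-hex id> (...)"
EVENT_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})\.(?P<frac>\d+) [+-]\d{4} \S+ "
    r"m=\+[\d.]+ container (?P<event>\w+) (?P<cid>[0-9a-f]{64})"
)
# systemd line: "... Scheduled restart job, restart counter is at 40055."
RESTART_RE = re.compile(r"restart counter is at (\d+)")


def parse_events(lines):
    """Hypothetical helper: return (last restart counter, [(timestamp, event, container_id), ...])."""
    counter = None
    events = []
    for line in lines:
        m = RESTART_RE.search(line)
        if m:
            counter = int(m.group(1))
        m = EVENT_RE.search(line)
        if m:
            # strptime's %f accepts at most 6 digits, so drop podman's nanosecond tail.
            # The UTC offset is ignored; all lines in the excerpt share the same zone.
            frac = m["frac"][:6].ljust(6, "0")
            ts = datetime.strptime(f"{m['date']} {m['time']}.{frac}", "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m["event"], m["cid"]))
    return counter, events


def lifetime_seconds(events):
    """Hypothetical helper: seconds from the first 'start' to the next 'died', or None."""
    start = next((ts for ts, ev, _ in events if ev == "start"), None)
    if start is None:
        return None
    died = next((ts for ts, ev, _ in events if ev == "died" and ts >= start), None)
    return None if died is None else (died - start).total_seconds()
```

A container that lives only a few milliseconds per cycle while the counter keeps climbing points at the worker process inside the image failing on startup. In the excerpt it exits with `status=1/FAILURE`. That symptom is consistent with, though not proof of, the suggestion to move to a newer base image.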