action #116782

Updated by okurz over 1 year ago

## Observation 

 The O3 workers `openqaworker1_container:101/102/103/104` went offline 2 days ago (graceful disconnect). Since then, the journal has looked like this: 

 ``` 
 Sep 19 14:36:27 openqaworker1 systemd[1]: container-openqaworker1_container_101.service: Scheduled restart job, restart counter is at 40055. 
 Sep 19 14:36:27 openqaworker1 systemd[1]: Stopped Podman container-openqaworker1_container_101.service. 
 Sep 19 14:36:27 openqaworker1 systemd[1]: Starting Podman container-openqaworker1_container_101.service... 
 Sep 19 14:36:29 openqaworker1 podman[3032]: time="2022-09-19T14:36:29+02:00" level=warning msg="Path \"/etc/SUSEConnect\" from \"/etc/containers/mounts.conf\" doesn't exist, skipping" 
 Sep 19 14:36:29 openqaworker1 podman[3032]: time="2022-09-19T14:36:29+02:00" level=warning msg="Path \"/etc/zypp/credentials.d/SCCcredentials\" from \"/etc/containers/mounts.conf\" doesn't exist, skipping" 
 Sep 19 14:36:30 openqaworker1 podman[3032]: 2022-09-19 14:36:30.048687627 +0200 CEST m=+2.267947307 container init 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opencontainers.image> 
 Sep 19 14:36:30 openqaworker1 podman[3032]: 2022-09-19 14:36:30.298700925 +0200 CEST m=+2.517960607 container start 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opensuse.base.versi> 
 Sep 19 14:36:30 openqaworker1 podman[3032]: openqaworker1_container_101 
 Sep 19 14:36:30 openqaworker1 podman[3665]: 2022-09-19 14:36:30.324363704 +0200 CEST m=+0.053321696 container died 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101) 
 Sep 19 14:36:30 openqaworker1 systemd[1]: Started Podman container-openqaworker1_container_101.service. 
 Sep 19 14:36:31 openqaworker1 podman[3665]: 2022-09-19 14:36:31.278853821 +0200 CEST m=+1.007811860 container cleanup 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opensuse.base.cre> 
 Sep 19 14:36:31 openqaworker1 systemd[1]: container-openqaworker1_container_101.service: Main process exited, code=exited, status=1/FAILURE 
 Sep 19 14:36:31 openqaworker1 podman[3997]: 2022-09-19 14:36:31.638557925 +0200 CEST m=+0.261362636 container cleanup 955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812 (image=registry.opensuse.org/devel/openqa/containers15.2/openqa_worker:latest, name=openqaworker1_container_101, org.opensuse.base.ver> 
 Sep 19 14:36:31 openqaworker1 podman[3997]: openqaworker1_container_101 
 Sep 19 14:36:31 openqaworker1 systemd[1]: container-openqaworker1_container_101.service: Failed with result 'exit-code'. 
 ``` 
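
 The excerpt above shows a crash loop: the container starts, dies within the same second, and systemd schedules another restart. To check quickly whether the other slots (102/103/104) behave the same way, the relevant systemd messages could be summarized with a small script. This is only a sketch: the function name `summarize_journal` and the two regular expressions are assumptions based on the message format in the excerpt, not part of any existing tooling.

 ```python
import re

# Patterns for the two systemd messages seen in the journal excerpt.
# They are derived from the excerpt only and may need adjusting.
RESTART_RE = re.compile(
    r"(?P<unit>\S+\.service): Scheduled restart job, "
    r"restart counter is at (?P<count>\d+)\."
)
EXIT_RE = re.compile(
    r"(?P<unit>\S+\.service): Main process exited, "
    r"code=(?P<code>\w+), status=(?P<status>\S+)"
)


def summarize_journal(lines):
    """Return {unit: {"restarts": int or None, "last_exit": str or None}}."""
    summary = {}
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            entry = summary.setdefault(
                match["unit"], {"restarts": None, "last_exit": None}
            )
            entry["restarts"] = int(match["count"])
            continue
        match = EXIT_RE.search(line)
        if match:
            entry = summary.setdefault(
                match["unit"], {"restarts": None, "last_exit": None}
            )
            entry["last_exit"] = match["status"]
    return summary


# Two lines copied from the excerpt above, used as sample input.
sample = [
    "Sep 19 14:36:27 openqaworker1 systemd[1]: "
    "container-openqaworker1_container_101.service: "
    "Scheduled restart job, restart counter is at 40055.",
    "Sep 19 14:36:31 openqaworker1 systemd[1]: "
    "container-openqaworker1_container_101.service: "
    "Main process exited, code=exited, status=1/FAILURE",
]

summary = summarize_journal(sample)
for unit, info in summary.items():
    print(unit, info["restarts"], info["last_exit"])
 ```

 On the worker host, the input could come from saved `journalctl` output read line by line; a restart counter that keeps growing together with a non-zero exit status points to the container failing at startup rather than to a connectivity problem.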

 ## Acceptance criteria 
 * **AC1:** The workers are online again 

 ## Suggestions 
 * ~~Speak with dheidler, who set these workers up~~ Read history from setup ticket #97751 and read instructions from https://progress.opensuse.org/projects/openqav3/wiki/Wiki#o3-s390-workers 
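 * Run a 15.4 or Tumbleweed version of the worker container image (the journal shows the containers currently use the `containers15.2` image) 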
 * Update our worker upgrade instructions to include that step so it is not missed next time
