Project

General

Profile

action #115106

Updated by mkittler over 1 year ago

## Observation 
 On o3 there were three jobs cancelled with reason "cancelled based on job settings" and no result "obsoleted" (the result is now actually shown in the web UI). The according worker slots still had these jobs associated after 12 days so these worker slots were basically blocked/stuck. I suppose it is correct that these jobs ended up cancelled but the associated worker slots should not be blocked/stuck. 

 Jobs: https://openqa.opensuse.org/tests/2483081, https://openqa.opensuse.org/tests/2496053, https://openqa.opensuse.org/tests/2492150 

 **NOTE**: I've created #116078 for the concrete problems on power8. I'll keep that ticket around for determining whether we can still do anything better in openQA (see suggestions below). However, it seems unlikely considering the worker slots were actually stuck. So *maybe* we also want to decide closing this generic ticket and only focus on recovering power8 and debugging its strange behavior. 

 The worker log for one of the affected slots (on power8): 

 ``` 
 Jul 26 16:32:05 power8 systemd[1]: Started openQA Worker #5. 
 Jul 26 16:32:06 power8 worker[158130]: [info] worker 5: 
 Jul 26 16:32:06 power8 worker[158130]:    - config file:             /etc/openqa/workers.ini 
 Jul 26 16:32:06 power8 worker[158130]:    - worker hostname:         power8 
 Jul 26 16:32:06 power8 worker[158130]:    - isotovideo version:      26 
 Jul 26 16:32:06 power8 worker[158130]:    - websocket API version: 1 
 Jul 26 16:32:06 power8 worker[158130]:    - web UI hosts:            http://openqa1-opensuse 
 Jul 26 16:32:06 power8 worker[158130]:    - class:                   qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload 
 Jul 26 16:32:06 power8 worker[158130]:    - no cleanup:              no 
 Jul 26 16:32:06 power8 worker[158130]:    - pool directory:          /var/lib/openqa/pool/5 
 Jul 26 16:32:06 power8 worker[158130]: [info] CACHE: caching is enabled, setting up /var/lib/openqa/cache/openqa1-opensuse 
 Jul 26 16:32:06 power8 worker[158130]: [info] Project dir for host http://openqa1-opensuse is /var/lib/openqa/share 
 Jul 26 16:32:06 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Jul 26 16:32:06 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Jul 26 16:32:06 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Jul 26 16:32:25 power8 worker[158130]: [info] +++ setup notes +++ 
 Jul 26 16:32:25 power8 worker[158130]: [info] Running on power8:5 (Linux 5.14.21-150400.24.11-default #1 SMP Sun Jul 17 20:46:33 UTC 2022 (be260ca) ppc64le) 
 Jul 26 16:32:25 power8 worker[158130]: [info] Downloading openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20220725-Media.iso.sha256, request #207848 sent to Cache Service 
 Jul 26 16:32:35 power8 worker[158130]: [info] Download of openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20220725-Media.iso.sha256 processed: 
 Jul 26 16:32:35 power8 worker[158130]: [info] [#207848] Cache size of "/var/lib/openqa/cache" is 599 GiB, with limit 600 GiB 
 Jul 26 16:32:35 power8 worker[158130]: [info] [#207848] Downloading "openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20220725-Media.iso.sha256" from "http://openqa1-opensuse/tests/2483081/asset/other/openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20220725-Media.iso.sha256" 
 Jul 26 16:32:35 power8 worker[158130]: [info] [#207848] Content of "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20220725-Media.iso.sha256" has not changed, updating last use 
 Jul 26 16:32:35 power8 worker[158130]: [info] Downloading opensuse-Tumbleweed-ppc64le-20220725-DVD@ppc64le-with-ltp.qcow2, request #207849 sent to Cache Service 
 Jul 26 16:32:46 power8 worker[158130]: [info] Download of opensuse-Tumbleweed-ppc64le-20220725-DVD@ppc64le-with-ltp.qcow2 processed: 
 Jul 26 16:32:46 power8 worker[158130]: [info] [#207849] Cache size of "/var/lib/openqa/cache" is 599 GiB, with limit 600 GiB 
 Jul 26 16:32:46 power8 worker[158130]: [info] [#207849] Downloading "opensuse-Tumbleweed-ppc64le-20220725-DVD@ppc64le-with-ltp.qcow2" from "http://openqa1-opensuse/tests/2483081/asset/hdd/opensuse-Tumbleweed-ppc64le-20220725-DVD@ppc64le-with-ltp.qcow2" 
 Jul 26 16:32:46 power8 worker[158130]: [info] [#207849] Content of "/var/lib/openqa/cache/openqa1-opensuse/opensuse-Tumbleweed-ppc64le-20220725-DVD@ppc64le-with-ltp.qcow2" has not changed, updating last use 
 Jul 26 16:32:46 power8 worker[158130]: [info] Downloading openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20220725-Media.iso, request #207850 sent to Cache Service 
 Jul 26 16:32:51 power8 worker[158130]: [info] Download of openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20220725-Media.iso processed: 
 Jul 26 16:32:51 power8 worker[158130]: [info] [#207850] Cache size of "/var/lib/openqa/cache" is 599 GiB, with limit 600 GiB 
 Jul 26 16:32:51 power8 worker[158130]: [info] [#207850] Downloading "openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20220725-Media.iso" from "http://openqa1-opensuse/tests/2483081/asset/iso/openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20220725-Media.iso" 
 Jul 26 16:32:51 power8 worker[158130]: [info] [#207850] Content of "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20220725-Media.iso" has not changed, updating last use 
 Jul 26 16:32:51 power8 worker[158130]: [info] Rsync from 'rsync://openqa1-opensuse/tests' to '/var/lib/openqa/cache/openqa1-opensuse', request #207851 sent to Cache Service 
 Jul 26 16:32:56 power8 worker[158130]: [info] Output of rsync: 
 Jul 26 16:32:56 power8 worker[158130]: [info] [#207851] Calling: rsync -avHP --timeout 1800 rsync://openqa1-opensuse/tests/ --delete /var/lib/openqa/cache/openqa1-opensuse/tests/ 
 Jul 26 16:32:56 power8 worker[158130]: Try 1: 
 Jul 26 16:32:56 power8 worker[158130]: receiving incremental file list 
 Jul 26 16:32:56 power8 worker[158130]: sent 1,985 bytes    received 1,494,517 bytes    997,668.00 bytes/sec 
 Jul 26 16:32:56 power8 worker[158130]: total size is 7,044,120,358    speedup is 4,707.06 
 Jul 26 16:32:56 power8 worker[158130]: [info] Finished to rsync tests 
 Jul 26 16:32:56 power8 worker[158130]: [info] Preparing cgroup to start isotovideo 
 Jul 26 16:32:56 power8 worker[158130]: [info] Using cgroup /sys/fs/cgroup/systemd/openqa.slice/openqa-worker.slice/openqa-worker-auto-restart@5.service/2483081 
 Jul 26 16:32:56 power8 worker[158130]: [info] Starting isotovideo container 
 Jul 26 16:32:56 power8 worker[158130]: [info] isotovideo has been started (PID: 158393) 
 Jul 26 16:32:56 power8 worker[158393]: [info] 158393: WORKING 2483081 
 Jul 26 16:33:57 power8 worker[158130]: [info] Test schedule has changed, reloading test_order.json 
 Jul 26 16:34:07 power8 worker[158130]: [info] Test schedule has changed, reloading test_order.json 
 Jul 26 22:33:03 power8 worker[158130]: [info] +++ worker notes +++ 
 Jul 26 22:33:03 power8 worker[158130]: [info] End time: 2022-07-26 22:33:03 
 Jul 26 22:33:03 power8 worker[158130]: [info] Result: timeout 
 Jul 26 22:33:03 power8 worker[24903]: [info] Uploading boot_ltp-config-5.18.11-1-default 
 Jul 26 22:33:03 power8 worker[24903]: [info] Uploading boot_ltp-framebuffer.dat.gz 
 Jul 26 22:33:03 power8 worker[24903]: [info] Uploading boot_ltp-kernel-pkg.txt 
 Jul 26 22:33:03 power8 worker[24903]: [info] Uploading boot_ltp-ver_linux_before.txt 
 Jul 26 22:33:03 power8 worker[24903]: [info] Uploading ltp_known_issues.yaml 
 Jul 27 14:18:23 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Jul 27 14:18:33 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Jul 27 14:18:33 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Jul 27 14:18:34 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Jul 28 03:51:00 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Jul 28 03:51:10 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Jul 28 03:51:10 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Jul 28 03:51:11 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Jul 29 12:51:23 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Jul 29 12:51:33 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Jul 29 12:51:33 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Jul 29 12:51:34 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Jul 29 19:40:40 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Jul 29 19:40:50 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Jul 29 19:40:50 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Jul 29 19:40:51 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Jul 30 10:38:29 power8 systemd[1]: Reloading openQA Worker #5... 
 Jul 30 10:38:29 power8 worker[158130]: [info] Received signal HUP 
 Jul 30 10:38:29 power8 systemd[1]: Reloaded openQA Worker #5. 
 Jul 30 10:53:25 power8 systemd[1]: Reloading openQA Worker #5... 
 Jul 30 10:53:25 power8 worker[158130]: [info] Received signal HUP 
 Jul 30 10:53:25 power8 systemd[1]: Reloaded openQA Worker #5. 
 Jul 30 11:05:29 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Jul 30 11:05:40 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Jul 30 11:05:40 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Jul 30 11:05:40 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Jul 30 11:58:58 power8 systemd[1]: Reloading openQA Worker #5... 
 Jul 30 11:58:58 power8 worker[158130]: [info] Received signal HUP 
 Jul 30 11:58:58 power8 systemd[1]: Reloaded openQA Worker #5. 
 Jul 30 12:11:59 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Jul 30 12:12:09 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Jul 30 12:12:10 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Jul 30 12:12:11 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 01 08:05:41 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 01 08:05:51 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 01 08:05:52 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 01 08:05:52 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 01 08:21:20 power8 systemd[1]: Reloading openQA Worker #5... 
 Aug 01 08:21:20 power8 worker[158130]: [info] Received signal HUP 
 Aug 01 08:21:20 power8 systemd[1]: Reloaded openQA Worker #5. 
 Aug 01 13:16:41 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 01 13:16:51 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 01 13:16:51 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 01 13:16:53 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 01 13:37:18 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 01 13:37:28 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 01 13:37:29 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 01 13:37:30 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 01 15:04:22 power8 systemd[1]: Reloading openQA Worker #5... 
 Aug 01 15:04:22 power8 worker[158130]: [info] Received signal HUP 
 Aug 01 15:04:22 power8 systemd[1]: Reloaded openQA Worker #5. 
 Aug 02 01:15:45 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 02 01:15:55 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 02 01:15:56 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 02 01:15:56 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 02 12:45:53 power8 systemd[1]: Reloading openQA Worker #5... 
 Aug 02 12:45:53 power8 worker[158130]: [info] Received signal HUP 
 Aug 02 12:45:53 power8 systemd[1]: Reloaded openQA Worker #5. 
 Aug 03 11:11:36 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 03 11:11:46 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 03 11:11:47 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 03 11:11:48 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 03 13:25:30 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 03 13:25:40 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 03 13:25:40 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 03 13:25:41 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 03 20:04:34 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 03 20:04:44 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 03 20:04:44 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 03 20:04:45 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 04 04:10:24 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 04 04:10:34 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 04 04:10:35 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 04 04:10:35 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 04 04:25:39 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 04 04:25:49 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 04 04:25:49 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 04 04:25:50 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 04 14:18:00 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 04 14:18:10 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 04 14:18:11 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 04 14:18:11 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 04 15:39:54 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 04 15:40:04 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 04 15:40:05 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 04 15:40:05 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 06 12:05:40 power8 worker[158130]: [warn] Websocket connection to http://openqa1-opensuse/api/v1/ws/17 finished by remote side with code 1006, no reason - trying again in 10 seconds 
 Aug 06 12:05:50 power8 worker[158130]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 06 12:05:51 power8 worker[158130]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 06 12:05:51 power8 worker[158130]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 Aug 08 14:42:25 power8 systemd[1]: Stopping openQA Worker #5... 
 Aug 08 14:42:25 power8 worker[158130]: [info] Received signal TERM 
 Aug 08 14:43:55 power8 systemd[1]: openqa-worker-auto-restart@5.service: State 'stop-sigterm' timed out. Killing. 
 Aug 08 14:43:55 power8 systemd[1]: openqa-worker-auto-restart@5.service: Killing process 158130 (worker) with signal SIGKILL. 
 Aug 08 14:43:55 power8 systemd[1]: openqa-worker-auto-restart@5.service: Killing process 24903 (worker) with signal SIGKILL. 
 Aug 08 14:43:55 power8 systemd[1]: openqa-worker-auto-restart@5.service: Main process exited, code=killed, status=9/KILL 
 Aug 08 14:43:55 power8 systemd[1]: openqa-worker-auto-restart@5.service: Failed with result 'timeout'. 
 Aug 08 14:43:55 power8 systemd[1]: Stopped openQA Worker #5. 
 Aug 08 14:43:55 power8 systemd[1]: Starting openQA Worker #5... 
 Aug 08 14:43:55 power8 systemd[1]: Started openQA Worker #5. 
 Aug 08 14:43:56 power8 worker[63906]: [info] worker 5: 
 Aug 08 14:43:56 power8 worker[63906]:    - config file:             /etc/openqa/workers.ini 
 Aug 08 14:43:56 power8 worker[63906]:    - worker hostname:         power8 
 Aug 08 14:43:56 power8 worker[63906]:    - isotovideo version:      26 
 Aug 08 14:43:56 power8 worker[63906]:    - websocket API version: 1 
 Aug 08 14:43:56 power8 worker[63906]:    - web UI hosts:            http://openqa1-opensuse 
 Aug 08 14:43:56 power8 worker[63906]:    - class:                   qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload 
 Aug 08 14:43:56 power8 worker[63906]:    - no cleanup:              no 
 Aug 08 14:43:56 power8 worker[63906]:    - pool directory:          /var/lib/openqa/pool/5 
 Aug 08 14:43:56 power8 worker[63906]: [info] CACHE: caching is enabled, setting up /var/lib/openqa/cache/openqa1-opensuse 
 Aug 08 14:43:56 power8 worker[63906]: [info] Project dir for host http://openqa1-opensuse is /var/lib/openqa/share 
 Aug 08 14:43:56 power8 worker[63906]: [info] Registering with openQA http://openqa1-opensuse 
 Aug 08 14:44:00 power8 worker[63906]: [info] Establishing ws connection via ws://openqa1-opensuse/api/v1/ws/17 
 Aug 08 14:44:00 power8 worker[63906]: [info] Registered and connected via websockets with openQA host http://openqa1-opensuse and worker ID 17 
 ``` 

 The last stopping/killing was triggered by me restarting the service. So apparently the worker was stuck and did not react to `SIGTERM` anymore. 

 Note that after restarting the services the jobs are now shown properly as "obsoleted" on the web UI and the worker slots are used again. 

 ### Suggestions 
 * Find out why the workers got stuck; even if the actual problem is on the web UI side the worker service should react to `SIGTERM`. 
 * Find out whether something went wrong on the web UI side. 
     * That jobs were cancelled is probably correct behavior. 
     * That the stale job detection did nothing about the jobs makes sense considering they were cancelled. 
     * That the workers kept being associated with the jobs does not seem to be a problem considering the workers were actually stuck and restarting them actually freed up the relation. 
     * So maybe nothing wrong on the web UI side? 

Back