action #25124
closed [tools] [sprint 201709.1] Workers disconnect from websocket server and get stuck: job shows as 'State: assigned' forever
Description
So with a git checkout from 20170907, we're still seeing workers get stuck in the Fedora openQA instances. The symptoms are:
- In the admin interface, the worker shows as 'Working on job XXXXXX', but with no step
- When opening the job page, it just shows 'State: assigned'; there is no live view, there are no logs, and nothing indicates the worker has done anything about the job at all
The worker logs look like this, from the start of the final successful job run:
Sep 08 13:43:13 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Job 156421 scheduled for next cycle
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [INFO] got job 156421: 00156421-fedora-27-Server-dvd-iso-i386-BuildFedora-27-20170906.n.0-install_default@64bit
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [INFO] 5318: WORKING 156421
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending IMMEDIATELY worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:28 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:43 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:58 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:13 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:28 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:43 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:48 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:44:48 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:45:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:45:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:45:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:45:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:47:00 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:47:00 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:47:20 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:47:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:47:50 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:05 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:20 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:50 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:05 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:20 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:43 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:49:43 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:50:03 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:50:18 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:50:33 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:50:48 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:03 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:18 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:36 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:55 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:51:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:52:18 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:52:30 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:52:45 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:00 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:15 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:30 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:45 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:00 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:15 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:30 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:35 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:54:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:54:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:24 qa07.qa.fedoraproject.org worker[905]: [INFO] cleaning up 00156421-fedora-27-Server-dvd-iso-i386-BuildFedora-27-20170906.n.0-install_default@64bit
Sep 08 13:56:24 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Build finished, setting us free to pick up new jobs
Sep 08 13:56:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:48 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:56:48 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 137.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 140.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Job 156582 scheduled for next cycle
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: [INFO] got job 156582: 00156582-fedora-27-universal-i386-BuildFedora-27-20170906.n.0-install_software_raid@64bit
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $host in pattern match (m//) at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 468.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 131.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 148.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: [WARN] job is missing files, releasing job
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Mojo::Reactor::Poll: I/O watcher failed: No worker id or webui host set! at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 184.
Sep 08 13:57:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:57:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:57:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:57:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:59:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:59:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:59:35 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:59:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:59:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 14:01:52 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 14:02:12 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:02:27 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:02:42 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:02:57 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:12 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:27 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:42 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:57 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:04:12 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:04:27 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:04:33 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 14:04:33 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 14:04:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:50 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
After that it just does the same thing over and over ('Sending worker status', with the occasional 'uninitialized value' error) until I restart the process.
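Each 'uninitialized value $reason' warning is logged in the same second as a 'Connection turned off ... - 1006 :' line with nothing after the colon. Under RFC 6455, 1006 (Abnormal Closure) is a reserved code that an endpoint only reports locally when the connection drops without a Close frame. No reason string ever arrives with it. The later logs in this issue print 'Not specified' in that slot instead. Below is a minimal Python sketch of that kind of defaulting. It assumes the message format seen in these logs. `describe_close` and `close_code_name` are hypothetical helpers for illustration, not the Perl code in OpenQA/Worker/Common.pm.

```python
from typing import Optional

# IANA / RFC 6455 names for the close codes that appear in these logs.
# 1005 and 1006 are reserved: an endpoint must not send them in a Close
# frame; they are only reported locally, so they never carry a reason.
CLOSE_CODE_NAMES = {
    1000: "Normal Closure",
    1005: "No Status Rcvd",
    1006: "Abnormal Closure",
    1008: "Policy Violation",
}


def close_code_name(code: int) -> str:
    """Return the registered name for a websocket close code, or 'Unknown'."""
    return CLOSE_CODE_NAMES.get(code, "Unknown")


def describe_close(host: str, code: int, reason: Optional[str] = None) -> str:
    """Format a close message like the worker's log line, substituting a
    default when the reason is missing or empty (hypothetical helper)."""
    if not reason:
        reason = "Not specified"
    return f"Connection turned off from {host} - {code} : {reason}"
```

With this, a 1006 close from the web UI host would be logged as `Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 : Not specified`, with no undefined-value warning.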
I see that there's one further tweak to the scheduler code this morning:
https://github.com/os-autoinst/openQA/pull/1450
I'll deploy that and see if it helps.
Updated by AdamWill over 7 years ago
And I already got a stuck worker with the latest code. Here are the worker logs, starting several jobs back, at the point where suspicious errors start appearing:
Sep 08 22:25:41 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Job 156843 scheduled for next cycle
Sep 08 22:25:41 qa07.qa.fedoraproject.org worker[28095]: [INFO] got job 156843: 00156843-fedora-27-universal-x86_64-BuildFedora-27-20170907.n.0-ANA272013-NOREPORT-install_delete_partial@uefi
Sep 08 22:25:43 qa07.qa.fedoraproject.org worker[28095]: [INFO] 7062: WORKING 156843
Sep 08 22:25:43 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending IMMEDIATELY worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:25:45 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:30:53 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Refusing, we are already performing another job
Sep 08 22:30:53 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:30:53 qa07.qa.fedoraproject.org worker[28095]: [ERROR] 400 response: Bad Request (remaining tries: 2)
Sep 08 22:30:58 qa07.qa.fedoraproject.org worker[28095]: [ERROR] 400 response: Bad Request (remaining tries: 1)
Sep 08 22:31:03 qa07.qa.fedoraproject.org worker[28095]: [ERROR] 400 response: Bad Request (remaining tries: 0)
Sep 08 22:31:03 qa07.qa.fedoraproject.org worker[28095]: [ERROR] Job aborted because web UI doesn't accept updates anymore (likely considers this job dead)
Sep 08 22:31:04 qa07.qa.fedoraproject.org worker[28095]: [ERROR] 400 response: Bad Request (remaining tries: 2)
Sep 08 22:31:08 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:31:09 qa07.qa.fedoraproject.org worker[28095]: [ERROR] 400 response: Bad Request (remaining tries: 1)
Sep 08 22:31:12 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Refusing, we are already performing another job
Sep 08 22:31:13 qa07.qa.fedoraproject.org worker[28095]: [INFO] registering worker with openQA http://openqa-stg01.qa.fedoraproject.org...
Sep 08 22:31:13 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1005 : Not specified
Sep 08 22:31:14 qa07.qa.fedoraproject.org worker[28095]: [ERROR] 400 response: Bad Request (remaining tries: 0)
Sep 08 22:31:14 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Either there is no job running or we were asked to stop: (1|Reason: api-failure)
Sep 08 22:31:14 qa07.qa.fedoraproject.org worker[28095]: killed 7062
Sep 08 22:31:55 qa07.qa.fedoraproject.org worker[28095]: [INFO] cleaning up 00156843-fedora-27-universal-x86_64-BuildFedora-27-20170907.n.0-ANA272013-NOREPORT-install_delete_partial@uefi
Sep 08 22:31:55 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Build finished, setting us free to pick up new jobs
Sep 08 22:31:55 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending IMMEDIATELY worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:31:55 qa07.qa.fedoraproject.org worker[28095]: [INFO] registering worker with openQA http://openqa-stg01.qa.fedoraproject.org...
Sep 08 22:31:55 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1008 : Connection terminated from WebSocket server - thought dead
Sep 08 22:32:09 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Job 156823 scheduled for next cycle
Sep 08 22:32:09 qa07.qa.fedoraproject.org worker[28095]: [INFO] got job 156823: 00156823-fedora-27-universal-x86_64-BuildFedora-27-20170907.n.0-ANA272013-NOREPORT-install_lvmthin@64bit
Sep 08 22:32:09 qa07.qa.fedoraproject.org worker[28095]: [INFO] 7519: WORKING 156823
Sep 08 22:32:09 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending IMMEDIATELY worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:32:15 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:32:30 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:32:45 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:33:00 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:33:15 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:33:30 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:33:45 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:34:00 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:34:15 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:34:33 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:34:48 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:35:03 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:35:18 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:35:33 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:35:48 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:36:03 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:36:18 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:36:33 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:36:48 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:36:55 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 : Not specified
Sep 08 22:37:15 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:37:30 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:37:45 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:38:00 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:38:15 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:38:34 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:38:53 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:39:08 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:39:23 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:39:38 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:39:41 qa07.qa.fedoraproject.org worker[28095]: [INFO] cleaning up 00156823-fedora-27-universal-x86_64-BuildFedora-27-20170907.n.0-ANA272013-NOREPORT-install_lvmthin@64bit
Sep 08 22:39:41 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Build finished, setting us free to pick up new jobs
Sep 08 22:39:41 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending IMMEDIATELY worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:39:41 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1008 : Connection terminated from WebSocket server - thought dead
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 137.
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 140.
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Job 156829 scheduled for next cycle
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: [INFO] got job 156829: 00156829-fedora-27-universal-x86_64-BuildFedora-27-20170907.n.0-ANA272013-NOREPORT-install_blivet_xfs@64bit
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Use of uninitialized value $host in pattern match (m//) at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 468.
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 131.
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 148.
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: [WARN] job is missing files, releasing job
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Mojo::Reactor::Poll: I/O watcher failed: No worker id or webui host set! at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 184.
Sep 08 22:40:01 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:40:16 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:40:31 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:40:46 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:41:01 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:41:16 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:41:31 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:41:46 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:42:01 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:42:16 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:42:31 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:42:46 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:43:01 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:43:16 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:43:31 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:43:46 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:44:01 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:44:16 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:44:31 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:44:44 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 : Not specified
Sep 08 22:45:04 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:45:19 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:45:34 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:45:49 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:46:04 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:46:19 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:46:34 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:46:49 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:47:04 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:47:19 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:47:34 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:47:50 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:48:05 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:48:20 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:48:35 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:48:50 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:49:05 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:49:20 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:49:35 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:49:37 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 : Not specified
Sep 08 22:49:57 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:50:12 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:50:27 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:50:42 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:50:57 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:51:12 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:51:27 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:51:42 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:51:57 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Updated by AdamWill over 7 years ago
Still seeing this with the 20170908 code and Mojolicious 7.45. Here's a current case:
Updated by EDiGiacinto over 7 years ago
- Subject changed from Workers still getting stuck with 20170907 scheduler: job shows as 'State: assigned' forever, worker logs constantly sending status to server but never does anything to Workers disconnects from websocket server and getting stuck: job shows as 'State: assigned' forever
- Category changed from 122 to Regressions/Crashes
AdamWill wrote:
Sep 08 22:39:38 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:39:41 qa07.qa.fedoraproject.org worker[28095]: [INFO] cleaning up 00156823-fedora-27-universal-x86_64-BuildFedora-27-20170907.n.0-ANA272013-NOREPORT-install_lvmthin@64bit
Sep 08 22:39:41 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Build finished, setting us free to pick up new jobs
Sep 08 22:39:41 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Sending IMMEDIATELY worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 22:39:41 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1008 : Connection terminated from WebSocket server - thought dead
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 137.
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 140.
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: [DEBUG] Job 156829 scheduled for next cycle
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: [INFO] got job 156829: 00156829-fedora-27-universal-x86_64-BuildFedora-27-20170907.n.0-ANA272013-NOREPORT-install_blivet_xfs@64bit
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Use of uninitialized value $host in pattern match (m//) at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 468.
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 131.
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 148.
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: [WARN] job is missing files, releasing job
Sep 08 22:39:44 qa07.qa.fedoraproject.org worker[28095]: Mojo::Reactor::Poll: I/O watcher failed: No worker id or webui host set! at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 184.
As I see it, the problem lies here, not in the websocket server: the worker fails to update the web interface about the job because a premature connection close clears the job's current_host.
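To make this failure mode concrete, here is a minimal Python model of it. It assumes, hypothetically, that the worker keeps the current web UI host in one shared module-level variable which the websocket close handler resets. The names `state`, `register`, `on_connection_closed`, `start_job` and `start_job_with_captured_host` are illustrative only; they do not mirror the Perl code in OpenQA/Worker/Common.pm or Jobs.pm.

```python
from typing import Dict, Optional

# Hypothetical shared worker state, modelled on a global current_host.
# This is NOT openQA's real data structure, just an illustration.
state: Dict[str, Optional[str]] = {"current_host": None, "worker_id": None}


class MissingHostError(RuntimeError):
    """Raised when job handling runs after the shared host was cleared."""


def register(host: str, worker_id: str) -> None:
    state["current_host"] = host
    state["worker_id"] = worker_id


def on_connection_closed(code: int) -> None:
    # Problematic pattern: any close (e.g. 1006 or 1008) wipes the shared
    # host, even though a job may still be in flight on this worker.
    state["current_host"] = None


def start_job(job_id: int) -> str:
    host = state["current_host"]
    if host is None or state["worker_id"] is None:
        # Analogous to "No worker id or webui host set!" in the logs above.
        raise MissingHostError("No worker id or webui host set!")
    return f"job {job_id} reported to {host}"


def start_job_with_captured_host(job_id: int, host: str) -> str:
    # Safer variant: the host is captured when the job is accepted and
    # passed explicitly, so a later close cannot erase it.
    return f"job {job_id} reported to {host}"


if __name__ == "__main__":
    register("openqa-stg01.qa.fedoraproject.org", "42")
    host_at_accept = state["current_host"]  # captured when the job arrives
    on_connection_closed(1008)              # premature close, as in the log
    try:
        start_job(156829)
    except MissingHostError as err:
        print("shared-state variant failed:", err)
    print(start_job_with_captured_host(156829, host_at_accept))
```

The point of the second function is the design choice, not the exact API: per-job data that travels with the job cannot be invalidated by an unrelated connection event, whereas a shared global can.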
Updated by EDiGiacinto over 7 years ago
It seems so: we see the same errors on one of the workers that is currently stuck (the log below is newest first):
Sep 10 12:37:37 openqaworker2 worker[21911]: [DEBUG] Sending worker status to openqa.suse.de
Sep 10 12:37:19 openqaworker2 worker[21911]: Mojo::Reactor::Poll: I/O watcher failed: No worker id or webui host set! at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 184.
Sep 10 12:37:19 openqaworker2 worker[21911]: [WARN] job is missing files, releasing job
Sep 10 12:37:19 openqaworker2 worker[21911]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 160.
Sep 10 12:37:19 openqaworker2 worker[21911]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 148.
Sep 10 12:37:19 openqaworker2 worker[21911]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 131.
Sep 10 12:37:19 openqaworker2 worker[21911]: Use of uninitialized value $host in pattern match (m//) at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 468.
Sep 10 12:37:19 openqaworker2 worker[21911]: [INFO] got job 1157542: 01157542-sle-15-Leanos-DVD-Staging:G-x86_64-BuildG.21.2-default_install@64bit-staging
Sep 10 12:37:19 openqaworker2 worker[21911]: [DEBUG] Job 1157542 scheduled for next cycle
Sep 10 12:37:19 openqaworker2 worker[21911]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 140.
Sep 10 12:37:19 openqaworker2 worker[21911]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 137.
Sep 10 12:37:17 openqaworker2 worker[21911]: [DEBUG] Connection turned off from openqa.suse.de - 1006 :
Sep 10 12:37:17 openqaworker2 worker[21911]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 10 12:37:11 openqaworker2 worker[21911]: [DEBUG] Sending worker status to openqa.suse.de
Sep 10 12:36:56 openqaworker2 worker[21911]: [DEBUG] Sending worker status to openqa.suse.de
Sep 10 12:36:41 openqaworker2 worker[21911]: [DEBUG] Sending worker status to openqa.suse.de
Edit:
If I'm correct, you should find confirmation of the workers' misbehavior in the websocket server logs: the worker sends a wrong status message, declaring that the job is in "queue" for it instead of declaring itself free, while the job state is still scheduled (you should see it as running). This happens because $job is still set after an unclean teardown. The beauty of shared global variables :)
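The stale-state mechanism described in the edit can be illustrated with a small Python sketch. It assumes, hypothetically, that the status a worker sends is derived purely from a shared "current job" value, so an interrupted teardown that leaves the value set makes the worker keep announcing a job instead of reporting itself free. `Worker`, `finish_job` and the `"working"`/`"free"` strings are illustrative and are not openQA's actual message format.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Worker:
    # Hypothetical stand-in for the shared $job variable discussed above.
    current_job: Optional[int] = None

    def accept(self, job_id: int) -> None:
        self.current_job = job_id

    def teardown(self, clean: bool = True) -> None:
        # A clean teardown forgets the job; an interrupted one (e.g. the
        # websocket closed mid-cleanup) leaves the stale value behind.
        if clean:
            self.current_job = None

    def status_message(self) -> dict:
        # Status is derived only from current_job, so stale state yields
        # a wrong report to the server.
        if self.current_job is None:
            return {"status": "free"}
        return {"status": "working", "job": self.current_job}


def finish_job(worker: Worker, cleanup: Callable[[], None]) -> None:
    # Reset the shared state even if cleanup raises, so the next status
    # message cannot reference the old job.
    try:
        cleanup()
    finally:
        worker.current_job = None
```

For example, after `accept(156823)` followed by `teardown(clean=False)`, `status_message()` still returns `{"status": "working", "job": 156823}`. Resetting state in a `finally` block, as `finish_job` does, is one common way to guarantee teardown on error paths; it is shown here as a general technique, not as a description of the fix that was eventually merged.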
Updated by EDiGiacinto over 7 years ago
https://github.com/os-autoinst/openQA/pull/1451 merged; please let us know if the problem persists.
Updated by AdamWill over 7 years ago
I'm trying with that, plus the commit from your git repo that sets the websocket inactivity timeout to 0.
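For context, Mojolicious documents that an inactivity timeout of 0 lets a connection stay idle indefinitely rather than being closed automatically. The Python sketch below models only that rule with a hypothetical `InactivityWatchdog` class; it is not Mojolicious code and makes no claim about how the commit mentioned above is implemented.

```python
import time
from typing import Callable


class InactivityWatchdog:
    """Hypothetical idle-connection watchdog; a timeout of 0 disables it."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self.last_activity = clock()

    def touch(self) -> None:
        # Call whenever a frame is sent or received on the connection.
        self.last_activity = self.clock()

    def should_close(self) -> bool:
        if self.timeout == 0:
            return False  # 0 means: never close for inactivity
        return self.clock() - self.last_activity >= self.timeout
```

Injecting the clock keeps the rule testable without real waiting. The trade-off of a 0 timeout is that a silently dead peer is no longer detected by this mechanism and has to be caught some other way.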
Updated by AdamWill over 7 years ago
- Status changed from New to Resolved
Well, since this fix, I'm not getting stuck workers any more. I'm still getting jobs that end up incomplete without being duplicated, but I'm filing separate issues for those problems. So, let's close this one.
Updated by szarate over 7 years ago
- Subject changed from Workers disconnects from websocket server and getting stuck: job shows as 'State: assigned' forever to [tools][sprint 201709.1] Workers disconnects from websocket server and getting stuck: job shows as 'State: assigned' forever
Updated by EDiGiacinto almost 7 years ago
- Related to coordination #32851: [tools][EPIC] Scheduling redesign added
Updated by EDiGiacinto over 6 years ago
- Related to action #35296: Error messages on worker about "Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 359, <GEN298662> line 4." added