action #25124
closed[tools][sprint 201709.1] Workers disconnects from websocket server and getting stuck: job shows as 'State: assigned' forever
0%
Description
So with a git checkout from 20170907, we're still seeing workers get stuck in the Fedora openQA instances. The symptoms are:
-
In the admin interface, the worker shows as 'Working on job XXXXXX', but with no step
-
When opening the job page, it just shows as 'State: assigned', but there is no live view, there are no logs, nothing to indicate the worker has done anything about the job at all
-
The worker logs look like this, from the start of the final successful job run:
Sep 08 13:43:13 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Job 156421 scheduled for next cycle
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [INFO] got job 156421: 00156421-fedora-27-Server-dvd-iso-i386-BuildFedora-27-20170906.n.0-install_default@64bit
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [INFO] 5318: WORKING 156421
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending IMMEDIATELY worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:28 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:43 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:58 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:13 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:28 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:43 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:48 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:44:48 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:45:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:45:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:45:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:45:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:47:00 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:47:00 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:47:20 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:47:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:47:50 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:05 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:20 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:50 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:05 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:20 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:43 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:49:43 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:50:03 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:50:18 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:50:33 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:50:48 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:03 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:18 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:36 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:55 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:51:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:52:18 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:52:30 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:52:45 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:00 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:15 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:30 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:45 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:00 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:15 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:30 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:35 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:54:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:54:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:24 qa07.qa.fedoraproject.org worker[905]: [INFO] cleaning up 00156421-fedora-27-Server-dvd-iso-i386-BuildFedora-27-20170906.n.0-install_default@64bit
Sep 08 13:56:24 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Build finished, setting us free to pick up new jobs
Sep 08 13:56:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:48 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:56:48 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 137.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 140.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Job 156582 scheduled for next cycle
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: [INFO] got job 156582: 00156582-fedora-27-universal-i386-BuildFedora-27-20170906.n.0-install_software_raid@64bit
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $host in pattern match (m//) at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 468.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 131.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 148.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: [WARN] job is missing files, releasing job
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Mojo::Reactor::Poll: I/O watcher failed: No worker id or webui host set! at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 184.
Sep 08 13:57:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:57:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:57:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:57:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:59:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:59:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:59:35 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:59:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:59:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 14:01:52 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 14:02:12 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:02:27 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:02:42 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:02:57 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:12 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:27 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:42 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:57 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:04:12 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:04:27 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:04:33 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 14:04:33 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 14:04:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:50 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
after that it just does that same thing over and over - 'Sending worker status', with the occasional 'uninitialized value' error - until I restart the process.
I see that there's one further tweak to the scheduler code this morning:
https://github.com/os-autoinst/openQA/pull/1450
I'll deploy that, and see if it helps.