action #25124
closed[tools][sprint 201709.1] Workers disconnects from websocket server and getting stuck: job shows as 'State: assigned' forever
0%
Description
So with a git checkout from 20170907, we're still seeing workers get stuck in the Fedora openQA instances. The symptoms are:
- In the admin interface, the worker shows as 'Working on job XXXXXX', but with no step
- When opening the job page, it just shows as 'State: assigned', but there is no live view, there are no logs, nothing to indicate the worker has done anything about the job at all
The worker logs look like this, from the start of the final successful job run:
Sep 08 13:43:13 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Job 156421 scheduled for next cycle
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [INFO] got job 156421: 00156421-fedora-27-Server-dvd-iso-i386-BuildFedora-27-20170906.n.0-install_default@64bit
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [INFO] 5318: WORKING 156421
Sep 08 13:43:19 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending IMMEDIATELY worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:28 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:43 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:43:58 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:13 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:28 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:43 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:44:48 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:44:48 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:45:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:45:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:45:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:45:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:46:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:47:00 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:47:00 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:47:20 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:47:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:47:50 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:05 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:20 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:48:50 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:05 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:20 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:49:43 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:49:43 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:50:03 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:50:18 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:50:33 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:50:48 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:03 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:18 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:36 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:51:55 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:51:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:52:18 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:52:30 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:52:45 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:00 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:15 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:30 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:53:45 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:00 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:15 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:30 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:54:35 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:54:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:54:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:55:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:24 qa07.qa.fedoraproject.org worker[905]: [INFO] cleaning up 00156421-fedora-27-Server-dvd-iso-i386-BuildFedora-27-20170906.n.0-install_default@64bit
Sep 08 13:56:24 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Build finished, setting us free to pick up new jobs
Sep 08 13:56:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:56:48 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:56:48 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 137.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Commands.pm line 140.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Job 156582 scheduled for next cycle
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: [INFO] got job 156582: 00156582-fedora-27-universal-i386-BuildFedora-27-20170906.n.0-install_software_raid@64bit
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $host in pattern match (m//) at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 468.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 131.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $OpenQA::Worker::Engines::isotovideo::current_host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Engines/isotovideo.pm line 148.
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: [WARN] job is missing files, releasing job
Sep 08 13:56:52 qa07.qa.fedoraproject.org worker[905]: Mojo::Reactor::Poll: I/O watcher failed: No worker id or webui host set! at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 184.
Sep 08 13:57:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:57:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:57:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:57:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:58:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:59:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:59:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 13:59:35 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 13:59:35 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 13:59:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:00:55 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:10 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:25 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:40 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:01:52 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 14:01:52 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 14:02:12 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:02:27 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:02:42 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:02:57 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:12 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:27 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:42 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:03:57 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:04:12 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:04:27 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:04:33 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
Sep 08 14:04:33 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Connection turned off from http://openqa-stg01.qa.fedoraproject.org - 1006 :
Sep 08 14:04:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:05:53 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:08 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:23 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:38 qa07.qa.fedoraproject.org worker[905]: [DEBUG] Sending worker status to http://openqa-stg01.qa.fedoraproject.org
Sep 08 14:06:50 qa07.qa.fedoraproject.org worker[905]: Use of uninitialized value $reason in concatenation (.) or string at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 362.
after that it just does that same thing over and over - 'Sending worker status', with the occasional 'uninitialized value' error - until I restart the process.
I see that there's one further tweak to the scheduler code this morning:
https://github.com/os-autoinst/openQA/pull/1450
I'll deploy that, and see if it helps.