action #37312
Workers were deleted and recreated (closed)
Description
I have noticed that the "Assigned worker" information is randomly missing from ppc jobs, although Logs & Assets are still accessible.
Expected results (assigned worker shown):
- sle-15-Installer-DVD-ppc64le-Build665.2-extra_tests_on_gnome@ppc64le
- sle-15-Installer-DVD-ppc64le-Build665.1-extra_tests_on_gnome@ppc64le
- sle-15-Installer-DVD-ppc64le-Build662.1-extra_tests_on_gnome@ppc64le
- sle-15-Installer-DVD-ppc64le-Build628.5-extra_tests_on_gnome@ppc64le
Missing data (no assigned worker shown):
- sle-15-Installer-DVD-ppc64le-Build661.1-extra_tests_on_gnome@ppc64le
- sle-15-Installer-DVD-ppc64le-Build658.1-extra_tests_on_gnome@ppc64le
- sle-15-Installer-DVD-ppc64le-Build657.1-extra_tests_on_gnome@ppc64le
- sle-15-Installer-DVD-ppc64le-Build634.2-extra_tests_on_gnome@ppc64le
- sle-15-Installer-DVD-ppc64le-Build625.5-extra_tests_on_gnome@ppc64le
Updated by coolo over 6 years ago
Something or someone deleted the worker from the database. The QA-Power8-4-kvm worker we have right now was created on 2018-06-04, i.e. after your jobs ran.
Deleting a worker's DB entry also removes the reference to it in jobs.
But I doubt that someone really deleted this manually: e.g. worker ID 1073 was created just yesterday, even though the systemd service for it has logs from May 20.
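To illustrate the mechanism described above, here is a minimal Python/sqlite3 sketch. It assumes a simplified, hypothetical schema in which `jobs.assigned_worker_id` is a foreign key declared with `ON DELETE SET NULL`; the table layout, IDs and host name are made up for illustration and are not openQA's actual DDL. Deleting the worker row keeps the job row (so Logs & Assets remain reachable) but clears its worker reference.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; not openQA's real DDL.
# jobs.assigned_worker_id is assumed to use ON DELETE SET NULL.
SCHEMA = """
CREATE TABLE workers (
    id       INTEGER PRIMARY KEY,
    host     TEXT NOT NULL,
    instance INTEGER NOT NULL
);
CREATE TABLE jobs (
    id                 INTEGER PRIMARY KEY,
    name               TEXT NOT NULL,
    assigned_worker_id INTEGER REFERENCES workers(id) ON DELETE SET NULL
);
"""


def assigned_worker_after_delete() -> tuple:
    """Delete a worker row; return (job row still exists?, its assigned_worker_id)."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    con.executescript(SCHEMA)
    con.execute("INSERT INTO workers VALUES (1, 'worker-a', 1)")  # made-up values
    con.execute("INSERT INTO jobs VALUES (100, 'extra_tests_on_gnome', 1)")

    # Removing the worker row keeps the job but drops its worker reference.
    con.execute("DELETE FROM workers WHERE id = 1")
    row = con.execute(
        "SELECT id, assigned_worker_id FROM jobs WHERE id = 100"
    ).fetchone()
    con.close()
    return (row is not None, row[1] if row else None)


if __name__ == "__main__":
    print(assigned_worker_after_delete())  # (True, None)
```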
Updated by coolo over 6 years ago
- Subject changed from [tools] Missing assigned worker in result bar to [tools] Workers were deleted and recreated
- Priority changed from Normal to Low
The workers haven't been removed since then. Still no idea what triggered the deletion, but it's on the monitoring list.
Updated by coolo about 6 years ago
- Subject changed from [tools] Workers were deleted and recreated to Workers were deleted and recreated
- Category set to 168
- Priority changed from Low to Normal
- Target version set to Ready
It's still happening; notice the vastly different t_created values:
openqa=> select * from workers where host='openqaworker5' order by instance;
id | host | instance | t_created | t_updated | job_id | upload_progress
------+---------------+----------+---------------------+---------------------+---------+-----------------
957 | openqaworker5 | 1 | 2018-03-20 13:11:06 | 2018-10-18 08:48:26 | 2187481 |
961 | openqaworker5 | 2 | 2018-03-20 13:15:53 | 2018-10-18 08:48:26 | 2187308 |
985 | openqaworker5 | 3 | 2018-03-20 13:26:13 | 2018-10-18 08:48:18 | 2187530 |
1202 | openqaworker5 | 4 | 2018-10-11 05:10:25 | 2018-10-18 08:48:24 | 2187185 |
984 | openqaworker5 | 5 | 2018-03-20 13:25:26 | 2018-10-18 08:48:25 | 2186488 |
660 | openqaworker5 | 6 | 2017-07-24 17:23:19 | 2018-10-18 08:48:23 | 2186498 |
355 | openqaworker5 | 7 | 2016-11-13 18:24:14 | 2018-10-18 08:48:20 | 2187209 |
646 | openqaworker5 | 8 | 2017-07-24 17:08:07 | 2018-10-18 08:48:22 | 2187229 |
357 | openqaworker5 | 9 | 2016-11-13 18:24:16 | 2018-10-18 08:48:24 | 2186671 |
960 | openqaworker5 | 10 | 2018-03-20 13:15:46 | 2018-10-18 08:48:23 | 2187430 |
918 | openqaworker5 | 11 | 2017-12-21 09:53:21 | 2018-10-18 08:48:21 | 2187184 |
847 | openqaworker5 | 12 | 2017-12-21 09:42:35 | 2018-10-18 08:48:26 | 2187554 |
958 | openqaworker5 | 13 | 2018-03-20 13:11:08 | 2018-10-18 08:48:19 | 2187520 |
895 | openqaworker5 | 14 | 2017-12-21 09:48:06 | 2018-10-18 08:48:21 | 2187453 |
1193 | openqaworker5 | 15 | 2018-10-11 05:10:08 | 2018-10-18 08:48:26 | 2187429 |
953 | openqaworker5 | 16 | 2018-03-20 13:07:18 | 2018-10-18 08:48:26 | 2187416 |
967 | openqaworker5 | 17 | 2018-03-20 13:16:06 | 2018-10-18 08:48:15 | 2187657 |
955 | openqaworker5 | 18 | 2018-03-20 13:08:10 | 2018-10-18 08:48:19 | 2187572 |
1203 | openqaworker5 | 19 | 2018-10-11 05:10:27 | 2018-10-18 08:48:19 | 2186499 |
669 | openqaworker5 | 20 | 2017-07-26 12:47:24 | 2018-10-18 08:48:21 | 2187315 |
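The observation above can be turned into a repeatable check. The sketch below copies the id, instance and t_created values from the query output and flags instances whose worker row was created on or after a cutoff. The cutoff of 2018-06-01 is an arbitrary assumption chosen to separate the October 2018 rows from the older ones; in practice it would be set to the last time the host's workers were intentionally (re)registered.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (id, instance, t_created) for openqaworker5, copied from the query output above
ROWS = [
    (957, 1, "2018-03-20 13:11:06"),
    (961, 2, "2018-03-20 13:15:53"),
    (985, 3, "2018-03-20 13:26:13"),
    (1202, 4, "2018-10-11 05:10:25"),
    (984, 5, "2018-03-20 13:25:26"),
    (660, 6, "2017-07-24 17:23:19"),
    (355, 7, "2016-11-13 18:24:14"),
    (646, 8, "2017-07-24 17:08:07"),
    (357, 9, "2016-11-13 18:24:16"),
    (960, 10, "2018-03-20 13:15:46"),
    (918, 11, "2017-12-21 09:53:21"),
    (847, 12, "2017-12-21 09:42:35"),
    (958, 13, "2018-03-20 13:11:08"),
    (895, 14, "2017-12-21 09:48:06"),
    (1193, 15, "2018-10-11 05:10:08"),
    (953, 16, "2018-03-20 13:07:18"),
    (967, 17, "2018-03-20 13:16:06"),
    (955, 18, "2018-03-20 13:08:10"),
    (1203, 19, "2018-10-11 05:10:27"),
    (669, 20, "2017-07-26 12:47:24"),
]


def recreated_since(rows, cutoff: str) -> list:
    """Return the sorted instance numbers whose t_created is >= cutoff."""
    limit = datetime.strptime(cutoff, FMT)
    return sorted(
        instance
        for _worker_id, instance, created in rows
        if datetime.strptime(created, FMT) >= limit
    )


if __name__ == "__main__":
    print(recreated_since(ROWS, "2018-06-01 00:00:00"))  # [4, 15, 19]
```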
Updated by mkittler almost 6 years ago
I've just been looking at the code and it seems we don't provide an API route for deleting workers. It is hard to tell for sure, but it also doesn't look like the openQA code ever deletes workers (at least not via a DBIx delete).
Maybe some SQL cascade delete does that?
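One way a cascade could produce this is sketched below in Python/sqlite3. It assumes, for illustration only, that `workers.job_id` (the column visible in the query output above) references `jobs` with `ON DELETE CASCADE`, so deleting a job also deletes any worker row pointing at it; with `ON DELETE SET NULL` the worker row would survive. The schema is a hypothetical simplification, and the thread does not say which relation was actually at fault.

```python
import sqlite3

ALLOWED_ACTIONS = {"CASCADE", "SET NULL"}


def worker_survives_job_deletion(on_delete: str) -> bool:
    """Return True if the worker row still exists after its job is deleted.

    on_delete is the FK action assumed on workers.job_id. The schema is a
    hypothetical simplification, not openQA's real one.
    """
    if on_delete not in ALLOWED_ACTIONS:  # guard the value spliced into the DDL
        raise ValueError(f"unsupported ON DELETE action: {on_delete!r}")
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # required for FK actions in SQLite
    con.executescript(f"""
        CREATE TABLE jobs (id INTEGER PRIMARY KEY);
        CREATE TABLE workers (
            id     INTEGER PRIMARY KEY,
            host   TEXT NOT NULL,
            job_id INTEGER REFERENCES jobs(id) ON DELETE {on_delete}
        );
    """)
    con.execute("INSERT INTO jobs (id) VALUES (42)")
    con.execute("INSERT INTO workers (id, host, job_id) VALUES (1, 'worker-a', 42)")

    # e.g. an old job being cleaned up while a worker row still points at it
    con.execute("DELETE FROM jobs WHERE id = 42")
    count = con.execute("SELECT COUNT(*) FROM workers WHERE id = 1").fetchone()[0]
    con.close()
    return count == 1


if __name__ == "__main__":
    print(worker_survives_job_deletion("CASCADE"))   # False: the worker row is gone
    print(worker_survives_job_deletion("SET NULL"))  # True: only job_id is cleared
```

A worker row recreated this way would get a fresh id and t_created the next time the worker registers, which would be consistent with the pattern in the table above.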
Updated by mkittler almost 6 years ago
- Status changed from New to In Progress
- Assignee set to mkittler
- Target version changed from Ready to Current Sprint
Could be. I'll try to reproduce that with the test fixtures.
Updated by mkittler almost 6 years ago
That was the problem, indeed: https://github.com/os-autoinst/openQA/pull/2021
Updated by coolo almost 6 years ago
Be careful: let's say it's one problem :)
But I'm confident.
Updated by mkittler almost 6 years ago
Right now I can only think of one more relation. But I've just done a quick test with fixtures and there's no deletion happening. If you like, we can add this test nevertheless: https://github.com/Martchus/openQA/pull/new/test_previous_job_deletion
Updated by mkittler almost 6 years ago
- Status changed from In Progress to Resolved