action #18684
closed
Jobs with worker class qemu_x86_64 are taken by machines without this class, causing incomplete jobs
Description
observation
Jobs with worker class qemu_x86_64 (in both settings and vars.json) are taken by machines without this class, causing incomplete jobs.
For the jobs taken by overdrive2, the worker class differs between settings and vars.json (qemu_x86_64 in settings, qemu_aarch64_maintenance in vars.json).
problem
H1 When a large number of jobs are created by cloning a job (e.g. 100), 5% of these jobs are taken by a worker without a matching worker class.
H2 workers.ini is not configured properly. REJECTED BY E2-1 and E2-2
H3 The worker classes in settings and vars.json differ for the jobs taken by overdrive2 because the workers are configured for multiple webuis.
E1-1 Run the workers on overdrive2 and QA-Power8-5-kvm in verbose mode and clone a job 100 times.
R1-1 Not done yet
E2-1 Check that the worker classes are properly configured in workers.ini on overdrive2.
R2-1 overdrive2 uses: qemu_aarch64_maintenance
E2-2 Check that the worker classes are properly configured in workers.ini on QA-Power8-5-kvm.
R2-2 QA-Power8-5-kvm uses: qemu_ppc64le,qemu_ppc64le_no_tmpfs
E3-1 Check that the workers.ini configuration for overdrive2 is set up for multiple webuis.
R3-1 Two webuis configured: http://openqa.suse.de http://lord.arch.suse.de
E3-2 Check that the workers.ini configuration for QA-Power8-5-kvm is set up for a single webui.
R3-2 One webui configured: http://openqa.suse.de
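The checks in E2-1, E2-2 and E3-1 can be approximated with a small script. This is only a sketch: the `[global]` section with `HOST` and `WORKER_CLASS` keys is assumed to be the workers.ini layout, the ini text below is reduced to the values reported in R2-1 and R3-1, and the helper names (`worker_classes`, `configured_hosts`, `can_take`) are made up for illustration. Matching is simplified to a single job class.

```python
import configparser

# Assumed workers.ini content, reduced to the values reported in R2-1 and R3-1.
WORKERS_INI = """
[global]
HOST = http://openqa.suse.de http://lord.arch.suse.de
WORKER_CLASS = qemu_aarch64_maintenance
"""


def _parse(ini_text):
    """Parse ini text while keeping option names case-sensitive."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # do not lower-case HOST / WORKER_CLASS
    parser.read_string(ini_text)
    return parser


def worker_classes(ini_text):
    """Return the set of comma-separated worker classes in [global]."""
    raw = _parse(ini_text).get("global", "WORKER_CLASS", fallback="")
    return {name.strip() for name in raw.split(",") if name.strip()}


def configured_hosts(ini_text):
    """Return the whitespace-separated webui hosts in [global]."""
    return _parse(ini_text).get("global", "HOST", fallback="").split()


def can_take(job_class, classes):
    """Simplified match: the job's single class must be offered by the worker."""
    return job_class in classes


if __name__ == "__main__":
    classes = worker_classes(WORKERS_INI)
    print("classes:", sorted(classes))
    print("hosts:", configured_hosts(WORKERS_INI))
    print("may take qemu_x86_64:", can_take("qemu_x86_64", classes))
```

With the values from R2-1, a qemu_x86_64 job does not match, which is why H2 was rejected: the configuration itself does not explain the observed assignment.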
Updated by coolo about 7 years ago
What exactly did you do? WORKER_CLASS is different in vars.json than in settings: https://openqa.suse.de/tests/889178/file/vars.json
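A quick way to spot this kind of discrepancy is to compare the WORKER_CLASS from the job settings with the one recorded in vars.json. The sketch below assumes the settings are available as a plain dictionary and the vars.json content as text; `worker_class_mismatch` is a hypothetical helper, and the sample values are the two classes named in the description.

```python
import json


def worker_class_mismatch(settings, vars_json_text):
    """Return (settings_class, vars_class) if they differ, else None."""
    vars_data = json.loads(vars_json_text)
    expected = settings.get("WORKER_CLASS")
    actual = vars_data.get("WORKER_CLASS")
    if expected == actual:
        return None
    return (expected, actual)


if __name__ == "__main__":
    # Sample values taken from the ticket description, not fetched from openQA.
    settings = {"WORKER_CLASS": "qemu_x86_64"}
    vars_json_text = '{"WORKER_CLASS": "qemu_aarch64_maintenance"}'
    print(worker_class_mismatch(settings, vars_json_text))
```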
Updated by coolo about 7 years ago
Somehow the multiple webuis are involved:
Apr 20 14:14:31 overdrive2 worker[19403]: [ERROR] 502 response: Proxy Error (remaining tries: 2)
Apr 20 14:32:03 overdrive2 worker[19403]: [INFO] registering worker with openQA http://lord.arch.suse.de...
Apr 20 14:32:03 overdrive2 worker[19403]: [INFO] got job 889178: 00889178-sle-12-SP3-Server-DVD-x86_64-Build0340-textmode_statistics_poo_18634@64bit
Apr 20 14:32:03 overdrive2 worker[19403]: [INFO] 2015: WORKING 889178
Apr 20 14:32:11 overdrive2 worker[19403]: child 2015 died with exit status 256
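To look for the same pattern across a longer journal, the log can be scanned for each "got job" line together with the webui the worker registered with most recently before it. This is a rough sketch: the regular expressions are derived only from the lines quoted above, `jobs_after_registration` is a hypothetical helper, and the pairing is a correlation in the log order, not proof of which webui handed out the job.

```python
import re

# Patterns derived from the log lines quoted in this comment.
REGISTER = re.compile(r"\[INFO\] registering worker with openQA (?P<host>\S+?)\.\.\.$")
GOT_JOB = re.compile(r"\[INFO\] got job (?P<id>\d+): (?P<name>\S+)")

LOG = [
    "Apr 20 14:14:31 overdrive2 worker[19403]: [ERROR] 502 response: Proxy Error (remaining tries: 2)",
    "Apr 20 14:32:03 overdrive2 worker[19403]: [INFO] registering worker with openQA http://lord.arch.suse.de...",
    "Apr 20 14:32:03 overdrive2 worker[19403]: [INFO] got job 889178: 00889178-sle-12-SP3-Server-DVD-x86_64-Build0340-textmode_statistics_poo_18634@64bit",
    "Apr 20 14:32:03 overdrive2 worker[19403]: [INFO] 2015: WORKING 889178",
    "Apr 20 14:32:11 overdrive2 worker[19403]: child 2015 died with exit status 256",
]


def jobs_after_registration(lines):
    """Pair each received job with the last webui registered before it."""
    last_host = None
    pairs = []
    for line in lines:
        line = line.rstrip()
        registered = REGISTER.search(line)
        if registered:
            last_host = registered.group("host")
            continue
        job = GOT_JOB.search(line)
        if job:
            pairs.append((last_host, int(job.group("id")), job.group("name")))
    return pairs


if __name__ == "__main__":
    for host, job_id, name in jobs_after_registration(LOG):
        print(f"job {job_id} ({name}) received after registering with {host}")
```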
Updated by SLindoMansilla about 7 years ago
- Description updated (diff)
- Status changed from New to In Progress
Updated by SLindoMansilla about 7 years ago
- Related to action #18634: [sles][functional]textmode install_and_reboot fails to stop the reboot countdown added
Updated by SLindoMansilla about 7 years ago
- Assignee deleted (SLindoMansilla)
Unassigned to keep only 2 assigned tickets
Updated by okurz about 7 years ago
- Has duplicate action #19376: [tools][sle][functional] test fails to load. ppc64le worker tries to load x86_64 image added
Updated by szarate almost 7 years ago
- Related to action #20002: [tools] openqa sometimes doesn't update job_dependencies table added
Updated by szarate almost 7 years ago
- Assignee set to szarate
This PR should take care of this. The problem is related to how the jobs are added to the database; moving everything into a transaction, as coolo suggested, solves the problem (apparently).
Time to hunt for poo#20002, as I believe that the condition there is a bit different.
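For readers unfamiliar with the idea, the following sketch shows why a transaction can matter here. It is not openQA's schema or code (openQA is written in Perl): the `jobs` and `job_settings` tables, the `add_job` helper and the use of Python's `sqlite3` module are all assumptions made for illustration. The point is only that a job row and its settings become visible together or not at all, so nothing can pick up a job whose WORKER_CLASS setting has not been written yet.

```python
import sqlite3


def create_schema(conn):
    """Create two illustrative tables (hypothetical, not openQA's schema)."""
    with conn:
        conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, test TEXT NOT NULL)")
        conn.execute(
            "CREATE TABLE job_settings ("
            " job_id INTEGER NOT NULL REFERENCES jobs(id),"
            " name TEXT NOT NULL,"
            " value TEXT NOT NULL)"
        )


def add_job(conn, test, settings):
    """Insert a job and all of its settings in one transaction."""
    # "with conn" commits if the block succeeds and rolls back on any error,
    # so a job is never stored without its settings.
    with conn:
        cursor = conn.execute("INSERT INTO jobs (test) VALUES (?)", (test,))
        job_id = cursor.lastrowid
        conn.executemany(
            "INSERT INTO job_settings (job_id, name, value) VALUES (?, ?, ?)",
            [(job_id, name, value) for name, value in settings.items()],
        )
    return job_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_job(conn, "textmode", {"WORKER_CLASS": "qemu_x86_64"})
    try:
        # A NULL value violates the NOT NULL constraint; the job row is rolled back too.
        add_job(conn, "broken", {"WORKER_CLASS": None})
    except sqlite3.IntegrityError as error:
        print("rolled back:", error)
    print("jobs stored:", conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0])
```

Without the transaction, the failed second call would leave a "broken" job row behind with no worker class attached, which is the kind of half-created state the fix is meant to rule out.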
Updated by szarate almost 7 years ago
- Status changed from In Progress to Resolved
I believe this is solved now.