action #20002

closed

[tools] openqa sometimes doesn't update job_dependencies table

Added by thehejik over 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Feature requests
Target version: -
Start date: 2017-06-22
Due date:
% Done: 0%
Estimated time:
Description

For multi-machine jobs (caasp and slenkins), openQA from time to time doesn't schedule all child jobs when their parent CaaSP-controller or slenkins--control job is triggered.
It seems that not all child jobs are running because **there are missing entries for those jobs in the job_dependencies SQL table**.

Example of a broken job: https://openqa.suse.de/tests/1016423 (CaaSP-controller). In this case the admin node is missing, so the whole test failed.

`# select count(child_job_id) from job_dependencies where parent_job_id=1016423;
count 
-------
22
(1 row)`

If you examine a successful CaaSP-controller job (e.g. id=1015418), you should get count=25 (1x controller, 1x admin, 1x master, 22x workers).
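The same comparison can be scripted to spot parents with too few dependency rows. This is only a rough sketch, assuming a table with the two columns used in the query above; the in-memory database, the helper names and the child job ids are made up for illustration, and only the two parent ids and their counts come from this report.

```python
import sqlite3


def child_counts(conn, parent_ids):
    """Map each parent job id to its number of job_dependencies rows."""
    counts = {}
    for parent_id in parent_ids:
        (count,) = conn.execute(
            "SELECT count(child_job_id) FROM job_dependencies"
            " WHERE parent_job_id = ?",
            (parent_id,),
        ).fetchone()
        counts[parent_id] = count
    return counts


def incomplete_parents(conn, parent_ids, expected):
    """Return only the parents that have fewer rows than expected."""
    counts = child_counts(conn, parent_ids)
    return {pid: n for pid, n in counts.items() if n < expected}


# Hypothetical data: a stand-in table with invented child ids that mirrors
# the counts reported above (25 for the good job, 22 for the broken one).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_dependencies (child_job_id INTEGER, parent_job_id INTEGER)"
)
rows = [(2000000 + i, 1015418) for i in range(25)]
rows += [(3000000 + i, 1016423) for i in range(22)]
conn.executemany("INSERT INTO job_dependencies VALUES (?, ?)", rows)

broken = incomplete_parents(conn, [1015418, 1016423], expected=25)
print(broken)
```

Against a real instance the connection would point at the openQA database instead, and the expected count depends on the test suite (25 is specific to this CaaSP scenario).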

I'm not able to reproduce the issue on demand, but the problem sometimes occurs both on my local openQA instance using SQLite and on o.s.d using PostgreSQL. The broken job dependencies can be fixed by posting the ISO again.

Maybe it has something to do with the scheduler, which might just skip some DB insert queries.

I'm sorry for being so brief, but I really don't know more.


Related issues 2 (0 open, 2 closed)

Related to openQA Project (public) - action #18684: Jobs with worker class qemu_x86_64 are taken by machines without this class, causing incomplete jobs (Resolved, szarate, 2017-04-20)

Related to openQA Tests (public) - action #20790: [qam] SLE12-SP3 test fails in 1__unknown_ - slenkins-tests-openvpn-control (Rejected, pcervinka, 2017-07-26)

Actions #1

Updated by coolo over 7 years ago

We also have the problem that jobs are sometimes scheduled on the wrong worker class. This and your issue together make me believe that the jobs are grabbed/scheduled before the final picture is there, i.e. job settings and job dependencies aren't inserted in a transaction but piece by piece? Can you check?
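To illustrate what "piece by piece" versus "in a transaction" would mean for a concurrent reader, here is a minimal sketch using Python's sqlite3 with two connections to the same file. It is not openQA code: the simplified `jobs` table, the helper names and the job ids are all assumptions made for the demonstration. The reader connection plays the role of a scheduler looking at the database while the writer is halfway through creating a job.

```python
import os
import sqlite3
import tempfile

# Hypothetical, simplified schema; real openQA tables have more columns.
SCHEMA = """
CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE job_dependencies (child_job_id INTEGER, parent_job_id INTEGER);
"""


def snapshot(reader, parent_id):
    """Return (job_visible, dependency_count) as the reader sees them."""
    jobs = reader.execute(
        "SELECT id FROM jobs WHERE id = ?", (parent_id,)
    ).fetchall()
    deps = reader.execute(
        "SELECT child_job_id FROM job_dependencies WHERE parent_job_id = ?",
        (parent_id,),
    ).fetchall()
    return (len(jobs) == 1, len(deps))


def create_job(writer, reader, parent_id, child_ids, atomic):
    """Insert a job and its dependencies, peeking from the reader midway."""
    if atomic:
        writer.execute("BEGIN")
    writer.execute("INSERT INTO jobs VALUES (?, 'scheduled')", (parent_id,))
    midway = snapshot(reader, parent_id)  # what a scheduler could see now
    writer.executemany(
        "INSERT INTO job_dependencies VALUES (?, ?)",
        [(child_id, parent_id) for child_id in child_ids],
    )
    if atomic:
        writer.execute("COMMIT")
    return midway, snapshot(reader, parent_id)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: no implicit transactions, so every statement
        # commits on its own unless we issue BEGIN/COMMIT ourselves.
        writer = sqlite3.connect(path, isolation_level=None)
        reader = sqlite3.connect(path, isolation_level=None)
        try:
            writer.executescript(SCHEMA)
            piecewise = create_job(writer, reader, 1, [11, 12, 13], atomic=False)
            atomic = create_job(writer, reader, 2, [21, 22, 23], atomic=True)
        finally:
            writer.close()
            reader.close()
    return piecewise, atomic


piecewise, atomic = demo()
print("piece by piece:", piecewise)
print("one transaction:", atomic)
```

In the piecewise run the reader sees the job before any of its dependencies exist, which is the kind of half-finished picture suspected here. In the transactional run the job only becomes visible together with all of its dependency rows.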

Actions #2

Updated by szarate over 7 years ago

  • Related to action #18684: Jobs with worker class qemu_x86_64 are taken by machines without this class, causing incomplete jobs added
Actions #3

Updated by thehejik over 7 years ago

coolo wrote:

We also have the problem that jobs are sometimes scheduled on the wrong worker class. This and your issue together make me believe that the jobs are grabbed/scheduled before the final picture is there, i.e. job settings and job dependencies aren't inserted in a transaction but piece by piece? Can you check?

Sorry, I have no idea how to check that.

Actions #4

Updated by coolo over 7 years ago

Somehow I had the feeling that I was talking to Ettore :)

Actions #5

Updated by EDiGiacinto over 7 years ago

@coolo: seems possible. AFAICS from https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Scheduler/Scheduler.pm#L214, the search->first is not inside a transaction.
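For comparison, a lookup-then-assign step that does run inside a transaction could look roughly like the sketch below. It uses Python's sqlite3 rather than the Perl code in Scheduler.pm, and the table layout, the `worker_id` column and the `grab_job` helper are hypothetical, not taken from openQA.

```python
import sqlite3


def grab_job(conn, worker_id):
    """Pick the oldest scheduled job and assign it within one transaction.

    Assumes conn was opened with isolation_level=None so that BEGIN and
    COMMIT are fully under our control.
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        rows = conn.execute(
            "SELECT id FROM jobs WHERE state = 'scheduled' ORDER BY id LIMIT 1"
        ).fetchall()
        if not rows:
            conn.execute("ROLLBACK")
            return None
        job_id = rows[0][0]
        conn.execute(
            "UPDATE jobs SET state = 'running', worker_id = ? WHERE id = ?",
            (worker_id, job_id),
        )
        conn.execute("COMMIT")
        return job_id
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


# Hypothetical usage with an in-memory database and made-up job ids.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT, worker_id INTEGER)"
)
conn.executemany(
    "INSERT INTO jobs (id, state) VALUES (?, 'scheduled')", [(101,), (102,)]
)
first = grab_job(conn, worker_id=1)
second = grab_job(conn, worker_id=2)
third = grab_job(conn, worker_id=3)
```

`BEGIN IMMEDIATE` is SQLite-specific. On PostgreSQL a similar effect would usually be achieved differently, for example with row locking, so treat this purely as an illustration of keeping the read and the write in one unit.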

Actions #6

Updated by szarate over 7 years ago

Looks like poo#18684 is fixed, but this is still happening: https://openqa.suse.de/tests/1061075#settings

Actions #7

Updated by EDiGiacinto over 7 years ago

It might be fixed by https://github.com/os-autoinst/openQA/pull/1389. Unfortunately, I can't reproduce the issue locally on one machine, but I've been running openQA with those patches with no issues.

Actions #8

Updated by coolo over 7 years ago

Fixed in master doesn't matter here - we have c2c7bcd2 deployed. Remember EDiGiacinto's first feature? :)

Actions #9

Updated by asmorodskyi over 7 years ago

Evidence that the issue is not fixed: https://openqa.suse.de/tests/1065721

Actions #10

Updated by pcervinka over 7 years ago

  • Related to action #20790: [qam] SLE12-SP3 test fails in 1__unknown_ - slenkins-tests-openvpn-control added
Actions #11

Updated by okurz over 7 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: slenkins-tests-openvpn-control
https://openqa.suse.de/tests/1130899

Actions #12

Updated by coolo about 7 years ago

  • Status changed from New to Resolved

The same test worked flawlessly throughout October and November, so I'm assuming it's fixed.
