action #10456

cloning of parents / children seems broken

Added by dimstar almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 2016-01-27
Due date:
% Done: 100%
Estimated time:
Difficulty:

Description

To test whether the missing image_gnome.qcow was 'just' a timing issue of 'something' cleaning up, I cloned (retriggered) the job:
116539 (install-gnome-image)

The openQA UI correctly showed that three children had been cloned along with it.

But only one of the children has the parent defined in its settings, and can thus access the disk image. The other jobs do not have a parent defined.

History

#1 Updated by dimstar almost 7 years ago

Maybe for some more clarity:

The original job layout was:

116539
|-116569
|-116570
\-116571 (terminated incomplete, setup error, cloned to 116768)
         (timing-wise: 116569 & 116570 started ~ 20 minutes after 116539 was completed,
                                 116571 only > 30 minutes later, then failed to perform the setup,
                                 likely for missing assets. Something cleaning out assets too fast?)

Note: The clone 116768 already has no parent specified anymore (so the auto-cloning of an incomplete job breaks parenting).

Later, 116539 was cloned (an attempted manual rerun of the test group). The new job layout:
116831
116832
116833
\-116857

The first two are no longer marked as children, and 'run away' without the asset created by their parent.

#2 Updated by oholecek almost 7 years ago

  • Assignee set to oholecek

#3 Updated by mlin7442 almost 7 years ago

A clue from the logs, HTH:

[Wed Jan 27 20:38:39 2016] [error] org.freedesktop.DBus.Error.Failed: DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR: duplicate key value violates unique constraint "job_dependencies_pkey"
DETAIL: Key (child_job_id, parent_job_id, dependency)=(116830, 116833, 1) already exists. [for Statement "INSERT INTO job_dependencies ( child_job_id, dependency, parent_job_id) VALUES ( ?, ?, ? )" with ParamValues: 1='116830', 2='1', 3='116833'] at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 545
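
The error above is what a composite primary key produces when the same dependency edge is inserted twice. A minimal sketch of that behaviour follows. It assumes a table shaped after the column names in the error message and uses in-memory SQLite rather than PostgreSQL; `insert_dependency` is a hypothetical helper, not openQA code.

```python
import sqlite3


def insert_dependency(conn, child, parent, dependency):
    """Insert one dependency edge; return False if that exact edge already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO job_dependencies (child_job_id, dependency, parent_job_id) "
                "VALUES (?, ?, ?)",
                (child, dependency, parent),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when the (child, parent, dependency) triple is already present.
        return False


conn = sqlite3.connect(":memory:")
# Assumed schema: only the three key columns named in the error message.
conn.execute(
    """CREATE TABLE job_dependencies (
           child_job_id  INTEGER NOT NULL,
           parent_job_id INTEGER NOT NULL,
           dependency    INTEGER NOT NULL,
           PRIMARY KEY (child_job_id, parent_job_id, dependency)
       )"""
)

first = insert_dependency(conn, 116830, 116833, 1)
second = insert_dependency(conn, 116830, 116833, 1)  # same edge again
row_count = conn.execute("SELECT COUNT(*) FROM job_dependencies").fetchone()[0]
print(first, second, row_count)
```

In this sketch the second insert is rejected and the table still holds a single row, so a cloning routine that tries to write an edge it has already written would abort partway unless it handles the conflict.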

#4 Updated by oholecek almost 7 years ago

I created a test for this scenario and it passes, so the problem must be somewhere else. The log snippet complains that the dependency already exists, but when I check openQA there is no such dependency.

#5 Updated by dimstar almost 7 years ago

Maybe the special thing is that it has three children?

I just cloned install-gnome-image again today (latest TW snapshot)

117874 => 118145 (incomplete? No children!) => 118153

The children misbehaved the same way as last time: some had no parent and started directly, while one waited and then failed.

#6 Updated by oholecek almost 7 years ago

Yes, I included it in my test, at least I hope. (See https://github.com/aaannz/openQA/commit/53b0783fd4a3b9320298bd92c2533271b81e2c34 ).

#7 Updated by oholecek almost 7 years ago

Looking at the latest results you posted, it is behaving as expected (apart from the failing children of the last clone). When you restart a parent whose children are in the scheduled state, the children are not cloned but rerouted to the new parent. That was a design decision to avoid needless job creation.

See comment https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Schema/Result/Jobs.pm#L352
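
The rerouting rule described in this comment can be sketched roughly as follows. This is an illustrative model only, not the code in Jobs.pm: the `Job` class, `restart_parent`, the state names, and the job IDs are all hypothetical, and the assumption that non-scheduled children get a fresh clone is inferred from the discussion rather than confirmed.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class Job:
    id: int
    state: str  # hypothetical states: "scheduled", "running", "done"
    parent: Optional["Job"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)


def restart_parent(old, ids):
    """Clone `old`; reroute its scheduled children, clone the others (assumed)."""
    new = Job(id=next(ids), state="scheduled")
    kept = []
    for child in old.children:
        if child.state == "scheduled":
            # Not started yet: point the existing job at the new parent.
            child.parent = new
            new.children.append(child)
        else:
            # Already ran: leave it with the old parent and schedule a clone.
            kept.append(child)
            new.children.append(Job(id=next(ids), state="scheduled", parent=new))
    old.children = kept
    return new


ids = count(1000)  # hypothetical ID source for new jobs
old = Job(id=1, state="done")
waiting = Job(id=2, state="scheduled", parent=old)
finished = Job(id=3, state="done", parent=old)
old.children = [waiting, finished]

new = restart_parent(old, ids)
print([c.id for c in new.children])  # rerouted child plus the clone of the finished one
print([c.id for c in old.children])  # only the child that had already run
```

Under this model, a parent that is restarted while all of its children are still scheduled ends up with no children at all, which would match a parent showing no children in the UI.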

#8 Updated by dimstar almost 7 years ago

I triggered only one rescheduling of the job. At that time, not a single TW job of that image was still running (the snapshot had been released the night before).

Everything else was openQA's doing.

#9 Updated by oholecek almost 7 years ago

Maybe we didn't understand each other. I'm referring to

117874 => 118145 (incomplete? No children!) => 118153

117874 has 3 children.
118145 was cloned (for whatever reason) before finishing, so all its children must still have been in the scheduled state. By design they were rerouted rather than cloned, which is why the dependencies appear broken and no child is displayed in the UI.
118153 again has 3 children (the problem is that they failed, probably on a setup failure; I suspect the HDD image is missing or similar).

EDIT: Ah, I see what might have been misunderstood. The 'you' in the second sentence was meant hypothetically, as in 'When one restarts a parent'. :)

#10 Updated by dimstar almost 7 years ago

Ah! Sorry, my bad.

The three children on https://openqa.opensuse.org/tests/118153 are actually not the right ones:
if you check them, they all refer to the same test (opensuse-Tumbleweed-DVD-x86_64-Build20160130-sysauth_gnome), whereas the original parent had three different children.

#11 Updated by oholecek over 6 years ago

  • Status changed from New to In Progress

#12 Updated by oholecek over 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
