action #69976

coordination #103962: [saga][epic] Easy multi-machine handling: MM-tests as first-class citizens

coordination #15850: [epic] Improve displaying job dependencies

Show dependency graph for cloned jobs

Added by mkittler almost 2 years ago. Updated 4 months ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Feature requests
Target version:
Start date:
2020-08-13
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

User story

  • As a tester or openQA developer, I often follow links to openQA jobs (e.g. from a ticket), and for further investigation it would be useful to see the dependency tree even if the job has already been cloned in the meantime.
  • As a new user of openQA, I would like to avoid getting the wrong impression that dependencies of a job could not be created or have been deleted although they are just not displayed.

Acceptance criteria

  • AC1: The "Dependencies" tab is shown for jobs which have been cloned/restarted.
  • AC2: The tab shown as in AC1 contains the actual job one is looking at (and not a different job in the "cloning chain").

Notes

Here a few points for consideration. I wouldn't call these ACs because it requires some experimentation to find out what works best in practice and can be implemented with reasonable effort. I only added these points because I'm already aware of certain cases which can be tricky (and have been tricky in the past) and should be thought through when implementing this issue.

  1. The dependency graph has been disabled on purpose so far: Showing not only all children but also all of their clones leads to overwhelmingly big graphs, which we should keep avoiding. So far we avoid the problem by showing only the latest jobs of the "cloning chain" within the graph, but that means cloned jobs have no graph at all.
  2. It would likely be useful to show graphs of old jobs as well. Users frequently get confused about missing graphs and sometimes use older jobs as reference.
    1. Only dependencies of the same "cloning level" should be shown within a graph to avoid the problem described in 1.
    2. A dependency cluster might have been cloned only partially. Consider the example of restarting only a child but not the parent. Then there is no single cloning level. To avoid the problem described in 1., the graph of the parent job in this example should likely only show the most recent child, as it does right now. The graph of the original child job should contain only the original child job. The graph of the cloned child job should contain only the cloned child job. 2.1. The last two statements are obvious, but so far the graph of the original child job would not show the original child job at all and would only contain the cloned child job.
    3. When skipping the restart of a passed/softfailed child, the child is so far nevertheless added as a child of the cloned parent and therefore ends up as part of both the old and the new cluster. So not only a parent (as explained in 2.2) but also a child might be part of multiple cloning levels. The graph rendering should treat both cases sensibly. To be consistent with 2.2, we should only show the most recent parent in the graph of the skipped child.
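
The rules from notes 2.2 and 2.3 can be sketched as a small toy model. This is only an illustration in Python, not openQA's Perl code and not necessarily what an implementation would end up doing. The function names, the `clone_of` mapping (job ID to the ID of its clone) and the `edges` set of (parent, child) pairs are hypothetical. The idea: starting from the job being viewed, walk parents and children, skip any job whose cloning chain is already represented in the graph, and otherwise add only the latest job of a chain.

```python
from collections import deque


def clone_chain(job_id, clone_of):
    """Return all job IDs of the cloning chain that contains job_id, oldest first."""
    origin_of = {clone: orig for orig, clone in clone_of.items() if clone is not None}
    first = job_id
    while first in origin_of:  # walk back to the oldest job of the chain
        first = origin_of[first]
    chain = []
    current = first
    while current is not None:  # walk forward to the latest clone
        chain.append(current)
        current = clone_of.get(current)
    return chain


def graph_nodes(start, clone_of, edges):
    """Collect the job IDs to display in the dependency graph of job `start`."""
    shown = {start}
    queue = deque([start])
    while queue:
        job = queue.popleft()
        parents = {p for p, c in edges if c == job}
        children = {c for p, c in edges if p == job}
        for other in sorted(parents | children):
            if any(member in shown for member in clone_chain(other, clone_of)):
                continue  # this cloning chain is already represented in the graph
            if clone_of.get(other) is not None:
                continue  # job has been cloned; only the latest job of a chain is added
            shown.add(other)
            queue.append(other)
    return shown


# Note 2.2: parent 1 with child 2; only the child was restarted (2 cloned as 3).
partial_clones = {1: None, 2: 3, 3: None}
partial_edges = {(1, 2), (1, 3)}

# Note 2.3: parent 1 restarted as 3; the passed child 2 was skipped and is
# attached to both the old and the new parent.
skipped_clones = {1: 3, 2: None, 3: None}
skipped_edges = {(1, 2), (3, 2)}

print(graph_nodes(2, partial_clones, partial_edges))
print(graph_nodes(2, skipped_clones, skipped_edges))
```

In this sketch the job being viewed is always part of its own graph (AC2), while jobs reached through dependencies are replaced by the latest job of their chain. More complex clusters may need additional conditions that this toy model does not cover.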

Suggestions

  1. Try to understand the code in OpenQA::WebAPI::Controller::Test::dependencies and related functions. Reading the code of the JavaScript function renderDependencyGraph might be helpful as well. It should not be necessary to touch any other code because this is only about displaying dependencies.
  2. Change the condition !defined($job->clone_id) && $job->has_dependencies to $job->has_dependencies. That's all that's needed to implement AC1.
  3. Create some relevant job clusters to test AC2 and the scenarios mentioned under notes. One could either start testing manually (e.g. by using manual SQL statements to add the required dependencies) or extend some of the dependencies tests.
  4. Play around with the code mentioned in 1. to implement AC2 considering the notes.
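
The effect of the condition change in suggestion 2 can be illustrated with a minimal sketch. It is written in Python rather than openQA's Perl. The `Job` class and both function names are hypothetical and only mirror `clone_id` and `has_dependencies` from the condition quoted above, assuming `clone_id` holds the ID of the job that the current job was cloned as.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical stand-in for an openQA job; not the real schema class."""
    id: int
    clone_id: Optional[int] = None  # assumed: ID of the clone, None if never cloned
    has_dependencies: bool = False


def show_tab_before(job: Job) -> bool:
    # Mirrors: !defined($job->clone_id) && $job->has_dependencies
    return job.clone_id is None and job.has_dependencies


def show_tab_after(job: Job) -> bool:
    # Mirrors: $job->has_dependencies
    return job.has_dependencies


original = Job(id=1, clone_id=2, has_dependencies=True)  # already cloned as job 2
latest = Job(id=2, has_dependencies=True)                # the clone itself

print(show_tab_before(original), show_tab_after(original))
print(show_tab_before(latest), show_tab_after(latest))
```

With the old condition the tab is hidden for the cloned job and shown only for the latest one. With the new condition it is shown for both, which is what AC1 asks for.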

Related issues

Related to openQA Project - action #59969: Display job dependency tab not only for latest jobs Resolved 2019-11-18

Related to openQA Project - action #69979: Advanced job restarting via the web UI Resolved 2020-08-13

Related to openQA Tests - action #95788: [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network Feedback 2021-07-21

Related to openQA Project - action #95783: Provide support for multi-machine scenarios handled by openqa-investigate size:M Workable

History

#1 Updated by okurz almost 2 years ago

  • Status changed from Workable to New
  • Priority changed from Normal to Low
  • Target version set to future

I don't consider it workable yet, e.g. missing user story and specific suggestions. Try to put yourself into the shoes of a junior contributor that asks "ok, what should I look into first, e.g. what topic to research, what source code to read, what test to write, what experiment to run"

#2 Updated by mkittler almost 2 years ago

  • Description updated (diff)
  • Status changed from New to Workable

I added use-cases and suggestions. However, I would actually like to work on this ticket myself - this graph stuff is quite fun with all the recursion going on.

#3 Updated by Julie_CAO almost 2 years ago

Hi Marius,

Happy to see progress on this ticket. Each improvement on the openQA side regarding directly chained tests will benefit us. We are even discussing whether we should postpone our test reconstruction until direct chaining gets more mature.

Here are the test chains we created FYI: https://openqa.nue.suse.com/tests/overview?distri=sle&version=15-SP2&build=207.1&groupid=213

Some questions about this ticket:

What does 'clone' here mean? If I restart tests in a direct chain on the web UI, are these restarted tests called cloned jobs here? Or those jobs which are restarted via openqa-clone-job?

If we only rerun part of tests(a sub tree of the original test chain tree), will the 'cloning tree' graph be the sub tree only? Can we still view or operate the original tree after implementing this ticket?

The graph is not just a graph, it is the actual chain, we can rerun jobs with the graph, do I get it correctly?

#4 Updated by mkittler almost 2 years ago

What does 'clone' here mean? If I restart tests in a direct chain on the web UI, are these restarted tests called cloned jobs here?

Yes. Jobs which have been restarted via the web UI or direct use of the REST-API's restart and duplicate routes are considered "cloned" and the newly created jobs "clones". That's true for the context of this issue and the web UI uses the same terms in several error messages and the "Cloned as …"/"Clone of …" info.

or those jobs which are restarted via openqa-clone-job?

No. Jobs "restarted" via that script are actually not considered restarted/cloned within the context of this issue and also not by the web UI, e.g. the web UI would not show the "Cloned as …"/"Clone of …" info. Note that openqa-clone-job does not support parallel and directly chained dependencies, so it is not really helpful here anyway.

If we only rerun part of tests(a sub tree of the original test chain tree), will the 'cloning tree' graph be the sub tree only?

I'm not 100 % sure what you mean but this question likely falls under the points I've mentioned under "notes" within the ticket description. That means it isn't completely sorted out at this point.

Can we still view or operate the original tree after implementing this ticket?

I don't know what you mean by "operate". Being able to view the original tree is the main point of this issue.

The graph is not just a graph, it is the actual chain, we can rerun jobs with the graph, do I get it correctly?

The graph is just a graph. It shows dependencies between jobs. It does not show the cloning/restarting chain. Showing the graph for jobs which have already been cloned does not mean you can restart these jobs now.

#5 Updated by Julie_CAO almost 2 years ago

Hi Marius, I think I get the meaning. Taking an ABCDE test chain as an example: currently, if we restart a sub chain such as ABC, then only ABC are shown in the newly created graph. DE are lost and only show up in the original graph. After implementing this ticket, in the same case, DE will be shown in the new graph as well, even if they have not been restarted. The new graph is identical to the original one, ABCDE. But on the new graph, only ABC, which have been restarted, are allowed to be restarted again; DE are not. Do I follow you exactly?

If yes, it helps to some extent. The test reviewers always see the correct test dependencies regardless of any restarting.

Of course, what we expect most is that DE can be restarted in the new graph (or in the original graph, or in any other way) as well.

#6 Updated by okurz almost 2 years ago

  • Related to action #59969: Display job dependency tab not only for latest jobs added

#7 Updated by okurz almost 2 years ago

  • Related to action #69979: Advanced job restarting via the web UI added

#8 Updated by okurz almost 2 years ago

Please understand that this ticket is prioritized with low priority and we keep it in the "future" target version, i.e. the SUSE QA tools team does not plan to implement this anytime soon (not within the next months or years). Given that the ticket has already received good and comprehensive explanations and implementation suggestions, it should be feasible for any outside contributor to implement. We are always happy to receive pull requests from anyone :)

#9 Updated by cdywan 8 months ago

  • Related to action #95788: [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network added

#10 Updated by kraih 7 months ago

Stumbled over this while investigating #102428. Had quite a few parallel_failed jobs where the actual fail reason was hard to track down because of the missing dependency links. So consider this a +1 for the feature. :)

#11 Updated by okurz 6 months ago

  • Description updated (diff)

#12 Updated by okurz 6 months ago

  • Description updated (diff)

#13 Updated by okurz 6 months ago

  • Parent task set to #15850

#14 Updated by mkittler 5 months ago

Maybe it makes sense to do this before working on further tickets of the epic. I know, it is only a graphical improvement but seeing the dependency tree also of cloned jobs would be helpful e.g. to debug problems when making changes for #95783.

#15 Updated by okurz 4 months ago

  • Related to action #95783: Provide support for multi-machine scenarios handled by openqa-investigate size:M added

#16 Updated by okurz 4 months ago

  • Status changed from Workable to New
  • Target version changed from future to Ready

Yes, as discussed let's try it this way so adding this to the backlog. Back to "New" to (re-)estimate

#17 Updated by mkittler 4 months ago

  • Assignee set to mkittler

#18 Updated by mkittler 4 months ago

  • Status changed from New to In Progress

#19 Updated by mkittler 4 months ago

  • Status changed from In Progress to Feedback

#20 Updated by okurz 4 months ago

Merged and deployed (on osd). I checked one job and the dependency graph in that example looks the same as before, so good. Also in cloned jobs I can see the dependency graph, so all looks good now. Please crosscheck both ACs in production once again and then feel free to resolve.

#21 Updated by mkittler 4 months ago

  • Status changed from Feedback to Resolved

I only found jobs where both ACs are fulfilled as well, e.g. https://openqa.suse.de/tests/8196624#dependencies, https://openqa.suse.de/tests/8204268#dependencies ¹, https://openqa.suse.de/tests/8201179#dependencies and https://openqa.suse.de/tests/8204082#dependencies, https://openqa.suse.de/tests/8195204#dependencies, https://openqa.suse.de/tests/8196438#dependencies, https://openqa.suse.de/tests/8202910#dependencies, https://openqa.suse.de/tests/8196441#dependencies ¹, https://openqa.suse.de/tests/8198317#dependencies, https://openqa.suse.de/tests/8203164#dependencies.

The graphs of the latest jobs look good as well.


¹ I guess it is ok that the latest job is shown within the graph as well, as long as it doesn't go too far. (I have some conditions in the code to avoid going too far. I hope they are good enough. In all graphs I've checked they were.) Additionally, if you go to the graph of the latest job, then this graph doesn't include the old job (like it was before; so at least the graphs for the latest jobs should be as clean as possible).

#22 Updated by okurz 4 months ago

  • Status changed from Resolved to Feedback

please see #107311

#23 Updated by okurz 4 months ago

  • Related to action #107311: The dependency tree of `openqa-clone-job --clone-children` is broken added

#24 Updated by mkittler 4 months ago

  • Related to deleted (action #107311: The dependency tree of `openqa-clone-job --clone-children` is broken)

#25 Updated by mkittler 4 months ago

  • Status changed from Feedback to Resolved

Not related to #107311 after all.
