action #41066

Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY aka START_DIRECTLY_AFTER_TEST).

Added by pvorel over 1 year ago. Updated 4 months ago.

Status: Resolved
Start date: 22/08/2019
Priority: Normal
Due date:
Assignee: mkittler
% Done: 100%
Category: Feature requests
Target version: Done
Difficulty:
Duration:

Description

This is LTP specific, but there might be similar use cases for other tools/frameworks/tests.
The LTP test install_ltp preinstalls SLE + SDK (based on the qcow from create_hdd_minimal_base+sdk) + LTP itself.
It publishes an image, which is then used by many LTP tests (> 100).

The same is needed for bare metal, but we obviously cannot use PUBLISH_HDD_1.
One possibility is to install SLE + SDK + LTP on a bare metal worker and run all the tests on that worker.
This doesn't scale, but it is at least better than nothing. Maybe it's possible to achieve this via WORKER_CLASS.

UPDATE: for those who aren't familiar with LTP tests in openQA, it's possible to see them in
https://openqa.suse.de/group_overview/116
https://openqa.suse.de/group_overview/155
Look for install_ltp and any ltp_* test.

LTP tests using IPMI are in the devel group (still unstable)
https://openqa.suse.de/group_overview/158

Some info is also in poo#40805.


Subtasks

action #55835: Support running multiple jobs directly in sequence on the... (Resolved, mkittler)


Related issues

Duplicated by openQA Project - action #46583: [tools][dependency jobs][scheduling] Request to support r... Closed 24/01/2019

History

#1 Updated by coolo over 1 year ago

As I explained in IRC: you can't have that with START_AFTER_TEST.

You need a new dependency class: FOLLOW_TEST_DIRECTLY - and I lack good ideas how to implement it.

#2 Updated by pvorel over 1 year ago

  • Subject changed from Scheduling jobs for IPMI (bare metal) on the same worker (aka START_AFTER_TEST). to Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY).

Sorry, I meant FOLLOW_TEST_DIRECTLY (or DIRECTLY_FOLLOW_TEST).
I don't have an idea how to implement either :(.

#3 Updated by okurz over 1 year ago

For s390x z/VM upgrade testing we (mainly mgriessmeier and me) also investigated this approach because a z/VM, from the point of view of openQA, can also be considered a "bare metal" machine as in: no VM images that can be published and reused in downstream jobs. Since then os-autoinst gained the feature to switch test variables, especially the VERSION, on the fly on request, which makes it easier to instrument different OS versions within one scenario. This is also what is mainly used now: an autoyast installation of the old OS version followed by different upgrade scenarios to the new OS version, with both the installation of the old OS and the upgrade together within one scenario. An alternative we used before is specific worker class settings to ensure one worker instance is reserved for two subsequent jobs which rely on each other, see https://gitlab.suse.de/openqa/salt-pillars-openqa/blob/master/openqa/workerconf.sls#L108 , although this is also not fool-proof because something can still happen outside the control of openQA which interferes with the machine.

IMHO this feature request is infeasible to implement, the test-scenario specific WORKER_CLASS is the best approach I know.

#4 Updated by coolo over 1 year ago

So you prefer to leave one server idling? Might be fine for s390 vms, but for bare metal servers this is pretty wasteful

#5 Updated by okurz over 1 year ago

coolo wrote:

So you prefer to leave one server idling? Might be fine for s390 vms, but for bare metal servers this is pretty wasteful

Sure it is. I did not comment what I prefer but if you are interested then I can state: I prefer to focus on VMs only for openQA as everything else so far does not feel like a first-class citizen and I recommend to avoid bare metal testing as long as we have not improved the stability of existing IPMI tests. We have enough tickets for that :)

#9 Updated by coolo about 1 year ago

  • Duplicated by action #46583: [tools][dependency jobs][scheduling] Request to support reasonable START_AFTER on ipmi. added

#10 Updated by pvorel about 1 year ago

  • Description updated (diff)

#11 Updated by coolo about 1 year ago

  • Priority changed from Normal to High
  • Target version set to Ready

This is highly wanted, but blocked by the architecture changes

#12 Updated by coolo about 1 year ago

  • Blocked by action #47117: EPIC: Fix worker->websocket->scheduler->webui connection added

#13 Updated by mkittler about 1 year ago

Is it conceivable to use btrfs snapshots for this?

#14 Updated by okurz about 1 year ago

How could this help? The problem that can happen when we simply use START_AFTER_TEST is that it is not guaranteed that the two related jobs run directly after each other, just "some time later".

Imagine 10 testsuites "ltp1" through "ltp10" that all specify "START_AFTER_TEST=installation". What is expected is that the installation is conducted on "a machine" and all 10 testsuites ltp1…10 run each on the same machine relying on the system being installed by the testsuite "installation".

But what can happen are for example the following cases:

  • on worker X we run installation but ltp7 is triggered as the next on worker Y with the same WORKER_CLASS where no system is installed -> fail
  • worker X conducts installation, then another job for "do_other_stuff" is triggered, fails, messes up the installation, then ltp3 is triggered when no usable system can be booted -> fail

Even more what has not been mentioned is:

  • installation is conducted, then ltp1 runs fine, ltp2 runs fine, ltp3 corrupts the OS, ltp4…10 fail as consequence (in comparison to VMs where we spawn a clean VM with backing file for each scenario and/or can use snapshots to revert to). In this specific case btrfs snapshots might help a bit, depending on the specific "corruption"

#15 Updated by okurz 9 months ago

@coolo On top of my last comment I am arguing that this is something that simply should not be done by openQA because openQA can not know how the "bare metal machine" behaves between each test run even if something like "FOLLOW_TEST_DIRECTLY" exists. IMHO it is up to the tests as in "os-autoinst-distri-opensuse" to ensure that all requirements for a test are ensured, that could be actually conducting an installation or just booting an already installed system. We should not be scared of "fast machine deployment" using whatever is available and most suitable, e.g. autoyast. It should be obvious that relying on an instrumented GUI installation test job is not the most efficient way to just end up in a root terminal :)

One other sane approach I could see is to extend the bare metal backend support to do something equivalent to "load a qcow image", as in some kind of hooks which can be tapped into for deployment, e.g. disk images that are dumped on the raw devices from a live system and then kexec'd/booted into.

#16 Updated by coolo 9 months ago

And I disagree

#17 Updated by pvorel 8 months ago

@coolo @okurz: ok, what is the current status, please? Is anybody planning to implement it, or should we (kernel-qa) decide whether to implement it ourselves or use a workaround (install SLES before each test and waste time spent on IPMI machines :( )?

A workaround could be a unique WORKER_CLASS for each IPMI worker, see example (on an outdated config):
https://gitlab.suse.de/pvorel/salt-pillars-openqa/commit/de23935efb88faf8ccdbd4b3a2b367e02b30c268

#18 Updated by okurz 8 months ago

I stated in #41066#note-15 that I see this is a questionable idea and unfeasible while coolo disagrees. My personal take on an ETA on this feature is optimistic: 2 months - pessimistic: never

pvorel wrote:

[…] or use a workaround (install SLES before each test and waste time spent on IPMI machines :( )?

I think this is a good and stable approach however my estimation on the "time wasted" is in the range of ~10 minutes. I know that mmoese and also asmorodskyi are already currently looking into approaches to enhance the machine provisioning, are you in contact with them about their work?

#19 Updated by pvorel 8 months ago

okurz wrote:

I stated in #41066#note-15 that I see this is a questionable idea and unfeasible while coolo disagrees. My personal take on an ETA on this feature is optimistic: 2 months - pessimistic: never

Quite realistic (it'd take time to implement it). My question is whether somebody from tools team is planning to work on it or whether it's left to others.

pvorel wrote:

[…] or use a workaround (install SLES before each test and waste time spent on IPMI machines :( )?


I think this is a good and stable approach however my estimation on the "time wasted" is in the range of ~10 minutes.

Really? I'd expect it to be longer. Installing LTP on bare metal takes about an hour (https://openqa.suse.de/tests/2992559), so maybe it's short. But I'll add another test which uses mmoese's approach (see below). Let's see if it's faster. But the main slowness is due to slow IPMI. I guess something like an IPMI serial console will be needed (similar to the virtio console or svirt serial console, which brought a big speedup).

I know that mmoese and also asmorodskyi are already currently looking into approaches to enhance the machine provisioning, are you in contact with them about their work?

Yes. Some ibtests use iPXE installation with autoyast (https://openqa.suse.de/tests/3001543, https://openqa.suse.de/tests/3001544). That's mmoese's work.

QAM (osukup) also has some IPMI tests in their custom instance http://openqa.qam.suse.cz/. They also install over iPXE and work around this issue (FOLLOW_TEST_DIRECTLY not being implemented) with special priorities.

Neither of these two solutions is perfect, as neither of them addresses the FOLLOW_TEST_DIRECTLY feature.

#20 Updated by okurz 8 months ago

  • Category changed from 122 to Feature requests

#21 Updated by mkittler 7 months ago

installation is conducted, then ltp1 runs fine, ltp2 runs fine, ltp3 corrupts the OS, ltp4…10 fail as consequence (in comparison to VMs where we spawn a clean VM with backing file for each scenario and/or can use snapshots to revert to). In this specific case btrfs snapshots might help a bit, depending on the specific "corruption"

Yes, this problem can never be prevented regardless how much we improve the scheduling. And btrfs might only help a bit, indeed. But maybe it is a problem we can live with?

About the scheduling: To implement FOLLOW_TEST_DIRECTLY the scheduler needs more control over the worker than it currently has. In particular it needs to be able to reserve a worker exclusively not only for the execution of a single job but a whole batch of jobs. This should not be that hard to implement on the worker side:

  • At least on the worker-side it merely means that the worker must be capable of accepting a list of jobs at a time (instead of only a single job). The worker would then simply run those jobs in the specified order until all jobs are done or one job fails. It would not accept any other jobs until that happens (similar to its current behavior where it does not accept another job while already busy with one).
  • When the worker accepts the list of jobs it would obviously accept all of these jobs at the same time. Storing not a single job but a list of jobs is not a big deal with the new worker structure.
  • The accepted jobs which are not immediately processed (basically all jobs but the first) should be set to a different state, e.g. 'enqueued'. Setting the job state directly to 'running' would be confusing because one might think the jobs are executed concurrently.
  • This seems the most intuitive and easiest way to implement this. I see no interference with the shared worker feature.
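
The worker-side points above can be illustrated with a minimal sketch. It is not openQA code (openQA itself is written in Perl); the class name BatchWorker, the Job type and the state names other than 'enqueued' are assumptions made up for illustration. It only shows the idea of a worker that reserves itself for a whole list of jobs and runs them in order until all are done or one fails:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    name: str
    state: str = "scheduled"  # hypothetical state names, except 'enqueued'


class BatchWorker:
    """Hypothetical worker that is reserved for a whole batch of jobs."""

    def __init__(self, run_job: Callable[[Job], bool]) -> None:
        self.run_job = run_job      # returns True if the job passed
        self.queue: List[Job] = []
        self.busy = False

    def accept(self, jobs: List[Job]) -> bool:
        # Refuse new work while a batch is still pending, like a worker
        # that is busy with a single job today.
        if self.busy:
            return False
        self.queue = list(jobs)
        self.busy = True
        for job in self.queue:
            job.state = "enqueued"  # accepted, but not running yet
        return True

    def process(self) -> None:
        # Run the jobs in the given order; stop at the first failure.
        while self.queue:
            job = self.queue.pop(0)
            job.state = "running"
            passed = self.run_job(job)
            job.state = "done" if passed else "failed"
            if not passed:
                for rest in self.queue:
                    rest.state = "skipped"
                self.queue.clear()
        self.busy = False
```

For example, accepting the jobs a, b and c and letting b fail would leave c skipped, and only after process() returns would the worker accept another batch.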

I'm not that familiar with the web UI and scheduler side of this so I can not tell any impediments on that side. The required changes would be:

  • Take the FOLLOW_TEST_DIRECTLY relation into account. That means we can get job batches.
  • Assign all jobs in a job batch at the same time. So set_assigned_worker/set_scheduling_worker would need to work on a job batch rather than only a single job.
  • That also means that the job-worker relationship in the database must change. A worker can now have multiple jobs assigned at the same time. That kind of relationship is actually already present in the database schema. It is used for the job history of a worker which I implemented some time ago. We should be able to use it for the scheduling as well. It is still an intrusive change which likely requires to adapt a lot of test code, too.
  • The websocket server needs to send multiple jobs at a time. So send_job and ws_send_job would become plural.

#22 Updated by pvorel 7 months ago

@mkittler thanks a lot for considering to implement it :). Are LTP tests the only ones needing that? If yes, we should really think about a workaround. We already have some plans how to merge install_ltp into each LTP test, so this shouldn't be needed.

#23 Updated by mkittler 7 months ago

I don't know who needs this.

Note that implementing the point "That also means that the job-worker relationship in the database must change." would also have more positive side-effects. In particular, it would help to improve the stale job detection. So far, when a worker re-registers the job it was supposed to do is marked as incomplete (unless the worker claims that it is actually still working on that job). This would work more reliably with that database change because then the full job history of that worker can be considered.

#24 Updated by okurz 7 months ago

Certainly QSK&N is not the only team needing that. Everyone caring about tests on bare-metal or self-provisioning backends would eventually want a feature like this. @mkittler I suggest to rewrite the ticket description into https://progress.opensuse.org/projects/openqav3/wiki#Feature-requests first

#25 Updated by mkittler 6 months ago

@okurz I've just created a new ticket: https://progress.opensuse.org/issues/55835

I added my implementation idea as "tasks".

#26 Updated by mkittler 6 months ago

  • Subject changed from Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY). to Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY aka START_DIRECTLY_AFTER_TEST).
  • Assignee set to mkittler

#27 Updated by mkittler 6 months ago

What should happen if one creates a dependency tree like this (with FOLLOW_TEST_DIRECTLY/START_DIRECTLY_AFTER_TEST)?

   --> B --> C
 /
A
 \ 
   --> D --> E

The particularity I'd like to discuss is handling nodes (like A) with multiple chained children (like B and D).

I would say the sub trees B --> C and D --> E might be executed in random order. So the actual execution order might be either A --> B --> C --> D --> E or A --> D --> E --> B --> C. If only one of those chains is actually valid one simply can define that tree in the first place.

If A fails B, C, D and E are not attempted to be executed. Let's assume the B --> C sub tree is executed first. If B fails C is not attempted to be executed. However, the D --> E chain is still attempted to be executed. That might not work because B might have left the system in a bad state but I guess one can at least try. Maybe a post fail hook in B could also try to clean up the mess (e.g. killing all applications attempted to be started) to make it more likely subsequent tests will pass.

So that is my proposal for how I'd intuitively have done it. That way it would work like regularly chained dependencies (at least I think they work this way in openQA). The main difference is of course that the B --> C and D --> E sub trees will never be executed in parallel.
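
To make the proposed failure handling concrete, here is a small sketch. It is only an illustration of the semantics described in this comment, not the actual implementation; the function name run_cluster and the result strings are made up. Siblings are visited in the order given in the input, since the proposal leaves that order open:

```python
from typing import Dict, List, Set


def run_cluster(tree: Dict[str, List[str]], root: str,
                failing: Set[str]) -> Dict[str, str]:
    """Simulate one directly chained cluster on a single worker.

    tree maps a job to its directly chained children; failing is the
    set of jobs assumed to fail in this simulation.
    """
    results: Dict[str, str] = {}

    def skip(node: str) -> None:
        # A failed ancestor means the whole sub tree is not attempted.
        results[node] = "skipped"
        for child in tree.get(node, []):
            skip(child)

    def visit(node: str) -> None:
        if node in failing:
            results[node] = "failed"
            for child in tree.get(node, []):
                skip(child)
            return
        results[node] = "passed"
        # Sub trees run one after another, never in parallel.
        for child in tree.get(node, []):
            visit(child)

    visit(root)
    return results


tree = {"A": ["B", "D"], "B": ["C"], "D": ["E"]}
# B fails: C is skipped, but the D --> E chain is still attempted.
print(run_cluster(tree, "A", failing={"B"}))
```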

#28 Updated by okurz 6 months ago

how about simply not allowing any multiple occurrences, i.e. only a linear schedule, and fail otherwise? one can still think about combining the usual START_AFTER with the new field, or not?

#29 Updated by mkittler 6 months ago

how about simply not allowing any multiple occurrences

Note that this is not about cutting corners to make my task for implementing this easier. I really ask for how this is expected to behave. Only allowing a linear schedule sounds very restrictive. One could even achieve the same by simply adding all test modules inside a single job (admittedly this will not scale very well). Besides, I don't like to take extra steps for disallowing this configuration just to change things again if it is required after all.

one can still think about combining the usual START_AFTER with the new field, or not?

Why would that not be possible? It would change the meaning, though. E.g. if the A --> B and A --> D edges in my example were changed to be regularly chained, then A, the B --> C sub chain and the D --> E sub chain might be executed on different workers (and B --> C and D --> E possibly in parallel).

At least that is how I would have implemented directly chained dependencies. They only form a big "cluster" when not interrupted by a regularly chained dependency. Otherwise there can be sub clusters.
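
The notion of clusters and sub clusters can be sketched as follows. This is only an illustration under assumptions: the function name direct_clusters and the edge labels "direct" and "regular" are invented here and do not come from openQA. Jobs end up in the same cluster only if they are connected via directly chained edges:

```python
from typing import Dict, Iterable, List, Tuple


def direct_clusters(jobs: Iterable[str],
                    edges: List[Tuple[str, str, str]]) -> List[List[str]]:
    """Group jobs connected by 'direct' edges (simple union-find).

    edges are (parent, child, kind) with kind 'direct' or 'regular'.
    """
    jobs = list(jobs)
    leader: Dict[str, str] = {job: job for job in jobs}

    def find(job: str) -> str:
        while leader[job] != job:
            leader[job] = leader[leader[job]]  # path halving
            job = leader[job]
        return job

    for parent, child, kind in edges:
        if kind == "direct":  # regular edges do not merge clusters
            leader[find(child)] = find(parent)

    clusters: Dict[str, List[str]] = {}
    for job in jobs:
        clusters.setdefault(find(job), []).append(job)
    return sorted(sorted(members) for members in clusters.values())


edges = [("A", "B", "regular"), ("A", "D", "regular"),
         ("B", "C", "direct"), ("D", "E", "direct")]
# A stands alone; B --> C and D --> E form separate sub clusters
# which may run on different workers.
print(direct_clusters("ABCDE", edges))
```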

#30 Updated by okurz 6 months ago

mkittler wrote:

how about simply not allowing any multiple occurrences


Note that this is not about cutting corners to make my task for implementing this easier. I really ask for how this is expected to behave. Only allowing a linear schedule sounds very restrictive. One could even achieve the same by simply adding all test modules inside a single job (admittedly this will not scale very well). Besides, I don't like to take extra steps for disallowing this configuration just to change things again if it is required after all.

Well, I thought aborting early with an explicit error message is both helpful for users as well as for you implementing it in a fast manner regardless if we find the need/time to change it later. I did not think you wanted to cut corners but it's actually a suggestion from me how to get the task done most easily for now.

#31 Updated by coolo 6 months ago

I don't see any alternative to what Olli said - if you run B -> C first, then D is no longer directly after A.

And it will surely be fun for the kernel team to calculate an order of their tests to run in.

#32 Updated by pvorel 6 months ago

Our use case for LTP is that many jobs B-Z (96 jobs actually) depend on A (a single job called install_ltp). Failure of any of B-Z should not affect the others, failure of A should cancel all B-Z.

#33 Updated by coolo 6 months ago

That's not FOLLOW_DIRECTLY, sorry. How exactly do you imagine one LTP test case leaving the machine in kernel panic useful for the followup?

#34 Updated by mkittler 6 months ago

I don't see any alternative to what Olli said - if you run B -> C first, then D is no longer directly after A.

That is right. But D will still be executed on the same worker as A. So that half of what makes the new dependency special would still be provided.

Our use case for LTP is that many jobs B-Z (96 jobs actually) depend on A (a single job called install_ltp). Failure of any of B-Z should not affect the others, failure of A should cancel all B-Z.

I have expected this and hence came up with my suggestion. (The suggested behavior would allow exactly this.) But maybe "FOLLOW_TEST_DIRECTLY" is misleading then. It is more like "no test from a different direct-cluster can sneak in" and "guaranteed to run on the same worker".

How exactly do you imagine one LTP test case leaving the machine in kernel panic useful for the followup?

Likely the average failure is less severe and one can just continue by killing everything which is possibly still running in the post fail hook.

#35 Updated by pvorel 6 months ago

coolo wrote:

That's not FOLLOW_DIRECTLY, sorry. How exactly do you imagine one LTP test case leaving the machine in kernel panic useful for the followup?

That was in the original description of this feature request :) ("Our use case for LTP is that many jobs B-Z (96 jobs actually) depend on A (a single job called install_ltp). Failure of any of B-Z should not affect the others, failure of A should cancel all B-Z.")

Regarding kernel panic: there should be a reboot between tests.

#36 Updated by coolo 6 months ago

OK, so let's do as Marius suggested. With one change: don't make this a random order, but order by test suite name. So always A -> B -> C -> D -> E - avoids surprises.

#37 Updated by mkittler 6 months ago

  • Target version changed from Ready to Current Sprint

#38 Updated by mkittler 6 months ago

This seems to work now basically using the approach mentioned earlier: https://github.com/os-autoinst/openQA/pull/2309

"Basically" means that further testing is required. So far I only tested it locally with 2 jobs. However, more complicated dependency trees are covered by the unit tests.

I only deviated slightly from the original idea:

  1. The worker-to-job relation actually stays. Although a worker can now have multiple jobs assigned at the same time it is still useful to track the currently running job of a worker. The way these relations are used and updated still has to change in some places of course.
  2. "The worker would then simply run those jobs in the specified order until all jobs are done or one job fails." - We changed that to a more sophisticated model to cover "Our use case for LTP is that many jobs B-Z (96 jobs actually) depend on A (a single job called install_ltp). Failure of any of B-Z should not affect the others, failure of A should cancel all B-Z.".
  3. It was not technically necessary to introduce an extra "enqueued" job state because having them "assigned" seems to be enough. Maybe I can still improve how it is displayed, though. (See screenshots in PR.)

Note that when assigning multiple directly chained children to a job (for handling failures like in 2.) the dependency graph will show the directly chained dependencies in form of the usual tree (direct and regularly chained dependencies are distinguishable via tool tips):

   --> B --> C
 /
A
 \ 
   --> D --> E

Better would be to show them in the actual execution sequence and to highlight the groups (and sub groups) of jobs which need to be skipped altogether if one of them fails:


A -> [ B -> C ] -> [ D -> E ]

However, implementing that in the graph would be quite some effort which I would postpone until we know the feature is actually as useful as we think.

#39 Updated by mkittler 6 months ago

  • Status changed from New to In Progress

#40 Updated by mkittler 5 months ago

This is how it looks now:

[Screenshot: screenshot_20190916_124113]

  1. The execution order of multiple directly chained children within the same parent is determined by the TEST setting of the jobs. So the order here is always "...-parent", "...-01-child", "...-02-child", "...-02-child-01-..." and "...-03-child".
  2. You can also see the limitation of the graph I was talking about in the last comment.
  3. Cancelling "directly-chained-01-child" would so far cause its siblings "directly-chained-02-child" and "directly-chained-03-child" to be cancelled as well. (And their children so basically the entire cluster would be cancelled.)

I'm wondering whether 3. is the expected behavior. This is just what one gets by treating directly chained dependencies like regularly chained dependencies in that regard but I could try to change that. However, considering that your use case involves 96 jobs you likely even want to be able to cancel them altogether, right? So I'll keep it unless someone tells me that a different behavior is required.
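
The ordering from point 1 can be sketched like this. It is only an illustration, not the code from the PR; the function name execution_order and the job names starting with "demo-" are hypothetical. The parent runs first, then each child in the order of its TEST setting, with a child's own sub tree running right after it:

```python
from typing import Dict, List


def execution_order(children: Dict[str, List[str]], root: str) -> List[str]:
    """Depth-first order; siblings sorted by their TEST setting (name)."""
    order = [root]
    for child in sorted(children.get(root, [])):
        order.extend(execution_order(children, child))
    return order


# Hypothetical TEST names, given in unsorted order on purpose.
children = {
    "demo-parent": ["demo-03-child", "demo-01-child", "demo-02-child"],
    "demo-02-child": ["demo-02-child-01-grandchild"],
}
for name in execution_order(children, "demo-parent"):
    print(name)
```

Sorting by name makes the sequence deterministic, so the tree A with children B --> C and D --> E always runs as A, B, C, D, E.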

#41 Updated by mkittler 5 months ago

  • Status changed from In Progress to Feedback

The PR has been merged and deployed on OSD.

Documentation has been added as well. (See https://github.com/os-autoinst/openQA/blob/master/docs/WritingTests.asciidoc#job-dependencies until http://open.qa/docs is updated.)

#42 Updated by mkittler 5 months ago

  • Target version changed from Current Sprint to Done

I can not set the ticket to resolved. I'm removing it at least from the current sprint for now.

#43 Updated by okurz 5 months ago

It can't be set to "Resolved" because it's "blocked by" #47117 so either you change that relationship, e.g. remove and readd as just "related", or you make sure the blocking ticket is resolved first.

#44 Updated by okurz 5 months ago

As currently no test suite on osd is using the new dependency and I understood from QA SLE Kernel & Network that it does not fulfill their requirements I wonder what we should do in the next step.

#45 Updated by coolo 5 months ago

I understood from QA SLE Kernel & Network that it does not fulfill their requirements

Where is that from?

#46 Updated by okurz 5 months ago

coolo wrote:

I understood from QA SLE Kernel & Network that it does not fulfill their requirements

That's what I understood from #53948#note-12 that's why they seem to be eager to get https://github.com/os-autoinst/os-autoinst/pull/1208 and then https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329

#47 Updated by mkittler 5 months ago

  • Blocked by deleted (action #47117: EPIC: Fix worker->websocket->scheduler->webui connection)

#48 Updated by mkittler 5 months ago

I deleted the 'blocked by' relation. This is not blocked by the entire epic.

#49 Updated by pvorel 5 months ago

okurz wrote:

coolo wrote:

I understood from QA SLE Kernel & Network that it does not fulfill their requirements


That's what I understood from #53948#note-12 that's why they seem to be eager to get https://github.com/os-autoinst/os-autoinst/pull/1208 and then https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329

I started to use this feature yesterday for LTP bare metal. If it fulfills our needs we will stay with it for bare metal. If not, we will move to mdoucha's changes (installing LTP for each job, combined with detecting whether LTP is already installed and skipping the installation, would probably be the safest solution).

mdoucha's changes will probably be used for o3, where tests are failing due to a broken snapshot. And he started developing it for some of his own needs as well.

But IMHO it's a useful feature and I hope this solution will be used by others, as Oliver claimed (#41066#note-24).

#50 Updated by mkittler 4 months ago

  • Status changed from Feedback to Resolved

#51 Updated by pvorel 4 months ago

@mkittler: Thanks for implementing this. It's working nicely; it simplified the test setup and allows us to use the machines more effectively :).
