
action #41066

Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY aka START_DIRECTLY_AFTER_TEST).

Added by pvorel about 3 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2019-08-22
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Difficulty:

Description

This is LTP specific, but there might be a similar use case for other tools/frameworks/tests.
The LTP test install_ltp preinstalls SLE + SDK (based on the qcow image from create_hdd_minimal_base+sdk) + LTP itself.
It publishes an image, which is then used by many LTP tests (> 100).

The same is needed for bare metal, but we obviously cannot use PUBLISH_HDD_1.
One possibility is to install SLE + SDK + LTP on a bare metal worker and run all the tests on that worker.
This doesn't scale, but it's at least better than nothing. Maybe it's possible to achieve this via WORKER_CLASS.

UPDATE: for those who aren't familiar with LTP tests in openQA, you can see them in
https://openqa.suse.de/group_overview/116
https://openqa.suse.de/group_overview/155
Look for install_ltp and any ltp_* test.

LTP tests using IPMI are in the devel group (still unstable):
https://openqa.suse.de/group_overview/158

Some info is also in poo#40805.


Subtasks

action #55835: Support running multiple jobs directly in sequence on the same machine (Resolved, mkittler)


Related issues

Has duplicate openQA Project - action #46583: [tools][dependency jobs][scheduling] Request to support reasonable START_AFTER on ipmi. (Closed, 2019-01-24)

History

#1 Updated by coolo about 3 years ago

As I explained in IRC: you can't have that with START_AFTER_TEST.

You need a new dependency class: FOLLOW_TEST_DIRECTLY - and I lack good ideas on how to implement it.

#2 Updated by pvorel about 3 years ago

  • Subject changed from Scheduling jobs for IPMI (bare metal) on the same worker (aka START_AFTER_TEST). to Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY).

Sorry, I meant FOLLOW_TEST_DIRECTLY (or DIRECTLY_FOLLOW_TEST).
I don't have an idea how to implement either :(.

#3 Updated by okurz about 3 years ago

For s390x z/VM upgrade testing we (mainly mgriessmeier and me) also investigated this approach, because from the point of view of openQA a z/VM can also be considered a "bare metal" machine as in: no VM images that can be published and reused in downstream jobs. Since then os-autoinst gained the feature to switch test variables, especially VERSION, on the fly on request, which makes it easier to instrument different OS versions within one scenario. This is also what is mainly used now, with an autoyast installation of the old OS version and then different upgrade scenarios to the new OS version, both the installation of the old OS as well as the upgrade together within one scenario. An alternative we used before is specific worker class settings to ensure one worker instance is reserved for two subsequent jobs which rely on each other, see https://gitlab.suse.de/openqa/salt-pillars-openqa/blob/master/openqa/workerconf.sls#L108 , although this is also not fool-proof because something can still happen outside the control of openQA which interferes with the machine.

IMHO this feature request is infeasible to implement; the test-scenario specific WORKER_CLASS is the best approach I know.

#4 Updated by coolo about 3 years ago

So you prefer to leave one server idling? Might be fine for s390 VMs, but for bare metal servers this is pretty wasteful.

#5 Updated by okurz about 3 years ago

coolo wrote:

So you prefer to leave one server idling? Might be fine for s390 VMs, but for bare metal servers this is pretty wasteful.

Sure it is. I did not comment on what I prefer, but if you are interested then I can state: I prefer to focus on VMs only for openQA, as everything else so far does not feel like a first-class citizen, and I recommend avoiding bare metal testing as long as we have not improved the stability of existing IPMI tests. We have enough tickets for that :)

#9 Updated by coolo over 2 years ago

  • Has duplicate action #46583: [tools][dependency jobs][scheduling] Request to support reasonable START_AFTER on ipmi. added

#10 Updated by pvorel over 2 years ago

  • Description updated (diff)

#11 Updated by coolo over 2 years ago

  • Priority changed from Normal to High
  • Target version set to Ready

This is highly wanted, but blocked by the architecture changes.

#12 Updated by coolo over 2 years ago

  • Blocked by coordination #47117: [epic] Fix worker->websocket->scheduler->webui connection added

#13 Updated by mkittler over 2 years ago

Is it conceivable to use btrfs snapshots for this?

#14 Updated by okurz over 2 years ago

How could this help? The problem that can happen when we simply use START_AFTER_TEST is that it is not guaranteed that the two related jobs run directly after each other, just "some time later".

Imagine 10 testsuites "ltp1" through "ltp10" that all specify "START_AFTER_TEST=installation". What is expected is that the installation is conducted on "a machine" and all 10 testsuites ltp1…10 each run on the same machine, relying on the system being installed by the testsuite "installation".

But what can happen are for example the following cases:

  • on worker X we run installation but ltp7 is triggered next on worker Y with the same WORKER_CLASS where no system is installed -> fail
  • worker X conducts installation, then another job for "do_other_stuff" is triggered, fails and messes up the installation, then ltp3 is triggered when no usable system can be booted -> fail

Even more, what has not been mentioned is:

  • installation is conducted, then ltp1 runs fine, ltp2 runs fine, ltp3 corrupts the OS, ltp4…10 fail as a consequence (in comparison to VMs where we spawn a clean VM with a backing file for each scenario and/or can use snapshots to revert to). In this specific case btrfs snapshots might help a bit, depending on the specific "corruption"

#15 Updated by okurz over 2 years ago

coolo On top of my last comment I am arguing that this is something that simply should not be done by openQA, because openQA cannot know how the "bare metal machine" behaves between each test run even if something like "FOLLOW_TEST_DIRECTLY" exists. IMHO it is up to the tests, as in "os-autoinst-distri-opensuse", to ensure that all requirements for a test are met; that could mean actually conducting an installation or just booting an already installed system. We should not be scared of "fast machine deployment" using whatever is available and most suitable, e.g. autoyast. It should be obvious that relying on an instrumented GUI installation test job is not the most efficient way to just end up in a root terminal :)

One other sane approach I could see is to extend the bare metal backend support to do something equivalent to "load a qcow image", as in some kind of hooks which can be tapped into for deployment, e.g. disk images that are dumped on the raw devices from a live system and then kexec'd/booted into.

#16 Updated by coolo over 2 years ago

And I disagree

#17 Updated by pvorel over 2 years ago

coolo okurz: ok, what is the current status, please? Is anybody planning to implement it, or should we (kernel-qa) decide whether to implement it ourselves or use a workaround (install SLES before each test and waste time on IPMI machines :( )?

A workaround could be a unique WORKER_CLASS for each IPMI worker, see this example (based on an outdated config):
https://gitlab.suse.de/pvorel/salt-pillars-openqa/commit/de23935efb88faf8ccdbd4b3a2b367e02b30c268
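Sketched concretely, the workaround amounts to giving one worker instance its own class and pinning the whole scenario to it; the host and class names below are hypothetical, not the actual salt pillar content:

```ini
# workers.ini on the IPMI worker host (hypothetical names)
[1]
WORKER_CLASS = 64bit-ipmi,64bit-ipmi-machine01
```

install_ltp and every dependent ltp_* test suite would then set WORKER_CLASS=64bit-ipmi-machine01 (plus START_AFTER_TEST=install_ltp), so all of those jobs can only land on that single machine - which is also why this does not scale.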

#18 Updated by okurz over 2 years ago

I stated in #41066#note-15 that I see this as a questionable idea and unfeasible, while coolo disagrees. My personal take on an ETA for this feature is optimistic: 2 months - pessimistic: never

pvorel wrote:

[…] or use workaround (install SLES before each test and waste time spent on IPMI machines :( )?

I think this is a good and stable approach, however my estimation of the "time wasted" is in the range of ~10 minutes. I know that mmoese and also asmorodskyi are currently looking into approaches to enhance the machine provisioning; are you in contact with them about their work?

#19 Updated by pvorel over 2 years ago

okurz wrote:

I stated in #41066#note-15 that I see this as a questionable idea and unfeasible, while coolo disagrees. My personal take on an ETA for this feature is optimistic: 2 months - pessimistic: never

Quite realistic (it'd take time to implement it). My question is whether somebody from the tools team is planning to work on it or whether it's left to others.

pvorel wrote:

[…] or use workaround (install SLES before each test and waste time spent on IPMI machines :( )?

I think this is a good and stable approach, however my estimation of the "time wasted" is in the range of ~10 minutes.

Really? I'd expect it to be longer. Installing LTP on bare metal takes about an hour (https://openqa.suse.de/tests/2992559), so maybe it's that short. But I'll add another test which uses mmoese's approach (see below); let's see if it's faster. The main slowness is due to slow IPMI, though. I guess something like an IPMI serial console will be needed (similar to the virtio console or svirt serial console, which brought a big speedup).

I know that mmoese and also asmorodskyi are already currently looking into approaches to enhance the machine provisioning, are you in contact with them about their work?

Yes. Some ibtests use iPXE installation with autoyast (https://openqa.suse.de/tests/3001543, https://openqa.suse.de/tests/3001544). That's mmoese's work.

QAM (osukup) also has some IPMI tests on their custom instance http://openqa.qam.suse.cz/. They also install over iPXE and work around this issue (FOLLOW_TEST_DIRECTLY not being implemented) with special priorities.

Neither of these two solutions is perfect, as neither of them addresses the FOLLOW_TEST_DIRECTLY feature.

#20 Updated by okurz over 2 years ago

  • Category changed from 122 to Feature requests

#21 Updated by mkittler about 2 years ago

installation is conducted, then ltp1 runs fine, ltp2 runs fine, ltp3 corrupts the OS, ltp4…10 fail as a consequence (in comparison to VMs where we spawn a clean VM with a backing file for each scenario and/or can use snapshots to revert to). In this specific case btrfs snapshots might help a bit, depending on the specific "corruption"

Yes, this problem can never be prevented regardless of how much we improve the scheduling. And btrfs might only help a bit, indeed. But maybe it is a problem we can live with?

About the scheduling: To implement FOLLOW_TEST_DIRECTLY the scheduler needs more control over the worker than it currently has. In particular it needs to be able to reserve a worker exclusively not only for the execution of a single job but for a whole batch of jobs. This should not be that hard to implement on the worker side:

  • At least on the worker-side it merely means that the worker must be capable of accepting a list of jobs at a time (instead of only a single job). The worker would then simply run those jobs in the specified order until all jobs are done or one job fails. It would not accept any other jobs until that happens (similar to its current behavior where it does not accept another job while already busy with one).
  • When the worker accepts the list of jobs it would obviously accept all of these jobs at the same time. Storing not a single job but a list of jobs is not a big deal with the new worker structure.
  • The accepted jobs which are not immediately processed (basically all jobs but the first) should be set to a different state, e.g. 'enqueued'. Setting the job state directly to 'running' would be confusing because one might think the jobs are executed concurrently.
  • This seems the most intuitive and easiest way to implement this. I see no interference with the shared worker feature.
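A minimal sketch of the worker-side behavior proposed above; `run_job_batch` and the plain-string job names are hypothetical illustrations, not the actual openQA worker API:

```python
def run_job_batch(jobs, run_job):
    """Run a batch of directly chained jobs in the given order.

    `jobs` is the ordered list of accepted jobs; `run_job` executes one
    job and returns True on success. Execution stops at the first
    failure and the remaining enqueued jobs are marked as skipped.
    """
    results = {}
    for index, job in enumerate(jobs):
        if run_job(job):
            results[job] = "done"
        else:
            results[job] = "failed"
            # Do not attempt the remaining jobs of the batch.
            for skipped in jobs[index + 1:]:
                results[skipped] = "skipped"
            break
    return results
```

For example, a batch install -> ltp1 -> ltp2 where ltp1 fails would yield install done, ltp1 failed, ltp2 skipped - matching the "run until one job fails" rule above.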

I'm not that familiar with the web UI and scheduler side of this so I cannot tell of any impediments on that side. The required changes would be:

  • Take the FOLLOW_TEST_DIRECTLY relation into account. That means we can get job batches.
  • Assign all jobs in a job batch at the same time. So set_assigned_worker/set_scheduling_worker would need to work on a job batch rather than only a single job.
  • That also means that the job-worker relationship in the database must change. A worker can now have multiple jobs assigned at the same time. That kind of relationship is actually already present in the database schema. It is used for the job history of a worker which I implemented some time ago. We should be able to use it for the scheduling as well. It is still an intrusive change which likely requires to adapt a lot of test code, too.
  • The websocket server needs to send multiple jobs at a time. So send_job and ws_send_job would become plural.
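The batch computation itself could be sketched like this (hypothetical names; the real scheduler would work on database rows, not dicts): jobs transitively connected via the direct relation form one batch, which would be assigned to a single worker as a whole:

```python
def directly_chained_batches(jobs, direct_parent):
    """Group jobs into batches that must go to one worker together.

    `direct_parent` maps a job to the job it directly follows (absent
    or None for batch roots). Jobs transitively connected via that
    relation end up in the same batch.
    """
    def root(job):
        # Walk up the direct chain until reaching a job with no parent.
        while direct_parent.get(job) is not None:
            job = direct_parent[job]
        return job

    batches = {}
    for job in jobs:
        batches.setdefault(root(job), []).append(job)
    return list(batches.values())
```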

#22 Updated by pvorel about 2 years ago

mkittler thanks a lot for considering implementing it :). Are LTP tests the only ones that need that? If yes we should really think about a workaround. We already have some plans to merge install_ltp into each LTP test, so this shouldn't be needed.

#23 Updated by mkittler about 2 years ago

I don't know who needs this.

Note that implementing the point "That also means that the job-worker relationship in the database must change." would also have more positive side-effects. In particular, it would help to improve stale job detection. So far, when a worker re-registers, the job it was supposed to do is marked as incomplete (unless the worker claims that it is actually still working on that job). This would work more reliably with that database change because then the full job history of that worker can be considered.

#24 Updated by okurz about 2 years ago

Certainly QSK&N is not the only team needing that. Everyone caring about tests on bare-metal or self-provisioning backends would eventually want a feature like this. mkittler I suggest rewriting the ticket description into https://progress.opensuse.org/projects/openqav3/wiki#Feature-requests first

#25 Updated by mkittler about 2 years ago

okurz I've just created a new ticket: https://progress.opensuse.org/issues/55835

I added my implementation idea as "tasks".

#26 Updated by mkittler about 2 years ago

  • Subject changed from Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY). to Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY aka START_DIRECTLY_AFTER_TEST).
  • Assignee set to mkittler

#27 Updated by mkittler about 2 years ago

What should happen if one creates a dependency tree like this (with FOLLOW_TEST_DIRECTLY/START_DIRECTLY_AFTER_TEST)?

   --> B --> C
 /
A
 \ 
   --> D --> E

The particularity I'd like to discuss is handling nodes (like A) with multiple chained children (like B and D).

I would say the sub trees B --> C and D --> E might be executed in random order. So the actual execution order might be either A --> B --> C --> D --> E or A --> D --> E --> B --> C. If only one of those orders is actually valid, one can simply define a linear chain in the first place.

If A fails, B, C, D and E are not attempted. Let's assume the B --> C sub tree is executed first. If B fails, C is not attempted. However, the D --> E chain is still attempted. That might not work because B might have left the system in a bad state, but I guess one can at least try. Maybe a post fail hook in B could also try to clean up the mess (e.g. killing all applications it attempted to start) to make it more likely that subsequent tests will pass.

So that is my proposal for how I'd intuitively have done it. That way it would work like regularly chained dependencies (at least I think they work this way in openQA). The main difference is of course that the B --> C and D --> E sub trees will never be executed in parallel.
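The proposed semantics for sibling sub trees can be sketched as a depth-first flattening; alphabetical order is used here as one deterministic choice, and the dict-based tree and names are hypothetical:

```python
def execution_plan(tree, root):
    """Flatten a directly chained dependency tree into one sequence.

    Sub trees of the same parent run one after another, never in
    parallel; siblings are ordered alphabetically for determinism.
    """
    plan = [root]
    for child in sorted(tree.get(root, [])):
        plan.extend(execution_plan(tree, child))
    return plan
```

For the tree from this comment, execution_plan({"A": ["B", "D"], "B": ["C"], "D": ["E"]}, "A") gives A, B, C, D, E.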

#28 Updated by okurz about 2 years ago

How about simply not allowing any multiple occurrences, i.e. only a linear schedule, and failing otherwise? One can still think about combining the usual START_AFTER with the new field, or not?

#29 Updated by mkittler about 2 years ago

How about simply not allowing any multiple occurrences

Note that this is not about cutting corners to make my task of implementing this easier; I am really asking how this is expected to behave. Only allowing a linear schedule sounds very restrictive. One could even achieve the same by simply adding all test modules inside a single job (admittedly this would not scale very well). Besides, I don't like taking extra steps to disallow this configuration just to change things again if it is required after all.

one can still think about combining the usual START_AFTER with the new field, or not?

Why would that be possible? That would change the meaning. E.g. if the A --> B and A --> D edges in my example were changed to be regularly chained, the B --> C sub chain and the D --> E sub chain might be executed on different workers (and B --> C and D --> E possibly in parallel).

At least that is how I would have implemented directly chained dependencies. They only form a big "cluster" when not interrupted by a regularly chained dependency. Otherwise there can be sub clusters.

#30 Updated by okurz about 2 years ago

mkittler wrote:

How about simply not allowing any multiple occurrences

Note that this is not about cutting corners to make my task of implementing this easier; I am really asking how this is expected to behave. Only allowing a linear schedule sounds very restrictive. One could even achieve the same by simply adding all test modules inside a single job (admittedly this would not scale very well). Besides, I don't like taking extra steps to disallow this configuration just to change things again if it is required after all.

Well, I thought aborting early with an explicit error message is helpful both for users and for you implementing it in a fast manner, regardless of whether we find the need/time to change it later. I did not think you wanted to cut corners; it's actually a suggestion from me on how to get the task done most easily for now.

#31 Updated by coolo about 2 years ago

I don't see any alternative to what Olli said - if you run B -> C first, then D is no longer directly after A.

And it will surely be fun for the kernel team to calculate an order for their tests to run in.

#32 Updated by pvorel about 2 years ago

Our use case for LTP is many B-Z (96 jobs actually) depending on A (a single job called install_ltp). Failure of any of B-Z should not affect the others; failure of A should cancel all of B-Z.

#33 Updated by coolo about 2 years ago

That's not FOLLOW_DIRECTLY, sorry. How exactly do you imagine one LTP test case leaving the machine in a kernel panic being useful for the followup?

#34 Updated by mkittler about 2 years ago

I don't see any alternative to what Olli said - if you run B -> C first, then D is no longer directly after A.

That is right. But D will still be executed on the same worker as A. So at least half of what makes the new dependency special would still be provided.

Our use case for LTP is many B-Z (96 jobs actually) depending on A (a single job called install_ltp). Failure of any of B-Z should not affect the others; failure of A should cancel all of B-Z.

I expected this and hence came up with my suggestion. (The suggested behavior would allow exactly this.) But maybe "FOLLOW_TEST_DIRECTLY" is misleading then. It is more like "no test from a different direct-cluster can sneak in" and "guaranteed to run on the same worker".

How exactly do you imagine one LTP test case leaving the machine in a kernel panic being useful for the followup?

Likely the average failure is less severe and one can just continue by killing everything which is possibly still running in the post fail hook.

#35 Updated by pvorel about 2 years ago

coolo wrote:

That's not FOLLOW_DIRECTLY, sorry. How exactly do you imagine one LTP test case leaving the machine in kernel panic useful for the followup?

That was in the original description of this feature request :) ("Our use case for LTP is many B-Z (96 jobs actually) depending on A (a single job called install_ltp). Failure of any of B-Z should not affect the others; failure of A should cancel all of B-Z.")

Regarding kernel panic: there should be a reboot between tests.

#36 Updated by coolo about 2 years ago

OK, so let's do as Marius suggested. With one change: don't make this a random order, but order by test suite name. So always A -> B -> C -> D -> E - this avoids surprises.

#37 Updated by mkittler about 2 years ago

  • Target version changed from Ready to Current Sprint

#38 Updated by mkittler about 2 years ago

This seems to work now basically using the approach mentioned earlier: https://github.com/os-autoinst/openQA/pull/2309

"Basically" means that further testing is required. So far I only tested it locally with 2 jobs. However, more complicated dependency trees are covered by the unit tests.

I only deviated slightly from the original idea:

  1. The worker-to-job relation actually stays. Although a worker can now have multiple jobs assigned at the same time, it is still useful to track the currently running job of a worker. The way these relations are used and updated still has to change in some places, of course.
  2. "The worker would then simply run those jobs in the specified order until all jobs are done or one job fails." - We changed that to a more sophisticated model to cover "Our user case for LTP is many B-Z (96 jobs actually) depend on A (single job called install_ltp). Failure of any of B-Z should not affect the others, failure of A should cancel all B-Z.".
  3. It was not technically necessary to introduce an extra "enqueued" job state because having them "assigned" seems to be enough. Maybe I can still improve how it is displayed, though. (See screenshots in PR.)

Note that when assigning multiple directly chained children to a job (for handling failures as in 2.), the dependency graph will show the directly chained dependencies in the form of the usual tree (direct and regularly chained dependencies are distinguishable via tooltips):

   --> B --> C
 /
A
 \ 
   --> D --> E

It would be better to show them in the actual execution sequence and to highlight the groups (and sub groups) of jobs which need to be skipped altogether if one of them fails:

A -> [ B -> C ] -> [ D -> E ]

However, implementing that in the graph would be quite some effort which I would postpone until we know the feature is actually as useful as we think.

#39 Updated by mkittler about 2 years ago

  • Status changed from New to In Progress

#40 Updated by mkittler about 2 years ago

This is how it looks now:

[screenshot: screenshot_20190916_124113]

  1. The execution order of multiple directly chained children within the same parent is determined by the TEST setting of the jobs. So the order here is always "...-parent", "...-01-child", "...-02-child", "...-02-child-01-..." and "...-03-child".
  2. You can also see the limitation of the graph I was talking about in the last comment.
  3. Cancelling "directly-chained-01-child" would so far cause its siblings "directly-chained-02-child" and "directly-chained-03-child" to be cancelled as well. (And their children, so basically the entire cluster would be cancelled.)

I'm wondering whether 3. is the expected behavior. This is just what one gets by treating directly chained dependencies like regularly chained dependencies in that regard, but I could try to change that. However, considering that your use case involves 96 jobs, you likely even want to be able to cancel them altogether, right? So I'll keep it unless someone tells me that a different behavior is required.

#41 Updated by mkittler about 2 years ago

  • Status changed from In Progress to Feedback

The PR has been merged and deployed on OSD.

Documentation has been added as well. (See https://github.com/os-autoinst/openQA/blob/master/docs/WritingTests.asciidoc#job-dependencies until http://open.qa/docs is updated.)
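For reference, using the new dependency boils down to one test suite setting; a minimal example with hypothetical suite names (see the linked documentation for the authoritative semantics):

```ini
# settings of test suite "ltp_syscalls" (hypothetical name)
START_DIRECTLY_AFTER_TEST = install_ltp
```

Directly chained jobs are guaranteed to run in sequence on the same worker; if install_ltp fails, the whole chain is skipped.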

#42 Updated by mkittler almost 2 years ago

  • Target version changed from Current Sprint to Done

I cannot set the ticket to resolved. I'm removing it at least from the current sprint for now.

#43 Updated by okurz almost 2 years ago

It can't be set to "Resolved" because it's "blocked by" #47117, so either you change that relationship, e.g. remove it and re-add it as just "related", or you make sure the blocking ticket is resolved first.

#44 Updated by okurz almost 2 years ago

As currently no test suite on OSD is using the new dependency and I understood from QA SLE Kernel & Network that it does not fulfill their requirements, I wonder what we should do as the next step.

#45 Updated by coolo almost 2 years ago

I understood from QA SLE Kernel & Network that it does not fulfill their requirements

Where is that from?

#46 Updated by okurz almost 2 years ago

coolo wrote:

I understood from QA SLE Kernel & Network that it does not fulfill their requirements

That's what I understood from #53948#note-12; that's why they seem to be eager to get https://github.com/os-autoinst/os-autoinst/pull/1208 and then https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329

#47 Updated by mkittler almost 2 years ago

  • Blocked by deleted (coordination #47117: [epic] Fix worker->websocket->scheduler->webui connection)

#48 Updated by mkittler almost 2 years ago

I deleted the 'blocked by' relation. This is not blocked by the entire epic.

#49 Updated by pvorel almost 2 years ago

okurz wrote:

coolo wrote:

I understood from QA SLE Kernel & Network that it does not fulfill their requirements

That's what I understood from #53948#note-12 that's why they seem to be eager to get https://github.com/os-autoinst/os-autoinst/pull/1208 and then https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329

I started to use this feature yesterday for LTP bare metal. If it fulfills our needs we'll stay with it for bare metal. If not, we'll move to mdoucha's changes (installing LTP for each job, combined with detecting whether LTP is installed and skipping it, would probably be the safest solution).

mdoucha's changes will probably be used for o3, where tests are failing due to a broken snapshot. He also started developing them for some of his own needs.

But IMHO it's a useful feature and I hope this solution will be used by others, as Oliver claimed (#41066#note-24).

#50 Updated by mkittler almost 2 years ago

  • Status changed from Feedback to Resolved

#51 Updated by pvorel almost 2 years ago

mkittler: Thanks for implementing this. It's working nicely, has simplified the test setup and allows using machines more effectively :).
