Project

General

Profile

action #119380

Pause on module failure in developer mode size:M

Added by marmarek 3 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Feature requests
Target version:
Start date:
2022-10-25
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

When developing new test it's developer mode is very helpful. When needle update is needed, it can pause on mismatch and let you fix that and continue the test. But there is no similar option for assert_script_run failure, which even with manual single-step requires restarting the whole test on failure. This is especially frustrating when all that's needed is adjusted timeout (but the actual command worked).

What I'd like, is an option to pause on failed assert_script_run. And then have at least an option to continue (ignoring the failure), or optionally retry the command (and pause again if it fails again).
If retrying isn't an option, just pause and continue would be really useful, as one can inject commands via VNC manually.

Acceptance criteria

  • AC1: Developer mode can stop on any individual module failure
  • AC2: marmarek is happy
  • AC3: Stopping on needle mismatches still works

Suggestions

  • This new option could be implemented by creating a new lower level pause($reason) backend command and just send that on module failure (but before the post_fail_hook) if "pause on module failure" is enabled, or a module_failure($module) to have the backend decide what to do.

History

#1 Updated by favogt 3 months ago

IMO a generic "pause on module failure" option might be useful here and get to at least 80% of the goal

For needles, retrying the match makes sense as it's possible to create new needles and using dev mode + VNC connection also possible to change the system state. Retrying an assert_script_run however does not allow to change the command or expected outcome, just the system state. In my experience, the main reason of script failures is that the command itself is incorrect, which can't be fixed by simply retrying.

This new option could be implemented by creating a new lower level pause($reason) backend command and just send that on module failure (but before the post_fail_hook) if "pause on module failure" is enabled, or a module_failure($module) to have the backend decide what to do.

#2 Updated by marmarek 3 months ago

favogt wrote:

IMO a generic "pause on module failure" option might be useful here and get to at least 80% of the goal

Yes, as long as there will be an option to continue, instead of following "normal" failure action (post_fail_hook, reverting to snapshot, aborting all together etc).

This new option could be implemented by creating a new lower level pause($reason) backend command and just send that on module failure (but before the post_fail_hook) if "pause on module failure" is enabled, or a module_failure($module) to have the backend decide what to do.

Having primitive like this, I could also extend distribution script_run (or other commands) to add pause points.

#3 Updated by okurz 3 months ago

  • Priority changed from Normal to Low
  • Target version set to Ready

#4 Updated by mkittler 3 months ago

I concur with @fvogt, a generic "pause on module failure" option makes most sense. We don't want to have specific handling for several reasons the test can fail in the future. Of course when continuing there should be an option to ignore the failure or to re-do the failing command.

This new option could be implemented by creating a new lower level pause($reason) backend command and just send that on module failure (but before the post_fail_hook) if "pause on module failure" is enabled, or a module_failure($module) to have the backend decide what to do.

Out of my head I'd say that sounds like a good way of implementing it, indeed. Since okurz has added it to the backlog I'd like to work on it next week (unless someone else grabs the ticket of course).

#5 Updated by mkittler 3 months ago

  • Assignee set to mkittler

#6 Updated by cdywan 3 months ago

  • Subject changed from Pause on assert_script_run failure in developer mode to Pause on module failure in developer mode size:M
  • Description updated (diff)
  • Status changed from New to Workable

#7 Updated by mkittler 3 months ago

  • Status changed from Workable to In Progress
  • Draft PR: https://github.com/os-autoinst/os-autoinst/pull/2198
  • Being able to resume and continue where the test execution has left off is a bit of a problem. When an exception has been thrown the control flow is intercepted and there's no way to jump back (or is there a Perl hack I'm unaware of?). So if one resumes after an arbitrary unhandled exception the execution will simply continue on the next test module.
  • In case the test fails via a failing check using a test API function it would be possible to avoid throwing the exception. This likely means adjusting many test API functions but it should be doable (as shown in the draft PR for assert_script_run).

#8 Updated by marmarek 3 months ago

Thanks for working on this!

mkittler wrote:

  • Being able to resume and continue where the test execution has left off is a bit of a problem. When an exception has been thrown the control flow is intercepted and there's no way to jump back (or is there a Perl hack I'm unaware of?). So if one resumes after an arbitrary unhandled exception the execution will simply continue on the next test module.
  • In case the test fails via a failing check using a test API function it would be possible to avoid throwing the exception. This likely means adjusting many test API functions but it should be doable (as shown in the draft PR for assert_script_run).

I think this is perfectly fine - do the pause in modified test functions. In some of them it wouldn't really make sense, for example continuing from script_output with no result wouldn't work in most cases anyway.
For me personally, just modifying assert_script_run, and providing alternative "test fail" function (to use instead of die when continuation make sense) would be enough.

Draft PR: https://github.com/os-autoinst/os-autoinst/pull/2198

How I can test this? Is there related frontend change? Especially, how to choose whether to ignore the failure or not from developer mode? IIUC I can enable the feature with PAUSE_ON_FAILURE=1, but can't choose how to continue.

#9 Updated by mkittler 3 months ago

I'm working on this low prio ticket on the side. So I'm not stuck. At this point there's nothing for you to test because I'd like to test it on my own a little bit more first.

However, I could already come up with a few improvements for the developer mode on the way: https://github.com/os-autoinst/os-autoinst/pull/2204, https://github.com/os-autoinst/openQA/pull/4897

#10 Updated by mkittler 3 months ago

#11 Updated by mkittler 2 months ago

  • Status changed from In Progress to Feedback

Those two PRs have been merged and they supposedly didn't cause any regressions.

I've been creating two more PRs to be able to resume immediately after assert_script_run and have a more specific reason displayed in the developer mode panel:

#12 Updated by marmarek 2 months ago

Thanks! I tested it both for ordinary "pause on failure", as well as pausing on assert_script_run failure and ignoring that failure (with both above PRs included for that) and it works great.

AC2: marmarek is happy

I'm happy :)

#13 Updated by mkittler 2 months ago

Great :-)

#14 Updated by okurz 2 months ago

Cool, so everybody is happy :)

Waiting for outstanding PR https://github.com/os-autoinst/os-autoinst/pull/2225

#15 Updated by mkittler 2 months ago

  • Status changed from Feedback to Resolved

The pending PR has been merged. I think this is good enough for now. Of course one could implement continuing directly after other functions than assert_script_run but I think assert_script_run is the most important one for now (as this was also our initial use-case). So I'd consider it resolved with that.

Also available in: Atom PDF