action #119380
closedPause on module failure in developer mode size:M
Description
Observation¶
When developing new test it's developer mode is very helpful. When needle update is needed, it can pause on mismatch and let you fix that and continue the test. But there is no similar option for assert_script_run
failure, which even with manual single-step requires restarting the whole test on failure. This is especially frustrating when all that's needed is adjusted timeout (but the actual command worked).
What I'd like, is an option to pause on failed assert_script_run
. And then have at least an option to continue (ignoring the failure), or optionally retry the command (and pause again if it fails again).
If retrying isn't an option, just pause and continue would be really useful, as one can inject commands via VNC manually.
Acceptance criteria¶
- AC1: Developer mode can stop on any individual module failure
- AC2: marmarek is happy
- AC3: Stopping on needle mismatches still works
Suggestions¶
- This new option could be implemented by creating a new lower level pause($reason) backend command and just send that on module failure (but before the post_fail_hook) if "pause on module failure" is enabled, or a module_failure($module) to have the backend decide what to do.
Updated by favogt almost 2 years ago
IMO a generic "pause on module failure" option might be useful here and get to at least 80% of the goal
For needles, retrying the match makes sense as it's possible to create new needles and using dev mode + VNC connection also possible to change the system state. Retrying an assert_script_run
however does not allow to change the command or expected outcome, just the system state. In my experience, the main reason of script failures is that the command itself is incorrect, which can't be fixed by simply retrying.
This new option could be implemented by creating a new lower level pause($reason)
backend command and just send that on module failure (but before the post_fail_hook
) if "pause on module failure" is enabled, or a module_failure($module)
to have the backend decide what to do.
Updated by marmarek almost 2 years ago
favogt wrote:
IMO a generic "pause on module failure" option might be useful here and get to at least 80% of the goal
Yes, as long as there will be an option to continue, instead of following "normal" failure action (post_fail_hook, reverting to snapshot, aborting all together etc).
This new option could be implemented by creating a new lower level pause($reason) backend command and just send that on module failure (but before the post_fail_hook) if "pause on module failure" is enabled, or a module_failure($module) to have the backend decide what to do.
Having primitive like this, I could also extend distribution script_run (or other commands) to add pause points.
Updated by okurz almost 2 years ago
- Priority changed from Normal to Low
- Target version set to Ready
Updated by mkittler almost 2 years ago
I concur with @fvogt, a generic "pause on module failure" option makes most sense. We don't want to have specific handling for several reasons the test can fail in the future. Of course when continuing there should be an option to ignore the failure or to re-do the failing command.
This new option could be implemented by creating a new lower level pause($reason) backend command and just send that on module failure (but before the post_fail_hook) if "pause on module failure" is enabled, or a module_failure($module) to have the backend decide what to do.
Out of my head I'd say that sounds like a good way of implementing it, indeed. Since @okurz has added it to the backlog I'd like to work on it next week (unless someone else grabs the ticket of course).
Updated by livdywan almost 2 years ago
- Subject changed from Pause on assert_script_run failure in developer mode to Pause on module failure in developer mode size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler almost 2 years ago
- Status changed from Workable to In Progress
- Draft PR: https://github.com/os-autoinst/os-autoinst/pull/2198
- Being able to resume and continue where the test execution has left off is a bit of a problem. When an exception has been thrown the control flow is intercepted and there's no way to jump back (or is there a Perl hack I'm unaware of?). So if one resumes after an arbitrary unhandled exception the execution will simply continue on the next test module.
- In case the test fails via a failing check using a test API function it would be possible to avoid throwing the exception. This likely means adjusting many test API functions but it should be doable (as shown in the draft PR for
assert_script_run
).
Updated by marmarek almost 2 years ago
Thanks for working on this!
mkittler wrote:
- Being able to resume and continue where the test execution has left off is a bit of a problem. When an exception has been thrown the control flow is intercepted and there's no way to jump back (or is there a Perl hack I'm unaware of?). So if one resumes after an arbitrary unhandled exception the execution will simply continue on the next test module.
- In case the test fails via a failing check using a test API function it would be possible to avoid throwing the exception. This likely means adjusting many test API functions but it should be doable (as shown in the draft PR for
assert_script_run
).
I think this is perfectly fine - do the pause in modified test functions. In some of them it wouldn't really make sense, for example continuing from script_output
with no result wouldn't work in most cases anyway.
For me personally, just modifying assert_script_run
, and providing alternative "test fail" function (to use instead of die
when continuation make sense) would be enough.
Draft PR: https://github.com/os-autoinst/os-autoinst/pull/2198
How I can test this? Is there related frontend change? Especially, how to choose whether to ignore the failure or not from developer mode? IIUC I can enable the feature with PAUSE_ON_FAILURE=1
, but can't choose how to continue.
Updated by mkittler almost 2 years ago
I'm working on this low prio ticket on the side. So I'm not stuck. At this point there's nothing for you to test because I'd like to test it on my own a little bit more first.
However, I could already come up with a few improvements for the developer mode on the way: https://github.com/os-autoinst/os-autoinst/pull/2204, https://github.com/os-autoinst/openQA/pull/4897
Updated by mkittler almost 2 years ago
- I've just updated the PR and also created one for UI changed in openQA: https://github.com/os-autoinst/openQA/pull/4906
- For the question about testing, see my comment on GitHub: https://github.com/os-autoinst/os-autoinst/pull/2198#issuecomment-1318788214
Updated by mkittler almost 2 years ago
- Status changed from In Progress to Feedback
Those two PRs have been merged and they supposedly didn't cause any regressions.
I've been creating two more PRs to be able to resume immediately after assert_script_run
and have a more specific reason displayed in the developer mode panel:
Updated by marmarek almost 2 years ago
Thanks! I tested it both for ordinary "pause on failure", as well as pausing on assert_script_run
failure and ignoring that failure (with both above PRs included for that) and it works great.
AC2: marmarek is happy
I'm happy :)
Updated by okurz almost 2 years ago
Cool, so everybody is happy :)
Waiting for outstanding PR https://github.com/os-autoinst/os-autoinst/pull/2225
Updated by mkittler almost 2 years ago
- Status changed from Feedback to Resolved
The pending PR has been merged. I think this is good enough for now. Of course one could implement continuing directly after other functions than assert_script_run
but I think assert_script_run
is the most important one for now (as this was also our initial use-case). So I'd consider it resolved with that.