action #119380: Pause on module failure in developer mode size:M - openQA Project (public) - openSUSE Project Management Tool

Actions

Copy link

action #119380

closed

Pause on module failure in developer mode size:M

Added by marmarek over 2 years ago. Updated over 2 years ago.

Status:

Resolved

Priority:

Low

Assignee:

mkittler

Category:

Feature requests

Target version:

Ready

Start date:

2022-10-25

Due date:

% Done:

Estimated time:

Description

Observation¶

When developing new test it's developer mode is very helpful. When needle update is needed, it can pause on mismatch and let you fix that and continue the test. But there is no similar option for assert_script_run failure, which even with manual single-step requires restarting the whole test on failure. This is especially frustrating when all that's needed is adjusted timeout (but the actual command worked).

What I'd like, is an option to pause on failed assert_script_run. And then have at least an option to continue (ignoring the failure), or optionally retry the command (and pause again if it fails again).
If retrying isn't an option, just pause and continue would be really useful, as one can inject commands via VNC manually.

Acceptance criteria¶

AC1: Developer mode can stop on any individual module failure
AC2: marmarek is happy
AC3: Stopping on needle mismatches still works

Suggestions¶

This new option could be implemented by creating a new lower level pause($reason) backend command and just send that on module failure (but before the post_fail_hook) if "pause on module failure" is enabled, or a module_failure($module) to have the backend decide what to do.

Actions

Copy link

Updated by favogt over 2 years ago

IMO a generic "pause on module failure" option might be useful here and get to at least 80% of the goal

For needles, retrying the match makes sense as it's possible to create new needles and using dev mode + VNC connection also possible to change the system state. Retrying an assert_script_run however does not allow to change the command or expected outcome, just the system state. In my experience, the main reason of script failures is that the command itself is incorrect, which can't be fixed by simply retrying.

This new option could be implemented by creating a new lower level pause($reason) backend command and just send that on module failure (but before the post_fail_hook) if "pause on module failure" is enabled, or a module_failure($module) to have the backend decide what to do.

Actions

Copy link

Updated by marmarek over 2 years ago

favogt wrote:

IMO a generic "pause on module failure" option might be useful here and get to at least 80% of the goal

Yes, as long as there will be an option to continue, instead of following "normal" failure action (post_fail_hook, reverting to snapshot, aborting all together etc).

This new option could be implemented by creating a new lower level pause($reason) backend command and just send that on module failure (but before the post_fail_hook) if "pause on module failure" is enabled, or a module_failure($module) to have the backend decide what to do.

Having primitive like this, I could also extend distribution script_run (or other commands) to add pause points.

Actions

Copy link

Updated by okurz over 2 years ago

Priority changed from Normal to Low
Target version set to Ready

Actions

Copy link

Updated by mkittler over 2 years ago

I concur with @fvogt, a generic "pause on module failure" option makes most sense. We don't want to have specific handling for several reasons the test can fail in the future. Of course when continuing there should be an option to ignore the failure or to re-do the failing command.

This new option could be implemented by creating a new lower level pause($reason) backend command and just send that on module failure (but before the post_fail_hook) if "pause on module failure" is enabled, or a module_failure($module) to have the backend decide what to do.

Out of my head I'd say that sounds like a good way of implementing it, indeed. Since @okurz has added it to the backlog I'd like to work on it next week (unless someone else grabs the ticket of course).

Actions

Copy link

Updated by mkittler over 2 years ago

Assignee set to mkittler

Actions

Copy link

Updated by livdywan over 2 years ago

Subject changed from Pause on assert_script_run failure in developer mode to Pause on module failure in developer mode size:M
Description updated (diff)
Status changed from New to Workable

Actions

Copy link

Updated by mkittler over 2 years ago

Status changed from Workable to In Progress

Draft PR: https://github.com/os-autoinst/os-autoinst/pull/2198
Being able to resume and continue where the test execution has left off is a bit of a problem. When an exception has been thrown the control flow is intercepted and there's no way to jump back (or is there a Perl hack I'm unaware of?). So if one resumes after an arbitrary unhandled exception the execution will simply continue on the next test module.
In case the test fails via a failing check using a test API function it would be possible to avoid throwing the exception. This likely means adjusting many test API functions but it should be doable (as shown in the draft PR for assert_script_run).

Actions

Copy link

Updated by marmarek over 2 years ago

Thanks for working on this!

mkittler wrote:

Being able to resume and continue where the test execution has left off is a bit of a problem. When an exception has been thrown the control flow is intercepted and there's no way to jump back (or is there a Perl hack I'm unaware of?). So if one resumes after an arbitrary unhandled exception the execution will simply continue on the next test module.

In case the test fails via a failing check using a test API function it would be possible to avoid throwing the exception. This likely means adjusting many test API functions but it should be doable (as shown in the draft PR for assert_script_run).

I think this is perfectly fine - do the pause in modified test functions. In some of them it wouldn't really make sense, for example continuing from script_output with no result wouldn't work in most cases anyway.
For me personally, just modifying assert_script_run, and providing alternative "test fail" function (to use instead of die when continuation make sense) would be enough.

Draft PR: https://github.com/os-autoinst/os-autoinst/pull/2198

How I can test this? Is there related frontend change? Especially, how to choose whether to ignore the failure or not from developer mode? IIUC I can enable the feature with PAUSE_ON_FAILURE=1, but can't choose how to continue.

Actions

Copy link

Updated by mkittler over 2 years ago

I'm working on this low prio ticket on the side. So I'm not stuck. At this point there's nothing for you to test because I'd like to test it on my own a little bit more first.

However, I could already come up with a few improvements for the developer mode on the way: https://github.com/os-autoinst/os-autoinst/pull/2204, https://github.com/os-autoinst/openQA/pull/4897

Actions

Copy link

#10

Updated by mkittler over 2 years ago

I've just updated the PR and also created one for UI changed in openQA: https://github.com/os-autoinst/openQA/pull/4906
For the question about testing, see my comment on GitHub: https://github.com/os-autoinst/os-autoinst/pull/2198#issuecomment-1318788214

Actions

Copy link

#11

Updated by mkittler over 2 years ago

Status changed from In Progress to Feedback

Those two PRs have been merged and they supposedly didn't cause any regressions.

I've been creating two more PRs to be able to resume immediately after assert_script_run and have a more specific reason displayed in the developer mode panel:

Actions

Copy link

#12

Updated by marmarek over 2 years ago

Thanks! I tested it both for ordinary "pause on failure", as well as pausing on assert_script_run failure and ignoring that failure (with both above PRs included for that) and it works great.

AC2: marmarek is happy

I'm happy :)

Actions

Copy link

#13

Updated by mkittler over 2 years ago

Great :-)

Actions

Copy link

#14

Updated by okurz over 2 years ago

Cool, so everybody is happy :)

Waiting for outstanding PR https://github.com/os-autoinst/os-autoinst/pull/2225

Actions

Copy link

#15

Updated by mkittler over 2 years ago

Status changed from Feedback to Resolved

The pending PR has been merged. I think this is good enough for now. Of course one could implement continuing directly after other functions than assert_script_run but I think assert_script_run is the most important one for now (as this was also our initial use-case). So I'd consider it resolved with that.

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

QA (public) » openQA Project (public)

Tags

Custom queries

action #119380

Pause on module failure in developer mode size:M

Observation¶

Acceptance criteria¶

Suggestions¶

Updated by favogt over 2 years ago

Updated by marmarek over 2 years ago

Updated by okurz over 2 years ago

Updated by mkittler over 2 years ago

Updated by mkittler over 2 years ago

Updated by livdywan over 2 years ago

Updated by mkittler over 2 years ago

Updated by marmarek over 2 years ago

Updated by mkittler over 2 years ago

Updated by mkittler over 2 years ago

Updated by mkittler over 2 years ago

Updated by marmarek over 2 years ago

Updated by mkittler over 2 years ago

Updated by okurz over 2 years ago

Updated by mkittler over 2 years ago