action #38510: Allow os-autoinst to pause on next assert_screen timeout - openQA Project - openSUSE Project Management Tool

Actions

Copy link

action #38510

closed

coordination #14818: [EPIC] Interactive mode is an usability disaster

Allow os-autoinst to pause on next assert_screen timeout

Added by mkittler almost 6 years ago. Updated almost 6 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

mkittler

Category:

Target version:

Done

Start date:

2018-07-18

Due date:

% Done:

Estimated time:

Description

User story¶

Ray reviews the latest build of Tumbleweed and notices a font change breaking all old needles in GUI applications. He wants openQA to pause whenever an assert_screen would timeout to update the needle and redo the assert_screen with that new needle. Ray is working at about 3 scenarios at the same time - covering different areas of needles.

(2nd user story from https://progress.opensuse.org/issues/14818#note-18)

Tasks¶

Allow os-autoinst to pause on next assert_screen timeout.
Ensure that on resume needles are reloaded and the assert_screen is executed again.
Add checkbox in the developer mode UI to enable/disable this conveniently.
Allow to open needle editor conveniently from the developer mode.

Related issues 1 (0 open — 1 closed)

Actions

Copy link

Updated by mkittler almost 6 years ago

Description updated (diff)

Actions

Copy link

Updated by mkittler almost 6 years ago

Status changed from New to In Progress

PRs:

Seems to work roughly but still needs further testing and likely bug fixing. Extending the test-suite to cover this will be the challenge.

Actions

Copy link

Updated by mkittler almost 6 years ago

Problems I've noticed so far:

I tried to make it upload the screenshots for the current module inside Worker::Jobs::upload_status. This is required if one wants to open the needle editor to create a new needle when the assertion failed. However, my adjustments to upload_status don't work very well because during the pause at a certain module now two modules appear as running at the same time. I'll have to find out how to adjust upload_status to do the right thing.
So far os-autoinst pauses when a timeout occurs for any reason. However, it should only pause for assert_screens and maybe (optionally) for check_screens. There are also other functions like send_key_until_needlematch which call check_screen and assert_screen internally. Not sure whether those should be ignored, too.

Log messages from worker and web UI which are related to 1.:

can't open /tmp/wUm1C2CH2e/t/full-stack.d/openqa/pool/1/testresults/result-shutdown.json: No such file or directory at /opt/testing_area/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 828.
Odd number of elements in anonymous hash at /opt/testing_area/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 704.
DBIx::Class::Storage::DBI::select_single(): Query returned more than one row.  SQL that returns multiple rows is DEPRECATED for ->find and ->single at /hdd/openqa-devel/repos/openQA/script/../lib/OpenQA/WebAPI/Controller/Running.pm line 54

(also visible in Travis log of WIP PR)

Actions

Copy link

Updated by mkittler almost 6 years ago

Current state:

See PRs mentioned in previous comment
What works:
- setting/unsetting whether to pause on assert_screen failure via UI
- pausing on assert_screen failure if enabled
- failing as usual if pausing on assert_screen is not enabled
- setting/unsetting whether to pause on assert_screen failure when already paused: test behaves as expected on resume
- opening needle editor when paused and create a new needle
- resume with new needle and successfully assert it
- the 2nd point mentioned in the previous comment should be fixed now - it will now only pause on assert_screens but not on check_screens
- the needle editor suggest to return to live view and not to restart the job if a development session for the job is present
What doesn't work:
- The 1st point mentioned in the last comment is still TODO. This problem is also causing the tests to fail.
- At some point the browser window seems to frees. Maybe this only happens when the JavaScript debugger is used.
- Failed tags are automatically added in needle editor (required?).
Tests I've done far:
- Add UI-only tests to t/ui/25-developer-mode.t
- Add unit tests for assert_screen and check_screen to t/03-testapi.t
- Otherwise only manual testing with 'opensuse-Staging:B-Staging-DVD-x86_64-Build1190.2-kde@64bit'
  - I adjusted it to fail very soon: https://github.com/Martchus/os-autoinst-distri-opensuse/commit/let_bootloader_fail
  - Needles created with needle editor when paused: https://github.com/Martchus/os-autoinst-needles-opensuse/commit/let_bootloader_fail
- So t/33-developer-mode.t still needs to be extended.
  - To get started, I would just rename some needle (eg. boot-on_prompt.json) so it can not be found anymore. If the test pauses, I would rename it and check whether the test continues on resume. It is likely the easiest to just use the devel console for this (instead of dealing with the UI elements). DONE
  - Then I'd continue with further things like opening the needle editor when paused.

Actions

Copy link

Updated by mkittler almost 6 years ago

Making the needle editor work is tricky. It basically works now but when adding a test for it (https://github.com/Martchus/openQA/commit/pause_on_assert_screen_timeout_upload_event) I just get 404. When trying it again with a short delay it works. So the reason is that the required results haven't been uploaded quickly enough. During manual testing I was always slow enough not to run into this issue.

I remember that we had a similar problem with the old interactive mode (screenshot was sometimes not available when opening the needle editor).

To fix this we likely have to pass the progress of the result upload to the client. Then it can wait till the upload has been completed before offering to open the needle editor. This could be done using the web socket route.

Can you give me any feedback about this so far? I'll deploy this on a staging instance so you can try yourself.

Actions

Copy link

Updated by mkittler almost 6 years ago

I improved the 404 error pages shown when opening the needle editor for a running test. I think that's a cheap workaround for the problem mentioned in the previous comment and helpful anyways.

Actions

Copy link

Updated by coolo almost 6 years ago

IMO the client needs to be certain the uploads are in before opening a link to the needle editor.

Actions

Copy link

Updated by mkittler almost 6 years ago

My ideas how to implement that:

The worker could pass that info to the os-autoinst command server which could forward it to it the liveview handler which could forward it to the browser. That way the info is forwarded twice which is not so nice. Also, replaying the info for a newly joined client would be more effort compared to 2.
The worker could pass that info to the web UI using a new route provided by the livehandler daemon. The last part is important because only that way the info can be forwarded to the connected web browser clients via web sockets. The livehandler could also store the info in the database so it can be easily included to the status information every newly joining client receives at the beginning.

I think 2. is the better option so I'm going to implement it if there are no objections :-)

Actions

Copy link