action #38510

action #14818: [EPIC] Interactive mode is an usability disaster

Allow os-autoinst to pause on next assert_screen timeout

Added by mkittler over 1 year ago. Updated over 1 year ago.

Status:ResolvedStart date:18/07/2018
Priority:NormalDue date:
Assignee:mkittler% Done:

0%

Category:-
Target version:Done
Difficulty:
Duration:

Description

User story

Ray reviews the latest build of Tumbleweed and notices a font change breaking all old needles in GUI applications. He wants openQA to pause whenever an assert_screen would timeout to update the needle and redo the assert_screen with that new needle. Ray is working at about 3 scenarios at the same time - covering different areas of needles.

(2nd user story from https://progress.opensuse.org/issues/14818#note-18)

Tasks

  1. Allow os-autoinst to pause on next assert_screen timeout.
  2. Ensure that on resume needles are reloaded and the assert_screen is executed again.
  3. Add checkbox in the developer mode UI to enable/disable this conveniently.
  4. Allow to open needle editor conveniently from the developer mode.

Related issues

Related to openQA Project - action #41015: Don't use livehandler if no developer looks at it Resolved 14/09/2018

History

#1 Updated by mkittler over 1 year ago

  • Description updated (diff)

#2 Updated by mkittler over 1 year ago

  • Status changed from New to In Progress

PRs:


Seems to work roughly but still needs further testing and likely bug fixing. Extending the test-suite to cover this will be the challenge.

#3 Updated by mkittler over 1 year ago

Problems I've noticed so far:

  1. I tried to make it upload the screenshots for the current module inside Worker::Jobs::upload_status. This is required if one wants to open the needle editor to create a new needle when the assertion failed. However, my adjustments to upload_status don't work very well because during the pause at a certain module now two modules appear as running at the same time. I'll have to find out how to adjust upload_status to do the right thing.
  2. So far os-autoinst pauses when a timeout occurs for any reason. However, it should only pause for assert_screens and maybe (optionally) for check_screens. There are also other functions like send_key_until_needlematch which call check_screen and assert_screen internally. Not sure whether those should be ignored, too.

Log messages from worker and web UI which are related to 1.:

can't open /tmp/wUm1C2CH2e/t/full-stack.d/openqa/pool/1/testresults/result-shutdown.json: No such file or directory at /opt/testing_area/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 828.
Odd number of elements in anonymous hash at /opt/testing_area/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 704.
DBIx::Class::Storage::DBI::select_single(): Query returned more than one row.  SQL that returns multiple rows is DEPRECATED for ->find and ->single at /hdd/openqa-devel/repos/openQA/script/../lib/OpenQA/WebAPI/Controller/Running.pm line 54

(also visible in Travis log of WIP PR)

#4 Updated by mkittler over 1 year ago

Current state:

  • See PRs mentioned in previous comment
  • What works:
    • setting/unsetting whether to pause on assert_screen failure via UI
    • pausing on assert_screen failure if enabled
    • failing as usual if pausing on assert_screen is not enabled
    • setting/unsetting whether to pause on assert_screen failure when already paused: test behaves as expected on resume
    • opening needle editor when paused and create a new needle
    • resume with new needle and successfully assert it
    • the 2nd point mentioned in the previous comment should be fixed now - it will now only pause on assert_screens but not on check_screens
    • the needle editor suggest to return to live view and not to restart the job if a development session for the job is present
  • What doesn't work:
    • The 1st point mentioned in the last comment is still TODO. This problem is also causing the tests to fail.
    • At some point the browser window seems to frees. Maybe this only happens when the JavaScript debugger is used.
    • Failed tags are automatically added in needle editor (required?).
  • Tests I've done far:
    • Add UI-only tests to t/ui/25-developer-mode.t
    • Add unit tests for assert_screen and check_screen to t/03-testapi.t
    • Otherwise only manual testing with 'opensuse-Staging:B-Staging-DVD-x86_64-Build1190.2-kde@64bit'
    • So t/33-developer-mode.t still needs to be extended.
      • To get started, I would just rename some needle (eg. boot-on_prompt.json) so it can not be found anymore. If the test pauses, I would rename it and check whether the test continues on resume. It is likely the easiest to just use the devel console for this (instead of dealing with the UI elements). DONE
      • Then I'd continue with further things like opening the needle editor when paused.

#5 Updated by mkittler over 1 year ago

Making the needle editor work is tricky. It basically works now but when adding a test for it (https://github.com/Martchus/openQA/commit/pause_on_assert_screen_timeout_upload_event) I just get 404. When trying it again with a short delay it works. So the reason is that the required results haven't been uploaded quickly enough. During manual testing I was always slow enough not to run into this issue.

I remember that we had a similar problem with the old interactive mode (screenshot was sometimes not available when opening the needle editor).

To fix this we likely have to pass the progress of the result upload to the client. Then it can wait till the upload has been completed before offering to open the needle editor. This could be done using the web socket route.


Can you give me any feedback about this so far? I'll deploy this on a staging instance so you can try yourself.

#6 Updated by mkittler over 1 year ago

I improved the 404 error pages shown when opening the needle editor for a running test. I think that's a cheap workaround for the problem mentioned in the previous comment and helpful anyways.

#7 Updated by coolo over 1 year ago

IMO the client needs to be certain the uploads are in before opening a link to the needle editor.

#8 Updated by mkittler over 1 year ago

My ideas how to implement that:

  1. The worker could pass that info to the os-autoinst command server which could forward it to it the liveview handler which could forward it to the browser. That way the info is forwarded twice which is not so nice. Also, replaying the info for a newly joined client would be more effort compared to 2.
  2. The worker could pass that info to the web UI using a new route provided by the livehandler daemon. The last part is important because only that way the info can be forwarded to the connected web browser clients via web sockets. The livehandler could also store the info in the database so it can be easily included to the status information every newly joining client receives at the beginning.

I think 2. is the better option so I'm going to implement it if there are no objections :-)

#10 Updated by coolo over 1 year ago

  • Related to action #41015: Don't use livehandler if no developer looks at it added

#11 Updated by coolo over 1 year ago

  • Status changed from In Progress to Resolved

This is implemented - but there are stability issues with it, we track in https://progress.opensuse.org/issues/41015

#12 Updated by coolo over 1 year ago

  • Target version changed from Current Sprint to Done

Also available in: Atom PDF