action #109737
closed: [opensuse][sporadic] test fails in chromium due to lost characters when typing in the address bar size:M
Added by okurz over 2 years ago. Updated almost 2 years ago.
Description
Observation
openQA test in scenario opensuse-15.3-DVD-Updates-x86_64-gnome@64bit-2G fails in chromium due to lost characters when typing in the address bar, which triggers an unexpected consent dialog shown for the page google.com instead of the expected "about" dialog.
The problem is triggered by the test API command
enter_cmd "chrome://version ";
which in some cases is not typed correctly. For example, in https://openqa.opensuse.org/tests/2287780#step/chromium/19 it can be seen that a Google search for "chrion" is conducted, likely because only some characters of "chrome://version " were typed, ending up with just the first three, "chr", plus the last three, "ion" (with or without the following space).
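One way to sanity-check this reading is to confirm that the observed search term can be produced from the typed string by dropping characters only, i.e. that it is a subsequence. A minimal sketch in Python; `is_subsequence` is a hypothetical helper written for illustration and is not part of the openQA test API:

```python
def is_subsequence(observed: str, typed: str) -> bool:
    """Return True if `observed` can be obtained from `typed` by only dropping characters."""
    remaining = iter(typed)
    # `ch in remaining` consumes the iterator up to the first match, so order is preserved.
    return all(ch in remaining for ch in observed)


typed = "chrome://version "
print(is_subsequence("chrion", typed))  # compatible with dropped keys only
print(is_subsequence("chrino", typed))  # would need reordering, so drops alone cannot explain it
```

Such a check only shows that a result is compatible with lost key presses; it cannot tell lost keys apart from interference by address bar autocompletion.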
Acceptance criteria
- AC1: Chromium tests no longer fail sporadically
Reproducible
Fails sporadically. In https://openqa.opensuse.org/tests/2287780#next_previous I find a fail rate of roughly 3%.
Expected result
The test should ensure stable typing in the browser address bar, or check for correct typing to avoid going to Google at all, or anticipate the consent dialog. https://openqa.opensuse.org/tests/2287076#step/chromium/19 shows what the about dialog looks like when there is no consent dialog for Google, because in https://openqa.opensuse.org/tests/2287076#step/chromium/17 search.opensuse.org is shown.
Further details
Always latest result in this scenario: latest
Suggestions
- Type every single character very slowly
- Type first n(=10) characters slowly
- Switch off autocomplete suggestions in the URL
- If nothing works, just type all characters slowly again
Updated by okurz over 2 years ago
- Status changed from New to In Progress
- Assignee set to okurz
- Target version set to Ready
Updated by okurz over 2 years ago
name=okurz_poo109737_chromium; end=1 openqa-clone-set https://openqa.opensuse.org/tests/2287796 $name SCHEDULE=tests/boot/boot_to_desktop,tests/x11/chromium
Created job #2287933: opensuse-15.3-DVD-Updates-x86_64-Build20220409-1-extra_tests_misc@64bit -> https://openqa.opensuse.org/t2287933
But the base qcow image is already pruned from 15.3 Updates. That's annoying. I bumped from the default 5 GB in the job group https://openqa.opensuse.org/admin/job_templates/80 to 80GB. I retriggered the image creation job https://openqa.opensuse.org/tests/2287934 first.
Also I did
name=okurz_poo109737_chromium; end=1 dry_run=echo openqa-clone-set https://openqa.opensuse.org/tests/2287796 $name SCHEDULE=tests/boot/boot_to_desktop,tests/x11/chromium
to get a dry run of command execution. I removed the --skip-chained-deps and called openqa-clone-job manually. This triggered image creation parent jobs as well. Waiting for https://openqa.opensuse.org/tests/2287937
EDIT: Passed. The child job took 5m13s. So I can do name=okurz_poo109737_chromium; end=200 openqa-clone-set https://openqa.opensuse.org/tests/2287796 $name SCHEDULE=tests/boot/boot_to_desktop,tests/x11/chromium
-> https://openqa.opensuse.org/tests/overview?build=okurz_poo109737_chromium
Should await results and gather fail ratio in current state of tests before trying to change something, e.g. type slower. In case the issue can not be reproduced at all then likely the system is busy in the original scenario due to background load, e.g. memory depleted.
EDIT: 1/100 failed with a mistyped URL (https://openqa.opensuse.org/tests/2287996#step/chromium/20; 1 job failing in a later step, counting as passed). Triggering a bigger set for better statistics with name=okurz_poo109737_chromium; start=201 end=500 openqa-clone-set https://openqa.opensuse.org/tests/2287796 $name SCHEDULE=tests/boot/boot_to_desktop,tests/x11/chromium
Updated by okurz over 2 years ago
- Related to action #107632: [qe-core][leap][sporadic] Fix chromium test failing due to dropped keys added
Updated by okurz over 2 years ago
Result: 4/500 failed with lost characters, so in this case even slightly below 1% fail ratio, but reproducible. I would now change the enter_cmd call to enter_cmd ..., max_interval => 4;
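To judge how precise a count like 4/500 is, a confidence interval can help. A minimal sketch using the Wilson score interval; the choice of method and the 95% level are assumptions made here for illustration, not something used in this ticket:

```python
import math


def wilson_interval(failures: int, runs: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p_hat = failures / runs
    denom = 1 + z**2 / runs
    center = (p_hat + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / runs + z**2 / (4 * runs**2)) / denom
    return center - half, center + half


low, high = wilson_interval(failures=4, runs=500)
print(f"observed {4 / 500:.1%}, plausible range {low:.2%} .. {high:.2%}")
```

The interval assumes independent jobs with a constant fail rate, which may not hold if failures depend on worker load.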
Updated by openqa_review over 2 years ago
- Due date set to 2022-04-24
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz over 2 years ago
Created new PR https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/14698
and triggered
name=okurz_poo109737_chromium_pr14698; start=001 end=010 openqa-clone-set https://openqa.opensuse.org/tests/2287796 $name SCHEDULE=tests/boot/boot_to_desktop,tests/x11/chromium CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse.git#fix/chromium_poo109737
-> https://openqa.opensuse.org/tests/overview?build=okurz_poo109737_chromium_pr14698
I checked one old and one new job and saw that the runtime for the test module chromium increased from 2m30s to 3m30s
Updated by okurz over 2 years ago
To reduce the slowness I changed to only type the first three characters slowly and test that:
name=okurz_poo109737_chromium_pr14698_only_first_three_characters_slow; start=011 end=500 openqa-clone-set https://openqa.opensuse.org/tests/2294630 $name SCHEDULE=tests/boot/boot_to_desktop,tests/x11/chromium CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse.git#fix/chromium_poo109737
Previous runs were aborted. Retriggered based on a current image:
https://openqa.opensuse.org/tests/overview?build=okurz_poo109737_chromium_pr14698_only_first_three_characters_slow
Updated by okurz over 2 years ago
- Status changed from In Progress to Feedback
All 500 jobs passed, PR https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/14698 updated and ready for review.
Updated by okurz over 2 years ago
- Due date deleted (2022-04-24)
- Status changed from Feedback to Resolved
There were no further reviews, so I am merging myself. I didn't make any needle changes and ran a thousand verification jobs, so I am considering this resolved.
Updated by mloviska over 2 years ago
- Status changed from Resolved to Feedback
Seems like it still fails to type chrome://version correctly: https://openqa.suse.de/tests/8807706#step/chromium/19. Re-opening
Updated by mloviska over 2 years ago
https://openqa.suse.de/tests/8816367#step/chromium/26
Mistyped https://upload.wikimedia.org/wikipedia/commons/d/d0/OpenSUSE_Logo.svg
Updated by mloviska over 2 years ago
Another failure on Leap15.3 https://openqa.opensuse.org/tests/2377093#step/chromium/22
Updated by okurz over 2 years ago
- Status changed from Feedback to New
- Assignee deleted (okurz)
to be checked again
Updated by livdywan over 2 years ago
- Subject changed from [opensuse][sporadic] test fails in chromium due to lost characters when typing in the address bar to [opensuse][sporadic] test fails in chromium due to lost characters when typing in the address bar size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by livdywan over 2 years ago
- Status changed from Workable to In Progress
- Assignee set to livdywan
Updated by openqa_review over 2 years ago
- Due date set to 2022-06-16
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz over 2 years ago
- Status changed from In Progress to Feedback
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15004 merged. Please await verification from production, e.g. one night on o3 with no regressions observed.
Updated by mloviska over 2 years ago
Unfortunately, it is still failing
- https://openqa.suse.de/tests/8909126#step/chromium/19
- https://openqa.suse.de/tests/8909739#step/chromium/20
- https://openqa.suse.de/tests/8909565#step/chromium/20
- https://openqa.suse.de/tests/8909110#step/chromium/21
- https://openqa.suse.de/tests/8909107#step/chromium/20
Updated by okurz over 2 years ago
- Status changed from Feedback to Workable
Ok, from the recent test failures we see again different problems, some missing characters in between, even before the 10th character, some missing at the end, some duplication of characters. We see the following ideas:
- Just start chromium with the correct address from the starter instead of typing in the address bar, and either a) terminate each instance of the browser and start again, or b) just simplify the test again to only test a single URL
- Instead of typing into the address bar, use "xsel -i -p -b" to copy-paste
- Try to find a way to disable auto-completion, e.g. similar to what was done for Firefox in https://github.com/os-autoinst/os-autoinst-distri-opensuse/commit/def248c711503f2c6ed1276d61fce4c01b50eb6b
To trigger verification runs I suggest finding a test scenario that boots from a qcow file and then only triggering the relevant test module, e.g.
openqa-clone-custom-git-refspec https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15004 https://openqa.opensuse.org/tests/2405541 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/chromium
Updated by livdywan about 2 years ago
- Status changed from Workable to In Progress
okurz wrote:
Ok, from the recent test failures we see again different problems, some missing characters in between, even before the 10th character, some missing at the end, some duplication of characters. We see the following ideas:
- Just start chromium with the correct address from the starter instead of typing in the address bar a) and terminate each instance of the browser and start again
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15031
Updated by livdywan about 2 years ago
cdywan wrote:
okurz wrote:
Ok, from the recent test failures we see again different problems, some missing characters in between, even before the 10th character, some missing at the end, some duplication of characters. We see the following ideas:
- Just start chromium with the correct address from the starter instead of typing in the address bar a) and terminate each instance of the browser and start again
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15031
In the meantime I discovered that chrome://version is ignored on the command line, and found another option we weren't aware of:
As per the upstream Chromium issue, --allow-pre-commit-input was added for the type of flakiness we're seeing. And maybe this allows typing to work without more hacks or delays.
Updated by okurz about 2 years ago
- Due date changed from 2022-06-16 to 2022-06-24
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15031 merged. Should be monitored over the next days.
Updated by livdywan about 2 years ago
- https://openqa.opensuse.org/tests/2422503#step/chromium/17
- https://openqa.opensuse.org/tests/2420962#step/chromium/20
Unfortunately it still fails sometimes. In both cases a search is opened instead of chrome://version, which probably means we need the esc key hack after all. I had removed it because I thought the pre-commit setting would render it irrelevant (no pun intended).
And maybe it's best to also assert the urlbar state... I'm starting to think it's pointless to simplify the test for something I can never reproduce outside of a full schedule run.
Updated by szarate about 2 years ago
Maybe simply start directly with the html5test, close, then open again with whatever other URL? Moreover, while I get that we're trying to use systems the way a user would, why not simply go the automated test way?
#!perl -w
use strict;
use Log::Log4perl qw(:easy);
use WWW::Mechanize::Chrome;
Log::Log4perl->easy_init($ERROR);
my $mech = WWW::Mechanize::Chrome->new(launch_exe => 'chromium-browser');
$mech->get( 'chrome://version' );
$mech->sleep( 10 );
$mech->get(qq(https://html5test.opensuse.org));
$mech->sleep( 10 );
$mech->get(qq(https://duckduckgo.org));
$mech->sleep( 10 );
$mech->sendkeys( string => "test\r" );
Also keep in mind that you could disable GPU acceleration (just in case?). The memory might also be trolling you, along with the machine's CPU config, so try 4 CPUs x 4 GB, for instance, on 100 runs or so... because the scenario you're testing is (at least for me) a bit unrealistic for web browsers, except in kiosk mode.
Updated by okurz about 2 years ago
szarate wrote:
[…] also keep in mind that the memory might also be trolling you along with the machine's cpu config, so try 4CPU x 4GB for instance on 100 runs or so... because the scenario you're testing is (at least for me) a bit unrealistic for web browsers, except in kiosk mode
it might also just be many background processes from previous test modules. This could be found out by checking if the error can be reproduced in a test scenario which only boots to desktop and starts chromium
Updated by livdywan about 2 years ago
okurz wrote:
szarate wrote:
[…] also keep in mind that the memory might also be trolling you along with the machine's cpu config, so try 4CPU x 4GB for instance on 100 runs or so... because the scenario you're testing is (at least for me) a bit unrealistic for web browsers, except in kiosk mode
it might also just be many background processes from previous test modules. This could be found out by checking if the error can be reproduced in a test scenario which only boots to desktop and starts chromium
It can't. At least I have not seen any failures with that, which is why I got rid of the extra needle assertions.
Updated by okurz about 2 years ago
cdywan wrote:
It can't. At least I have not seen any failures with that, which is why I got rid of the extra needle assertions.
this might just mean not enough jobs in the sample
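How likely a clean sample is can be estimated from a fail ratio. A minimal sketch, assuming independent jobs and reusing the 4/500 ratio observed earlier in this ticket for the reduced schedule; both function names are hypothetical helpers:

```python
import math


def chance_of_no_failure(fail_rate: float, runs: int) -> float:
    """Probability that `runs` independent jobs all pass at the given per-job fail rate."""
    return (1 - fail_rate) ** runs


def runs_needed(fail_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of runs that shows at least one failure with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - fail_rate))


rate = 4 / 500  # fail ratio observed earlier in this ticket
for runs in (10, 100, 500):
    print(f"{runs:4d} runs: {chance_of_no_failure(rate, runs):.1%} chance of seeing no failure")
print("runs needed for 95% confidence:", runs_needed(rate))
```

If the true fail rate in the reduced scenario is lower than that ratio, correspondingly more jobs would be needed before an all-green sample means much.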
Updated by livdywan about 2 years ago
- Status changed from In Progress to Feedback
cdywan wrote:
- https://openqa.opensuse.org/tests/2422503#step/chromium/17
- https://openqa.opensuse.org/tests/2420962#step/chromium/20
Unfortunately it still fails sometimes. In both cases a search is opened instead of chrome://version, which probably means we need the esc key hack after all. I had removed it because I thought the pre-commit setting would render it irrelevant (no pun intended).
Monitoring jobs again with the above changes merged
Updated by livdywan about 2 years ago
- Due date changed from 2022-06-24 to 2022-07-08
Bumping the due date until after hackweek, as per the previous comment
Updated by okurz about 2 years ago
Recent results on o3:
openqa=> select job_id,t_updated from job_modules where name='chromium' and result='failed' order by t_updated DESC limit 10;
job_id | t_updated
---------+---------------------
2446804 | 2022-07-03 02:49:09
2445889 | 2022-07-02 12:34:09
2445488 | 2022-07-02 07:56:44
2443311 | 2022-07-01 03:12:25
2443233 | 2022-07-01 01:51:38
2442442 | 2022-06-30 17:29:28
2442203 | 2022-06-30 10:39:42
2441511 | 2022-06-30 08:14:50
2441329 | 2022-06-30 03:33:20
2441282 | 2022-06-30 01:39:52
(10 rows)
so jobs still fail in chromium. At least https://openqa.opensuse.org/tests/2443233#step/chromium/20 looks related. Would you look at those please?
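To see how the failures are spread over time, the timestamps can be tallied per day. A minimal sketch using only the ten t_updated values from the query result above; note that these are all failed chromium modules, not necessarily all typing failures:

```python
from collections import Counter

# t_updated values copied from the query result above
failed_at = [
    "2022-07-03 02:49:09",
    "2022-07-02 12:34:09",
    "2022-07-02 07:56:44",
    "2022-07-01 03:12:25",
    "2022-07-01 01:51:38",
    "2022-06-30 17:29:28",
    "2022-06-30 10:39:42",
    "2022-06-30 08:14:50",
    "2022-06-30 03:33:20",
    "2022-06-30 01:39:52",
]

# The date is the part before the space in "YYYY-MM-DD HH:MM:SS".
per_day = Counter(ts.split()[0] for ts in failed_at)
for day, count in sorted(per_day.items()):
    print(day, count)
```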
Updated by livdywan about 2 years ago
okurz wrote:
so jobs still fail in chromium. At least https://openqa.opensuse.org/tests/2443233#step/chromium/20 looks related. Would you look at those please?
I read the way we end up with a web search for chn to mean we're losing keys due to a busy CPU. Completion can't be blamed for it. That means waiting for the screen to change after every letter seems like the only option we have left: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15173
Semi-related, I'm proposing a fix for the documented wait_screen_changes option because it doesn't work, which I noticed whilst preparing the above PR: https://github.com/os-autoinst/os-autoinst/pull/2106
Updated by livdywan about 2 years ago
- Due date changed from 2022-07-08 to 2022-07-15
Seems I forgot to save the comment. We discussed last week that we're basically happy with the fix but @okurz offered to look into optimizing the underlying wait for screen changes so that's why the fix hasn't been merged yet.
Updated by livdywan about 2 years ago
cdywan wrote:
Seems I forgot to save the comment. We discussed last week that we're basically happy with the fix but @okurz offered to look into optimizing the underlying wait for screen changes so that's why the fix hasn't been merged yet.
So I took a look at what Oli was trying.
- There's the sleep in wait_still_screen which is affected by no_wait and which doesn't seem relevant here.
- There's another sleep 0.5 in wait_screen_change after every iteration of the loop following the "waiting for screen change" log message. This can't be overridden.
- Other sleeps cover mouse actions and probably don't relate here.
I'm wondering how to validate this, though, since I can only ever reproduce the typing problems in a full schedule run in production. Ideally I'd be able to set a custom isotovideo to run a branch, but our docs don't mention how to specify a git branch. Maybe I'm overlooking something.
I was also pondering unit tests but they won't help determine if this helps in practice.
Updated by livdywan about 2 years ago
- Due date changed from 2022-07-15 to 2022-07-22
cdywan wrote:
I'm wondering how to validate this, though, since I can only ever reproduce the typing problems in a full schedule run in production. Ideally I'd be able to set a custom isotovideo to run a branch, but our docs don't mention how to specify a git branch. Maybe I'm overlooking something.
Something like openqa-clone-job --within-instance https://openqa.opensuse.org/t2443233 TEST=eggs ISOTOVIDEO="podman run --pull=always --rm -it registry.opensuse.org/devel/openqa/testgithub/opr-4751/containers/isotovideo:qemu-x86 /usr/bin/isotovideo -d" _GROUP=0
should work to use the container that was built from git for a PR as documented (EDIT: Spoiler it won't work). However o3 isn't configured correctly for this to work:
Failed to read /etc/containers/storage.conf open /etc/containers/storage.conf: permission denied
time="2022-07-15T13:58:27+02:00" level=error msg="reading system config \"/usr/share/containers/containers.conf\": decode configuration /usr/share/containers/containers.conf: open /usr/share/containers/containers.conf: permission denied"
It seems like the workers are not consistent. Having forgotten to set WORKER_CLASS=openqaworker7 I see Can't exec "podman": No such file or directory on openqaworker4. Consequently I tested out what's missing and started adding to a branch (and using aa-complain usr.share.openqa.script.worker).
Also HOME=/tmp must be set to avoid a mkdir /var/lib/empty/.local: permission denied error.
openqa-clone-job --within-instance https://openqa.opensuse.org/t2443233 TEST=eggs ISOTOVIDEO="HOME=/tmp podman run --pull=always --rm -it registry.opensuse.org/devel/openqa/containers-tw/isotovideo:qemu-x86 /usr/bin/isotovideo -d" WORKER_CLASS=openqaworker7 _GROUP=0
There are no containers for PRs in OBS. Maybe that was deprecated in the meantime, hence using the tumbleweed build to test something known to work for now.
Error: writing blob: adding layer with blob "sha256:a8d8b883c8fc630d7d6b0b05c5a45ccc59eb9c05c1985bbe91223332075ccb58": Error processing tar file(exit status 1): potentially insufficient UIDs or GIDs available in user namespace (requested 0:15 for /etc/shadow): Check /etc/subuid and /etc/subgid: lchown /etc/shadow: invalid argument
Apparently the worker isn't set up for rootless. Following my own advice the next step is sudo usermod --add-subuids 200000-201000 --add-subgids 200000-201000 _openqa-worker
... except somehow podman is writing in / rather than /dev/shm:
Error: failed to open 2048 locks in /libpod_rootless_lock_486: permission denied
No clue for now what's causing this.
Updated by livdywan about 2 years ago
cdywan wrote:
Apparently the worker isn't set up for rootless. Following my own advice the next step is
sudo usermod --add-subuids 200000-201000 --add-subgids 200000-201000 _openqa-worker
... except somehow podman is writing in / rather than /dev/shm:
Error: failed to open 2048 locks in /libpod_rootless_lock_486: permission denied
It seems as though this is indeed code trying to lock via /dev/shm. Still no idea how this can happen despite /dev/shm/libpod_rootless_lock_* being allowed.
TIL flags=(complain) is added to the profile so I shouldn't override it with my own changes. However I still see failures with Error: error creating network namespace for container 89636839f7a299e903e139a6940b361324c160a0d961d88aa6dade06cc8e04e1: failed to create namespace: open /proc/40309/task/40353/ns/net: permission denied. Despite /proc/*/task/*/ns/net rw. And even though complain mode should be active 🧐️
Updated by livdywan about 2 years ago
- Copied to action #113800: Setup o3 to run rootless containers on worker hosts added
Updated by livdywan about 2 years ago
I filed #113800 to cover the use of a custom ISOTOVIDEO as it clearly exceeds a quick fix. And with this I'd like to suggest I wrap up this ticket with the existing fix since attempting to tweak the underlying wait is turning into several related non-trivial tasks.
Updated by okurz about 2 years ago
cdywan wrote:
I filed #113800 to cover the use of a custom ISOTOVIDEO as it clearly exceeds a quick fix. And with this I'd like to suggest I wrap up this ticket with the existing fix since attempting to tweak the underlying wait is turning into several related non-trivial tasks.
Well, if you can't use the custom isotovideo command then follow https://progress.opensuse.org/projects/openqav3/wiki/#Use-a-production-host-for-testing-backend-changes-locally-eg-svirt-powerVM-IPMI-bare-metal-s390x-etc to test stuff. Here it's about generic os-autoinst qemu related changes so it should even be easy to work on that locally. I would really prefer to have the wait-functions improved before using them to wait between each and every character to prevent a significant slowdown.
Updated by livdywan about 2 years ago
okurz wrote:
cdywan wrote:
I filed #113800 to cover the use of a custom ISOTOVIDEO as it clearly exceeds a quick fix. And with this I'd like to suggest I wrap up this ticket with the existing fix since attempting to tweak the underlying wait is turning into several related non-trivial tasks.
Well, if you can't use the custom isotovideo command then follow https://progress.opensuse.org/projects/openqav3/wiki/#Use-a-production-host-for-testing-backend-changes-locally-eg-svirt-powerVM-IPMI-bare-metal-s390x-etc to test stuff. Here it's about generic os-autoinst qemu related changes so it should even be easy to work on that locally. I would really prefer to have the wait-functions improved before using them to wait between each and every character to prevent a significant slowdown.
The generic changes aren't the problem. The problem is that we have confirmed repeatedly that only the production run of the full scenario shows whether a change helps or not. Chances are a new option to reduce the sleep ends up breaking this specific scenario again.
Updated by okurz about 2 years ago
- Related to coordination #109740: [epic] Stable os-autoinst unit tests with good coverage added
Updated by okurz about 2 years ago
- Copied to action #114412: Add support for "wait_screen_change" with "no_wait" option to allow to use on cases like "wait for every character to be typed" size:M added
Updated by okurz about 2 years ago
- Due date deleted (2022-07-22)
- Status changed from Feedback to Blocked
created new ticket #114412 to optimize. https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15173 is merged. We can block on #114412 and revisit to update the test in case we need additional options to optimize the flow.
Updated by slo-gin about 2 years ago
This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
Updated by livdywan almost 2 years ago
- Status changed from Blocked to Feedback
This can be verified now
Updated by okurz almost 2 years ago
- Related to action #116554: Make sleeping time in "no_wait" scenarios consistent size:M added
Updated by mkittler almost 2 years ago
I've just compared recent jobs on https://openqa.opensuse.org/tests/latest with older ones and the execution time of chromium dropped from > 3 minutes to > 2 minutes. So that's an improvement (due to #114412 being resolved). Not sure what we're waiting for to resolve this ticket, though.
Updated by okurz almost 2 years ago
mkittler wrote:
Not sure what we're waiting for to resolve this ticket, though.
Well, AC1 says "Chromium tests no longer fail sporadically". Taking a look at https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD-Updates&machine=64bit-2G&test=gnome&version=15.3#next_previous I see the latest chromium failure on 2022-07-04, so pretty stable. To be sure I suggest reviewing the database for any recent chromium test failures.
Updated by okurz almost 2 years ago
- Status changed from Feedback to Resolved
Well, the above history must suffice for now. We had a nice workshop session today in the morning about that topic as well
Updated by okurz over 1 year ago
- Related to action #110542: Try to mitigate "VNC typing issues" with disabled key repeat added
Updated by okurz over 1 year ago
- Related to coordination #43889: [qe-core][epic][functional][virtio][wayland] openQA makes spelling mistakes added