action #44441
opentest fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA?
100%
Description
Observation
openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install+publish@64bit-2G fails in dashboard
Reproducible
Fails since (at least) Build :TW.1841 (current job)
Expected result
Last good: :TW.1840 (or more recent)
Further details
Always latest result in this scenario: latest
Acceptance Criteria
- AC1: x11_start_program code in os-autoinst-distri-openQA is in sync with os-autoinst-distri-opensuse
- AC2: tests pass without any issue
Updated by okurz almost 6 years ago
- Blocks action #45131: [functional][u] test fails in worker to unlock the screen of openQA-in-openQA test added
Updated by okurz almost 6 years ago
- Status changed from New to Workable
- Priority changed from Normal to High
- Target version changed from future to Milestone 24
Updated by jorauch almost 6 years ago
Actually we have much less code in os-autoinst-distri-openqa, let's see what we need from the new code
Updated by mgriessmeier almost 6 years ago
- Status changed from Workable to In Progress
Updated by jorauch almost 6 years ago
The program is started with x11_start_program("firefox http://localhost", 60, { valid => 1 });
so it has a dedicated one-minute timeout.
The error message says # Test died: no candidate needle with tag(s) 'displaymanager, displaymanager-password-prompt, generic-desktop, screenlock, gnome-screenlock-password' matched
which implies it is looking for a locked screen for some reason. That makes sense, as the test runs ensure_unlocked_desktop before.
At first glance the function looks the same for both distributions.
In conclusion this might have been some serious hiccup, since the newer runs work fine
Updated by jorauch almost 6 years ago
It now looks to me like the send_key 'esc' in Line 60 didn't reach the SUT, so this would qualify as a lost key problem
Updated by okurz almost 6 years ago
There should be no "lost keys" on x86_64 unless it's Wayland or virtio, or the system is bogged down by services on the SUT itself.
Updated by mgriessmeier almost 6 years ago
- Subject changed from [functional][u] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA? to [functional][u][sporadic] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA?
- Priority changed from High to Normal
Hmm, it hasn't happened for a month now, and before that it was kind of sporadic as well.
I would like to lower the priority and, as a next step, check what we can do to make this test more robust.
Since it's M24, it's fine if you unassign for now and put it back into the backlog.
Updated by jorauch almost 6 years ago
- Status changed from In Progress to Workable
- Assignee deleted (jorauch)
As discussed with mgriessmeier I will unassign and we can revisit this later
Updated by okurz almost 6 years ago
As a simple suggestion one could trigger some more jobs, e.g. 100, and check the fail rate, if it fails at all.
Updated by okurz almost 6 years ago
- Target version changed from Milestone 24 to Milestone 25
Updated by jorauch over 5 years ago
We had the issue once in the last 100 runs in production; it did not appear again after this ticket was created
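To put an observation like "one failure in 100 runs" into perspective, here is a minimal Python sketch that estimates a confidence interval for a sporadic fail rate. It is not part of openQA or os-autoinst; the function name wilson_interval is made up for illustration, and using the Wilson score interval is an assumption about a reasonable method, not something that was done in this ticket.

```python
import math


def wilson_interval(failures, runs, z=1.96):
    """Return (low, high) bounds for the true fail rate.

    Hypothetical helper: z=1.96 corresponds to roughly 95% confidence.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= failures <= runs:
        raise ValueError("failures must be between 0 and runs")

    p = failures / runs                      # observed fail rate
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom

    # Clamp to the valid probability range to absorb rounding noise.
    return max(0.0, centre - half), min(1.0, centre + half)


# One failure in 100 runs, as reported for production.
low, high = wilson_interval(1, 100)
print(f"fail rate between {low:.2%} and {high:.2%}")
```

With 1 failure in 100 runs the interval is still fairly wide, roughly 0.2% to 5.4%, so a single clean batch of 100 jobs would not prove that the problem is gone.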
Updated by jorauch over 5 years ago
- Status changed from Workable to In Progress
- Assignee set to jorauch
Taking a look and trying to verify that this was just a hiccup
Updated by jorauch over 5 years ago
It failed very often on pinky with this exact behaviour
Updated by jorauch over 5 years ago
The issue seems to be that the session got locked (I guess due to low performance extending the time between the instructions) but we have no needle with the password prompt for this product.
I will create a needle and try this on pinky
Updated by jorauch over 5 years ago
- Status changed from In Progress to Workable
- Assignee deleted (jorauch)
The missing needle is actually a new problem that has not yet appeared in production.
As I am running out of ideas for how we can work around this, I will unassign.
Sergio confirmed that it seems like the ESC key does not reach the system or is not synchronized.
Updated by zluo over 5 years ago
- Status changed from Workable to In Progress
- Assignee set to zluo
Taking over to check the current status for dashboard.
Updated by zluo over 5 years ago
https://openqa.opensuse.org/tests/965208#step/worker/4 shows that the worker module failed, which should be fixed first because it blocks the test modules that follow it.
Updated by zluo over 5 years ago
http://f40.suse.de/tests/4149 failed for clone_job.
If we compare this vars.json: https://openqa.opensuse.org/tests/965331/file/vars.json
with the one from your instance: http://f40.suse.de/tests/4147/file/vars.json
a lot of settings are missing.
Updated by zluo over 5 years ago
Opened a ticket:
https://progress.opensuse.org/issues/53606
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 25 to Milestone 26
Updated by zluo over 5 years ago
- Blocked by action #53606: PRODUCTDIR invalid when main.pm in casedir (not product-subdir) and cloning from caching worker to caching worker added
Updated by okurz over 5 years ago
- Status changed from Blocked to In Progress
I already mentioned a workaround in #53606 so the task should not be blocked.
Updated by zluo over 5 years ago
@okurz yes, I also need to correct NEEDLES_DIR:
openqa-clone-job --from http://f40.suse.de 4158 PRODUCTDIR=/var/lib/openqa/cache/f40.suse.de/tests/openqa --skip-deps NEEDLES_DIR=/var/lib/openqa/cache/f40.suse.de/tests/openqa/products/openqa/needles
Updated by zluo over 5 years ago
https://openqa.opensuse.org/tests/975393 doesn't show any issue.
Updated by zluo over 5 years ago
Need to check https://openqa.opensuse.org/tests/975438#next_previous
It fails in module "worker" when matching needle gnome-desktop at wait_for_desktop: http://f40.suse.de/tests/4181#live
This is quite strange; need to check whether this is really a timeout issue.
Updated by zluo over 5 years ago
https://openqa.opensuse.org/tests/975610/file/autoinst-log.txt shows that sometimes gnome-desktop-20190509 is not matched. This is very strange.
Compare with http://f40.suse.de/tests/4199/file/autoinst-log.txt, where everything is fine and the test goes on to x11_start_program without any issue.
Updated by zluo over 5 years ago
http://f40.suse.de/tests/4290#next_previous shows no issue in 100 test runs.
I also see some changes from okurz in the openQA tests, and needle gnome-desktop matches without any issue for the current build.
Updated by zluo over 5 years ago
https://openqa.opensuse.org/tests/977793#next_previous shows no failure from TW.2356 to TW.2371. Rejecting it for now.
Updated by zluo over 5 years ago
- Blocked by deleted (action #53606: PRODUCTDIR invalid when main.pm in casedir (not product-subdir) and cloning from caching worker to caching worker)
Updated by okurz over 5 years ago
- Status changed from Rejected to In Progress
please read the ticket title again
Updated by zluo over 5 years ago
- Status changed from In Progress to Workable
- Assignee deleted (zluo)
As discussed with scrum master @sergio, I am unassigning myself for now.
Updated by mgriessmeier over 5 years ago
- Status changed from Workable to New
- Target version changed from Milestone 26 to Milestone 27
next grooming
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 27 to Milestone 28
Updated by mgriessmeier almost 5 years ago
- Target version changed from Milestone 28 to Milestone 31
Updated by okurz almost 5 years ago
- Project changed from openQA Tests (public) to openQA Project (public)
- Subject changed from [functional][u][sporadic] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA? to test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA?
- Category changed from Bugs in existing tests to Organisational
- Status changed from Rejected to New
- Target version changed from Milestone 31 to future
It's rather annoying to see another ticket that is a consequence of dheidler not following my advice and instead copy-pasting the openQA-in-openQA tests to os-autoinst-distri-opensuse. os-autoinst-distri-openqa can act as a very good example for the QA tools team. In light of the SUSE company goals I also consider it important to look into reusing test library functions rather than copy-pasting them to other GitHub repos, so let's see if we can plan this for the tools team.
Updated by okurz almost 3 years ago
- Related to coordination #106922: [epic][sporadic] openqa_from_git fails in dashboard due to ensure_unlocked_desktop not expecting password entry screen in case of locked desktop auto_review:"match=desktop-runner,screenlock timed out.*":retry added