action #44441: test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA? - openQA Project (public) - openSUSE Project Management Tool

Added by okurz over 6 years ago. Updated about 5 years ago.

Status: New
Priority: Low
Assignee:
Category: Organisational
Target version: QA (public) - future
Start date:
Due date:
% Done: 100%
Estimated time:

Description

Observation

openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install+publish@64bit-2G fails in dashboard

Reproducible

Fails since (at least) Build :TW.1841 (current job)

Expected result

Last good: :TW.1840 (or more recent)

Further details

Always latest result in this scenario: latest

Acceptance Criteria

AC1: x11_start_program code in os-autoinst-distri-openQA is in sync with os-autoinst-distri-opensuse
AC2: Tests pass without any issue

Related issues 2 (1 open — 1 closed)

Updated by okurz about 6 years ago

Blocks action #45131: [functional][u] test fails in worker to unlock the screen of openQA-in-openQA test added

Updated by okurz about 6 years ago

Status changed from New to Workable
Priority changed from Normal to High
Target version changed from future to Milestone 24

Updated by jorauch about 6 years ago

Assignee set to jorauch

It's mine!

Updated by jorauch about 6 years ago

Actually we have much less code in os-autoinst-distri-openqa; let's see what we need from the new code.

Updated by mgriessmeier about 6 years ago

Status changed from Workable to In Progress

Updated by jorauch about 6 years ago

The program is started with x11_start_program("firefox http://localhost", 60, { valid => 1 } ); so it has a dedicated one minute timeout.

The error message says # Test died: no candidate needle with tag(s) 'displaymanager, displaymanager-password-prompt, generic-desktop, screenlock, gnome-screenlock-password' matched, which implies it is looking for a locked screen for some reason. That makes sense, as the test runs ensure_unlocked_desktop beforehand.
At first glance the function looks the same in both distributions.

In conclusion, this might have been some serious hiccup, since the newer runs work fine.

Updated by jorauch about 6 years ago

It looks now to me like the send_key 'esc' in Line 60 didn't reach the SUT, so this would qualify as a lost key problem
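
One common way to make a test tolerant of a single lost key press is to repeat the key until the expected screen shows up. The sketch below only simulates that idea in Python. `FlakySut`, `send_key_until_match` and the assumption that 'esc' moves the screen from 'screenlock' to 'gnome-screenlock-password' are hypothetical and not taken from os-autoinst-distri-openQA. As far as I know, the os-autoinst testapi has a helper in this spirit (send_key_until_needlematch), which would be the thing to look at for the real Perl test.

```python
class FlakySut:
    """Toy system under test that silently drops the first `drop` key presses."""

    def __init__(self, drop):
        self.drop = drop
        self.screen = "screenlock"

    def send_key(self, key):
        if self.drop > 0:
            self.drop -= 1  # key press is lost, screen stays as it is
            return
        if key == "esc" and self.screen == "screenlock":
            # Assumption for illustration: 'esc' reveals the password prompt.
            self.screen = "gnome-screenlock-password"

    def matches(self, tag):
        return self.screen == tag


def send_key_until_match(send_key, screen_matches, tag, key, max_tries=5):
    """Press `key` until the screen matches `tag`; return the number of presses."""
    presses = 0
    while True:
        if screen_matches(tag):
            return presses
        if presses == max_tries:
            raise RuntimeError(f"no match for {tag!r} after {max_tries} x {key!r}")
        send_key(key)
        presses += 1


sut = FlakySut(drop=2)
presses = send_key_until_match(sut.send_key, sut.matches,
                               "gnome-screenlock-password", "esc")
print(presses)
```

A plain send_key followed by a fixed assertion fails as soon as one press is dropped; the retry loop only fails when every press within the limit is lost.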

Updated by okurz about 6 years ago

There should be no "lost keys" on x86_64 unless it's wayland or virtio or system bogged down by services on the SUT itself.

Updated by mgriessmeier about 6 years ago

Subject changed from [functional][u] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA? to [functional][u][sporadic] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA?
Priority changed from High to Normal

hmm, it didn't happen for a month now - and also before it was kinda sporadic.
I would like to lower the priority and as next step check what we can do to make this test more robust.
Since it's M24, it's fine if you unassign for now and put it back to the backlog

#10

Updated by jorauch about 6 years ago

Status changed from In Progress to Workable
Assignee deleted (~~jorauch~~)

As discussed with mgriessmeier I will unassign and we can revisit this later

#11

Updated by okurz about 6 years ago

As a simple suggestion one could trigger some more jobs, e.g. 100, and check fail rate – if it fails at all.
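
With only about 100 samples the raw fail rate is a rough estimate, so an interval around it helps when judging such a batch. A minimal sketch, assuming the passed and failed jobs have already been counted (fetching results from openQA is not shown). The function name is made up, and using the Wilson score interval is my choice, not something proposed in this ticket.

```python
import math


def fail_rate_with_interval(failed, total, z=1.96):
    """Return (rate, low, high): observed fail rate and Wilson score interval.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    p = failed / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


for failed in (0, 1):
    rate, low, high = fail_rate_with_interval(failed, 100)
    print(f"{failed}/100 failed: rate={rate:.3f}, interval=[{low:.3f}, {high:.3f}]")
```

Even a batch with zero failures leaves an upper bound of a few percent, so "it did not fail in 100 runs" does not by itself rule out a rare sporadic issue.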

#12

Updated by okurz about 6 years ago

Target version changed from Milestone 24 to Milestone 25

#13

Updated by jorauch about 6 years ago

We had the issue once in the last 100 runs in production; it has not appeared again since this ticket was created.
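
For a sense of scale: if the real fail rate were around 1%, a clean streak of passing jobs would not be surprising. A small worked calculation, assuming independent jobs with a constant fail probability (a simplification that ignores worker load and similar effects). The function names are illustrative only.

```python
import math


def prob_no_failures(fail_rate, runs):
    """Probability that `runs` independent jobs all pass."""
    return (1 - fail_rate) ** runs


def runs_needed(fail_rate, confidence):
    """Smallest number of runs that shows at least one failure with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - fail_rate))


print(round(prob_no_failures(0.01, 100), 3))
print(runs_needed(0.01, 0.95))
```

Under these assumptions a 1% fail rate gives about a 37% chance of 100 passes in a row, and roughly 300 runs are needed to see at least one failure with 95% confidence.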

#14

Updated by jorauch about 6 years ago

Status changed from Workable to In Progress
Assignee set to jorauch

Taking a look and trying to verify that this was just a hiccup.

#15

Updated by jorauch about 6 years ago

Started 100 times on pinky

#16

Updated by jorauch about 6 years ago

It failed very often on pinky with this exact behaviour

#17

Updated by jorauch about 6 years ago

The issue seems to be that the session got locked (I guess due to low performance extending the time between the instructions) but we have no needle with the password prompt for this product.

I will create a needle and try this on pinky
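
To illustrate why the missing needle matters, here is a toy model of an unlock routine that knows about the password prompt. It is a sketch under assumptions, not the code from either test distribution: `ToySession` and `ensure_unlocked` are hypothetical, the tag names are borrowed from the error message quoted earlier in this ticket, and the screen transitions are invented for the simulation.

```python
class ToySession:
    """Simulated desktop session; the transitions are invented for illustration."""

    def __init__(self, screen="screenlock"):
        self.screen = screen
        self.typed = False
        self.actions = []

    def read_tag(self):
        return self.screen

    def type_password(self):
        self.actions.append("<password>")
        self.typed = True

    def send_key(self, key):
        self.actions.append(key)
        if key == "esc" and self.screen == "screenlock":
            self.screen = "gnome-screenlock-password"
        elif key == "ret" and self.screen == "gnome-screenlock-password" and self.typed:
            self.screen = "generic-desktop"


def ensure_unlocked(session, max_steps=5):
    """Drive the session until the desktop is visible, handling the password prompt."""
    for _ in range(max_steps):
        tag = session.read_tag()
        if tag == "generic-desktop":
            return
        if tag == "screenlock":
            session.send_key("esc")
        elif tag == "gnome-screenlock-password":
            session.type_password()
            session.send_key("ret")
        else:
            # Mirrors the situation in this ticket: a screen nobody has a needle for.
            raise RuntimeError(f"no candidate needle matched screen {tag!r}")
    raise RuntimeError("desktop still locked after max_steps")


session = ToySession()
ensure_unlocked(session)
print(session.actions)
```

Without the 'gnome-screenlock-password' branch, which corresponds to the missing needle, the same session would end in the error branch instead of reaching the desktop.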

#18

Updated by jorauch about 6 years ago

Status changed from In Progress to Workable
Assignee deleted (~~jorauch~~)

The missing needle is actually a new problem that has not yet appeared in production.
As I am running out of ideas for how to work around this, I will unassign.
Sergio confirmed that it seems the ESC key either does not reach the system or is not synchronized.

#19

Updated by zluo almost 6 years ago

Status changed from Workable to In Progress
Assignee set to zluo

Taking over and checking the current status for dashboard.

#20

Updated by zluo almost 6 years ago

https://openqa.opensuse.org/tests/965208#step/worker/4 shows that the worker module failed, which should be fixed first because it blocks the test modules that follow it.

#21

Updated by zluo almost 6 years ago

http://f40.suse.de/tests/4149 failed for clone_job:

If we compare this vars.json: https://openqa.opensuse.org/tests/965331/file/vars.json
with the one from your instance: http://f40.suse.de/tests/4147/file/vars.json
a lot of settings are missing.
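
A comparison like this is easy to script once both vars.json files are downloaded. A minimal sketch: the two JSON strings below are made-up stand-ins, not the contents of the linked files, and the function names are hypothetical.

```python
import json

# Made-up stand-ins for the two vars.json files; the real files are not reproduced here.
REFERENCE_JSON = ('{"CASEDIR": "/path/to/tests", "PRODUCTDIR": "/path/to/product", '
                  '"NEEDLES_DIR": "/path/to/needles"}')
CANDIDATE_JSON = '{"CASEDIR": "/path/to/other-tests"}'


def missing_settings(reference, candidate):
    """Keys present in the reference job but absent from the candidate job."""
    return sorted(set(reference) - set(candidate))


def differing_settings(reference, candidate):
    """Keys present in both jobs whose values differ, as (reference, candidate) pairs."""
    common = set(reference) & set(candidate)
    return {key: (reference[key], candidate[key])
            for key in sorted(common) if reference[key] != candidate[key]}


reference = json.loads(REFERENCE_JSON)
candidate = json.loads(CANDIDATE_JSON)
print(missing_settings(reference, candidate))
print(differing_settings(reference, candidate))
```

For real data, replace the two strings with the contents of the downloaded files, for example via json.load on an open file.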

#22

Updated by zluo almost 6 years ago

Opened a ticket:
https://progress.opensuse.org/issues/53606

#23

Updated by mgriessmeier almost 6 years ago

Target version changed from Milestone 25 to Milestone 26

#24

Updated by zluo almost 6 years ago

Blocked by action #53606: PRODUCTDIR invalid when main.pm in casedir (not product-subdir) and cloning from caching worker to caching worker added

#25

Updated by zluo almost 6 years ago

Status changed from In Progress to Blocked

#26

Updated by okurz almost 6 years ago

Status changed from Blocked to In Progress

I already mentioned a workaround in #53606 so the task should not be blocked.

#27

Updated by zluo almost 6 years ago

@okurz yes, I also need to correct NEEDLES_DIR:

openqa-clone-job --from http://f40.suse.de 4158 PRODUCTDIR=/var/lib/openqa/cache/f40.suse.de/tests/openqa --skip-deps NEEDLES_DIR=/var/lib/openqa/cache/f40.suse.de/tests/openqa/products/openqa/needles
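
When cloning many jobs with the same overrides, it can help to assemble the command in a small helper instead of retyping it. A sketch, assuming only the options visible in the command above (--from, --skip-deps and KEY=VALUE settings); the helper name is made up, and placing --skip-deps before the settings is an assumption about option ordering being flexible.

```python
import shlex


def clone_job_argv(host, job_id, overrides, skip_deps=True):
    """Build the argument list for an openqa-clone-job call with setting overrides."""
    argv = ["openqa-clone-job", "--from", host, str(job_id)]
    if skip_deps:
        argv.append("--skip-deps")
    # Settings are passed as KEY=VALUE arguments, as in the command above.
    argv += [f"{key}={value}" for key, value in overrides.items()]
    return argv


cache = "/var/lib/openqa/cache/f40.suse.de/tests/openqa"
argv = clone_job_argv(
    "http://f40.suse.de",
    4158,
    {
        "PRODUCTDIR": cache,
        "NEEDLES_DIR": f"{cache}/products/openqa/needles",
    },
)
print(shlex.join(argv))
```

The list form can be handed to subprocess.run directly; shlex.join is only used here to print a copy-pasteable command line.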

#28

Updated by zluo almost 6 years ago

https://openqa.opensuse.org/tests/975393 doesn't show any issue.

#29

Updated by zluo almost 6 years ago

Need to check https://openqa.opensuse.org/tests/975438#next_previous

It fails in module "worker" when matching needle gnome-desktop in wait_for_desktop: http://f40.suse.de/tests/4181#live

This is quite strange; I need to check whether this is really a timeout issue.

#30

Updated by zluo almost 6 years ago

It seems that unlocking the desktop doesn't work...

#31

Updated by zluo almost 6 years ago

to check on O3:
https://openqa.opensuse.org/tests/overview?build=%3ATW.2357_zluo-verification_poo44441&distri=openqa&version=Tumbleweed

#32

Updated by zluo almost 6 years ago

https://openqa.opensuse.org/tests/975610/file/autoinst-log.txt shows that gnome-desktop-20190509 sometimes is not matched. This is very strange.
Compare with http://f40.suse.de/tests/4199/file/autoinst-log.txt: there it matches fine and the test continues with x11_start_program without any issue.

#33

Updated by zluo almost 6 years ago

http://f40.suse.de/tests/4290#next_previous shows no issues across 100 test runs.

I also see some changes from okurz in the openQA tests, and the needle gnome-desktop matches without any issue for the current build.

#34

Updated by zluo almost 6 years ago

https://openqa.opensuse.org/tests/977793#next_previous shows no failures from TW.2356 to TW.2371. Rejecting it for now.

#35

Updated by zluo almost 6 years ago

Blocked by deleted (action #53606: PRODUCTDIR invalid when main.pm in casedir (not product-subdir) and cloning from caching worker to caching worker)

#36

Updated by zluo almost 6 years ago

Status changed from In Progress to Rejected

#37

Updated by okurz almost 6 years ago

Status changed from Rejected to In Progress

please read the ticket title again

#38

Updated by zluo almost 6 years ago

Description updated (diff)

#39

Updated by zluo almost 6 years ago

Status changed from In Progress to Workable
Assignee deleted (~~zluo~~)

As discussed with scrum master @sergio, I am unassigning myself for now.

#40

Updated by mgriessmeier almost 6 years ago

Status changed from Workable to New
Target version changed from Milestone 26 to Milestone 27

next grooming

#41

Updated by mgriessmeier over 5 years ago

Target version changed from Milestone 27 to Milestone 28

#42

Updated by mgriessmeier over 5 years ago

Target version changed from Milestone 28 to Milestone 31

#43

Updated by mgriessmeier about 5 years ago

Status changed from New to Rejected

works

#44

Updated by okurz about 5 years ago

Project changed from openQA Tests (public) to openQA Project (public)
Subject changed from [functional][u][sporadic] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA? to test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA?
Category changed from Bugs in existing tests to Organisational
Status changed from Rejected to New
Target version changed from Milestone 31 to future

It's rather annoying to see another ticket that is a consequence of dheidler not following my advice and instead copy-pasting the openQA-in-openQA tests to os-autoinst-distri-opensuse. os-autoinst-distri-openqa can act as a very good example for the QA tools team. And in light of even the SUSE company goals I consider it important to look into reusing test library functions better than copy-pasting to other github repos so let's see if we can plan this for the tools team then.

#45

Updated by okurz about 5 years ago

Priority changed from Normal to Low

#46

Updated by okurz over 3 years ago

Related to coordination #106922: [epic][sporadic] openqa_from_git fails in dashboard due to ensure_unlocked_desktop not expecting password entry screen in case of locked desktop auto_review:"match=desktop-runner,screenlock timed out.*":retry added
