action #44441

test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA?

Added by okurz over 2 years ago. Updated about 1 year ago.

Status: New
Priority: Low
Assignee: -
Category: Organisational
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install+publish@64bit-2G fails in dashboard

Reproducible

Fails since (at least) Build :TW.1841 (current job)

Expected result

Last good: :TW.1840 (or more recent)

Further details

Always latest result in this scenario: latest

Acceptance Criteria

  • AC1: x11_start_program code in os-autoinst-distri-openQA is in sync with os-autoinst-distri-opensuse
  • AC2: tests pass without any issue

Related issues

Blocks openQA Tests - action #45131: [functional][u] test fails in worker to unlock the screen of openQA-in-openQA test (Rejected, 2018-12-13)

History

#1 Updated by okurz about 2 years ago

  • Blocks action #45131: [functional][u] test fails in worker to unlock the screen of openQA-in-openQA test added

#2 Updated by okurz about 2 years ago

  • Status changed from New to Workable
  • Priority changed from Normal to High
  • Target version changed from future to Milestone 24

#3 Updated by jorauch about 2 years ago

  • Assignee set to jorauch

It's mine!

#4 Updated by jorauch about 2 years ago

Actually we have much less code in os-autoinst-distri-openqa, let's see what we need from the new code.

#5 Updated by mgriessmeier about 2 years ago

  • Status changed from Workable to In Progress

#6 Updated by jorauch about 2 years ago

The program is started with x11_start_program("firefox http://localhost", 60, { valid => 1 } ); so it has a dedicated one minute timeout.

The error message says "# Test died: no candidate needle with tag(s) 'displaymanager, displaymanager-password-prompt, generic-desktop, screenlock, gnome-screenlock-password' matched", which implies it is looking for a locked screen. That makes sense, as the test runs ensure_unlocked_desktop before.
At first glance the function looks the same in both distributions.

In conclusion, this might have been a serious hiccup, since the newer runs work fine.

#7 Updated by jorauch about 2 years ago

It now looks to me like the send_key 'esc' in line 60 didn't reach the SUT, so this would qualify as a lost key problem.

#8 Updated by okurz about 2 years ago

There should be no "lost keys" on x86_64 unless it's wayland or virtio or system bogged down by services on the SUT itself.

#9 Updated by mgriessmeier about 2 years ago

  • Subject changed from [functional][u] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA? to [functional][u][sporadic] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA?
  • Priority changed from High to Normal

Hmm, it hasn't happened for a month now, and before that it was kind of sporadic too.
I would like to lower the priority and, as a next step, check what we can do to make this test more robust.
Since it's M24, it's fine if you unassign for now and put it back into the backlog.

#10 Updated by jorauch about 2 years ago

  • Status changed from In Progress to Workable
  • Assignee deleted (jorauch)

As discussed with mgriessmeier I will unassign and we can revisit this later

#11 Updated by okurz about 2 years ago

As a simple suggestion one could trigger some more jobs, e.g. 100, and check fail rate – if it fails at all.
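
A minimal sketch of how the fail rate of such a batch could be computed once the job results are collected. The function name `fail_rate` and the sample data are hypothetical; it assumes the openQA result strings "passed" and "softfailed" count as OK and every other result counts as a failure.

```python
def fail_rate(results):
    """Return the fraction of job results that are neither passed nor softfailed."""
    if not results:
        raise ValueError("no job results to evaluate")
    # Assumption: only these two result values count as a successful run.
    ok_results = {"passed", "softfailed"}
    failed = sum(1 for result in results if result not in ok_results)
    return failed / len(results)


# Made-up sample: 100 runs with a single failure.
sample = ["passed"] * 99 + ["failed"]
print(f"fail rate: {fail_rate(sample):.1%}")  # prints "fail rate: 1.0%"
```

With a rate this low, a batch of 100 jobs can easily show zero failures, so a clean batch alone does not prove the issue is gone.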

#12 Updated by okurz about 2 years ago

  • Target version changed from Milestone 24 to Milestone 25

#13 Updated by jorauch almost 2 years ago

We had the issue once in the last 100 runs in production. It has not appeared again since this ticket was created.

#14 Updated by jorauch almost 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to jorauch

Taking a look and trying to verify that this was just a hiccup.

#15 Updated by jorauch almost 2 years ago

Started 100 times on pinky

#16 Updated by jorauch almost 2 years ago

It failed very often on pinky with this exact behaviour

#17 Updated by jorauch almost 2 years ago

The issue seems to be that the session got locked (I guess due to low performance extending the time between the instructions) but we have no needle with the password prompt for this product.

I will create a needle and try this on pinky

#18 Updated by jorauch almost 2 years ago

  • Status changed from In Progress to Workable
  • Assignee deleted (jorauch)

The missing needle is actually a new problem that has not yet appeared in production.
As I am running out of ideas for how we can work around this, I will unassign.
Sergio confirmed that it seems like the ESC key does not reach the system or is not synchronized.

#19 Updated by zluo almost 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

Taking over to check the current status of dashboard.

#20 Updated by zluo almost 2 years ago

https://openqa.opensuse.org/tests/965208#step/worker/4 shows that worker failed, which should be fixed first because it blocks the test modules that follow it.

#21 Updated by zluo almost 2 years ago

http://f40.suse.de/tests/4149 failed for clone_job.

If we compare this vars.json: https://openqa.opensuse.org/tests/965331/file/vars.json
with the one from your instance: http://f40.suse.de/tests/4147/file/vars.json
a lot of settings are missing.
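
A minimal sketch of how such a comparison could be automated once both vars.json files are loaded as dictionaries. The helper names `missing_settings` and `differing_settings` are hypothetical, and the two inline JSON samples are made-up stand-ins, not the contents of the jobs linked above.

```python
import json


def missing_settings(reference, candidate):
    """Return the setting names present in reference but absent from candidate."""
    return sorted(set(reference) - set(candidate))


def differing_settings(reference, candidate):
    """Return settings present in both files whose values differ."""
    shared = sorted(set(reference) & set(candidate))
    return {
        key: (reference[key], candidate[key])
        for key in shared
        if reference[key] != candidate[key]
    }


# Made-up sample data; real input would come from the downloaded vars.json files.
reference = json.loads('{"ARCH": "x86_64", "DESKTOP": "gnome", "QEMURAM": "2048"}')
candidate = json.loads('{"ARCH": "x86_64", "QEMURAM": "1024"}')

print(missing_settings(reference, candidate))    # ['DESKTOP']
print(differing_settings(reference, candidate))  # {'QEMURAM': ('2048', '1024')}
```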

#23 Updated by mgriessmeier almost 2 years ago

  • Target version changed from Milestone 25 to Milestone 26

#24 Updated by zluo almost 2 years ago

  • Blocked by action #53606: PRODUCTDIR invalid when main.pm in casedir (not product-subdir) and cloning from caching worker to caching worker added

#25 Updated by zluo almost 2 years ago

  • Status changed from In Progress to Blocked

#26 Updated by okurz almost 2 years ago

  • Status changed from Blocked to In Progress

I already mentioned a workaround in #53606 so the task should not be blocked.

#27 Updated by zluo almost 2 years ago

okurz yes, I also need to correct NEEDLES_DIR:

openqa-clone-job --from http://f40.suse.de 4158 PRODUCTDIR=/var/lib/openqa/cache/f40.suse.de/tests/openqa --skip-deps NEEDLES_DIR=/var/lib/openqa/cache/f40.suse.de/tests/openqa/products/openqa/needles

#28 Updated by zluo almost 2 years ago

#29 Updated by zluo almost 2 years ago

Need to check https://openqa.opensuse.org/tests/975438#next_previous

It fails in module "worker" when matching needle gnome-desktop at wait_for_desktop: http://f40.suse.de/tests/4181#live

This is quite strange. Need to check whether this is really a timeout issue.

#30 Updated by zluo almost 2 years ago

it seems that unlock desktop doesn't work...

#32 Updated by zluo almost 2 years ago

https://openqa.opensuse.org/tests/975610/file/autoinst-log.txt shows that gnome-desktop-20190509 is sometimes not matched. This is very strange.
Compare with http://f40.suse.de/tests/4199/file/autoinst-log.txt, where the test proceeds to x11_start_program without any issue.

#33 Updated by zluo almost 2 years ago

http://f40.suse.de/tests/4290#next_previous shows no issue across 100 test runs.

I also see some changes from okurz in the openQA tests, and needle gnome-desktop matches without any issue for the current build.

#34 Updated by zluo almost 2 years ago

https://openqa.opensuse.org/tests/977793#next_previous shows no failure from TW.2356 to TW.2371. Rejecting it for now.

#35 Updated by zluo almost 2 years ago

  • Blocked by deleted (action #53606: PRODUCTDIR invalid when main.pm in casedir (not product-subdir) and cloning from caching worker to caching worker)

#36 Updated by zluo almost 2 years ago

  • Status changed from In Progress to Rejected

#37 Updated by okurz almost 2 years ago

  • Status changed from Rejected to In Progress

please read the ticket title again

#38 Updated by zluo almost 2 years ago

  • Description updated (diff)

#39 Updated by zluo almost 2 years ago

  • Status changed from In Progress to Workable
  • Assignee deleted (zluo)

As discussed with scrum master @sergio, I am unassigning myself for now.

#40 Updated by mgriessmeier over 1 year ago

  • Status changed from Workable to New
  • Target version changed from Milestone 26 to Milestone 27

next grooming

#41 Updated by mgriessmeier over 1 year ago

  • Target version changed from Milestone 27 to Milestone 28

#42 Updated by mgriessmeier over 1 year ago

  • Target version changed from Milestone 28 to Milestone 31

#43 Updated by mgriessmeier about 1 year ago

  • Status changed from New to Rejected

works

#44 Updated by okurz about 1 year ago

  • Project changed from openQA Tests to openQA Project
  • Subject changed from [functional][u][sporadic] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA? to test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA?
  • Category changed from Bugs in existing tests to Organisational
  • Status changed from Rejected to New
  • Target version changed from Milestone 31 to future

It's rather annoying to see another ticket that is a consequence of dheidler not following my advice and instead copy-pasting the openQA-in-openQA tests to os-autoinst-distri-opensuse. os-autoinst-distri-openqa can act as a very good example for the QA tools team. In light of even the SUSE company goals, I consider it important to look into reusing test library functions rather than copy-pasting them to other GitHub repos, so let's see if we can plan this for the tools team.

#45 Updated by okurz about 1 year ago

  • Priority changed from Normal to Low
