action #44441

open

test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA?

Added by okurz almost 5 years ago. Updated over 3 years ago.

Status:
New
Priority:
Low
Assignee:
-
Category:
Organisational
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:

Description

Observation

openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install+publish@64bit-2G fails in
dashboard

Reproducible

Fails since (at least) Build :TW.1841 (current job)

Expected result

Last good: :TW.1840 (or more recent)

Further details

Always latest result in this scenario: latest

Acceptance Criteria

  • AC1: x11_start_program code in os-autoinst-distri-openQA is in sync with os-autoinst-distri-opensuse
  • AC2: tests passed without any issue

Related issues 2 (1 open, 1 closed)

  • Related to openQA Project - coordination #106922: [epic][sporadic] openqa_from_git fails in dashboard due to ensure_unlocked_desktop not expecting password entry screen in case of locked desktop auto_review:"match=desktop-runner,screenlock timed out.*":retry (Blocked, okurz, 2022-02-16)
  • Blocks openQA Tests - action #45131: [functional][u] test fails in worker to unlock the screen of openQA-in-openQA test (Rejected, mgriessmeier, 2018-12-13)

Actions #1

Updated by okurz over 4 years ago

  • Blocks action #45131: [functional][u] test fails in worker to unlock the screen of openQA-in-openQA test added
Actions #2

Updated by okurz over 4 years ago

  • Status changed from New to Workable
  • Priority changed from Normal to High
  • Target version changed from future to Milestone 24
Actions #3

Updated by jorauch over 4 years ago

  • Assignee set to jorauch

It's mine!

Actions #4

Updated by jorauch over 4 years ago

Actually, we have much less code in os-autoinst-distri-openqa; let's see what we need from the new code.

Actions #5

Updated by mgriessmeier over 4 years ago

  • Status changed from Workable to In Progress
Actions #6

Updated by jorauch over 4 years ago

The program is started with x11_start_program("firefox http://localhost", 60, { valid => 1 } ); so it has a dedicated one minute timeout.

The error message says # Test died: no candidate needle with tag(s) 'displaymanager, displaymanager-password-prompt, generic-desktop, screenlock, gnome-screenlock-password' matched. This implies it is looking for a locked screen for some reason, which makes sense because the test runs ensure_unlocked_desktop beforehand.
At first glance the function looks the same for both distributions.

In conclusion, this might have been a serious hiccup, since the newer runs work fine.
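
For triage, the reasoning in this comment (the listed needle tags point at a lock or login screen) can be checked mechanically. The following is a minimal sketch, not part of os-autoinst: the helper names `needle_tags` and `lock_related` and the `LOCK_HINTS` list are assumptions made for illustration; only the message format is taken from the error quoted above.

```python
import re

# Matches the failure line quoted in the comment above.
NO_MATCH = re.compile(r"no candidate needle with tag\(s\) '([^']*)' matched")

# Assumption: substrings suggesting that a lock or login screen was expected.
LOCK_HINTS = ("screenlock", "displaymanager", "password")


def needle_tags(log_line):
    """Return the needle tags named in a 'no candidate needle' line."""
    match = NO_MATCH.search(log_line)
    if match is None:
        return []
    return [tag.strip() for tag in match.group(1).split(",") if tag.strip()]


def lock_related(tags):
    """Return only the tags that hint at a locked or login screen."""
    return [tag for tag in tags if any(hint in tag for hint in LOCK_HINTS)]


line = ("# Test died: no candidate needle with tag(s) 'displaymanager, "
        "displaymanager-password-prompt, generic-desktop, screenlock, "
        "gnome-screenlock-password' matched")
tags = needle_tags(line)
print(tags)
print(lock_related(tags))
```

Four of the five tags in the quoted message are lock or login related, which is consistent with the failure originating in ensure_unlocked_desktop rather than in the application start itself.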

Actions #7

Updated by jorauch over 4 years ago

It now looks to me like the send_key 'esc' in line 60 didn't reach the SUT, so this would qualify as a lost-key problem.

Actions #8

Updated by okurz over 4 years ago

There should be no "lost keys" on x86_64 unless it's Wayland, virtio, or a system bogged down by services on the SUT itself.

Actions #9

Updated by mgriessmeier over 4 years ago

  • Subject changed from [functional][u] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA? to [functional][u][sporadic] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA?
  • Priority changed from High to Normal

Hmm, it hasn't happened for a month now, and even before that it was rather sporadic.
I would like to lower the priority and, as a next step, check what we can do to make this test more robust.
Since it's M24, it's fine if you unassign for now and put it back in the backlog.

Actions #10

Updated by jorauch over 4 years ago

  • Status changed from In Progress to Workable
  • Assignee deleted (jorauch)

As discussed with mgriessmeier I will unassign and we can revisit this later

Actions #11

Updated by okurz over 4 years ago

As a simple suggestion, one could trigger some more jobs, e.g. 100, and check the fail rate, if it fails at all.

Actions #12

Updated by okurz over 4 years ago

  • Target version changed from Milestone 24 to Milestone 25
Actions #13

Updated by jorauch over 4 years ago

We had the issue once in the last 100 runs in production; it has not appeared again since this ticket was created.
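
A single failure in 100 runs still leaves a fairly wide range of plausible failure rates. The sketch below estimates that range with a Wilson score interval; this is a standard statistical technique chosen here for illustration, not something the ticket prescribes, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(failures, runs, z=1.96):
    """Wilson score interval for a failure rate (z=1.96 is roughly 95 %)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = failures / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] so rounding never yields an impossible rate.
    return max(0.0, centre - half), min(1.0, centre + half)


# Figures from this comment: one failure in the last 100 production runs.
low, high = wilson_interval(failures=1, runs=100)
print(f"observed rate: {1 / 100:.1%}, plausible range: {low:.1%} to {high:.1%}")
```

The same function can be fed the counts from a dedicated batch of 100 triggered jobs to compare a test instance against production.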

Actions #14

Updated by jorauch over 4 years ago

  • Status changed from Workable to In Progress
  • Assignee set to jorauch

Taking a look and trying to verify that this was just a hiccup.

Actions #15

Updated by jorauch over 4 years ago

Started 100 times on pinky

Actions #16

Updated by jorauch over 4 years ago

It failed very often on pinky with this exact behaviour

Actions #17

Updated by jorauch over 4 years ago

The issue seems to be that the session got locked (I guess due to low performance extending the time between the instructions) but we have no needle with the password prompt for this product.

I will create a needle and try this on pinky

Actions #18

Updated by jorauch over 4 years ago

  • Status changed from In Progress to Workable
  • Assignee deleted (jorauch)

The missing needle is actually a new problem that has not yet appeared in production.
As I am running out of ideas for how to work around this, I will unassign.
Sergio confirmed that it seems the ESC key does not reach the system or is not synchronized.

Actions #19

Updated by zluo over 4 years ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

Taking over to check the current status of dashboard.

Actions #20

Updated by zluo over 4 years ago

https://openqa.opensuse.org/tests/965208#step/worker/4 shows that the worker module failed, which should be fixed first because it blocks the test modules that follow it.

Actions #21

Updated by zluo over 4 years ago

http://f40.suse.de/tests/4149 failed for clone_job:

If we compare this vars.json: https://openqa.opensuse.org/tests/965331/file/vars.json
with the one from your instance: http://f40.suse.de/tests/4147/file/vars.json
a lot of settings are missing.
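
The comparison of the two vars.json files can be done programmatically once both are downloaded. This is a minimal sketch under the assumption that each vars.json is a flat JSON object of settings; the function names and the sample values are hypothetical and do not reproduce the contents of the jobs linked above.

```python
import json


def missing_settings(reference, candidate):
    """Return setting names present in reference but absent from candidate."""
    return sorted(set(reference) - set(candidate))


def differing_settings(reference, candidate):
    """Return {name: (reference_value, candidate_value)} for shared names that differ."""
    shared = sorted(set(reference) & set(candidate))
    return {k: (reference[k], candidate[k]) for k in shared if reference[k] != candidate[k]}


# Hypothetical excerpts standing in for the two downloaded vars.json files.
production = json.loads(
    '{"ARCH": "x86_64", "DISTRI": "openqa", '
    '"NEEDLES_DIR": "/example/needles", "PRODUCTDIR": "/example/productdir"}'
)
local = json.loads(
    '{"ARCH": "x86_64", "DISTRI": "openqa", "PRODUCTDIR": "/other/productdir"}'
)

print(missing_settings(production, local))
print(differing_settings(production, local))
```

In practice the two dictionaries would come from `json.load` on the saved files; settings reported as missing are the candidates to pass explicitly when cloning the job.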

Actions #23

Updated by mgriessmeier about 4 years ago

  • Target version changed from Milestone 25 to Milestone 26
Actions #24

Updated by zluo about 4 years ago

  • Blocked by action #53606: PRODUCTDIR invalid when main.pm in casedir (not product-subdir) and cloning from caching worker to caching worker added
Actions #25

Updated by zluo about 4 years ago

  • Status changed from In Progress to Blocked
Actions #26

Updated by okurz about 4 years ago

  • Status changed from Blocked to In Progress

I already mentioned a workaround in #53606 so the task should not be blocked.

Actions #27

Updated by zluo about 4 years ago

@okurz Yes, I also need to correct NEEDLES_DIR:

openqa-clone-job --from http://f40.suse.de 4158 PRODUCTDIR=/var/lib/openqa/cache/f40.suse.de/tests/openqa --skip-deps NEEDLES_DIR=/var/lib/openqa/cache/f40.suse.de/tests/openqa/products/openqa/needles

Actions #28

Updated by zluo about 4 years ago

Actions #29

Updated by zluo about 4 years ago

Need to check https://openqa.opensuse.org/tests/975438#next_previous

It fails in module "worker" when matching needle gnome-desktop in wait_for_desktop: http://f40.suse.de/tests/4181#live

This is quite strange; need to check whether this is really a timeout issue.

Actions #30

Updated by zluo about 4 years ago

It seems that unlocking the desktop doesn't work...

Actions #32

Updated by zluo about 4 years ago

https://openqa.opensuse.org/tests/975610/file/autoinst-log.txt shows that gnome-desktop-20190509 is sometimes not matched, which is very strange.
Compare with http://f40.suse.de/tests/4199/file/autoinst-log.txt, where the test proceeds to x11_start_program without any issue.

Actions #33

Updated by zluo about 4 years ago

http://f40.suse.de/tests/4290#next_previous shows no issue in 100 test runs.

I also see some changes from okurz in the openQA tests, and the needle gnome-desktop matches without any issue for the current build.

Actions #34

Updated by zluo about 4 years ago

https://openqa.opensuse.org/tests/977793#next_previous shows no failure from TW.2356 to TW.2371. Rejecting it for now.

Actions #35

Updated by zluo about 4 years ago

  • Blocked by deleted (action #53606: PRODUCTDIR invalid when main.pm in casedir (not product-subdir) and cloning from caching worker to caching worker)
Actions #36

Updated by zluo about 4 years ago

  • Status changed from In Progress to Rejected
Actions #37

Updated by okurz about 4 years ago

  • Status changed from Rejected to In Progress

please read the ticket title again

Actions #38

Updated by zluo about 4 years ago

  • Description updated (diff)
Actions #39

Updated by zluo about 4 years ago

  • Status changed from In Progress to Workable
  • Assignee deleted (zluo)

As discussed with scrum master @sergio, I am unassigning myself for now.

Actions #40

Updated by mgriessmeier about 4 years ago

  • Status changed from Workable to New
  • Target version changed from Milestone 26 to Milestone 27

next grooming

Actions #41

Updated by mgriessmeier about 4 years ago

  • Target version changed from Milestone 27 to Milestone 28
Actions #42

Updated by mgriessmeier over 3 years ago

  • Target version changed from Milestone 28 to Milestone 31
Actions #43

Updated by mgriessmeier over 3 years ago

  • Status changed from New to Rejected

works

Actions #44

Updated by okurz over 3 years ago

  • Project changed from openQA Tests to openQA Project
  • Subject changed from [functional][u][sporadic] test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA? to test fails in dashboard of openQA-in-openQA test, timeout looking for desktop runner is too short, do we have old x11_start_program code in os-autoinst-distri-openQA?
  • Category changed from Bugs in existing tests to Organisational
  • Status changed from Rejected to New
  • Target version changed from Milestone 31 to future

It's rather annoying to see another ticket that is a consequence of dheidler not following my advice and instead copy-pasting the openQA-in-openQA tests to os-autoinst-distri-opensuse. os-autoinst-distri-openqa can act as a very good example for the QA tools team. And in light of even the SUSE company goals I consider it important to look into reusing test library functions better than copy-pasting to other github repos so let's see if we can plan this for the tools team then.

Actions #45

Updated by okurz over 3 years ago

  • Priority changed from Normal to Low
Actions #46

Updated by okurz over 1 year ago

  • Related to coordination #106922: [epic][sporadic] openqa_from_git fails in dashboard due to ensure_unlocked_desktop not expecting password entry screen in case of locked desktop auto_review:"match=desktop-runner,screenlock timed out.*":retry added