action #95721
closed
[Sporadic] containers: tests fail with "Test died: no candidate needle with tag(s) 'inst-console' matched" size:M
Description
Motivation
Occasionally, some of the openQA container tests fail with the error "Test died: no candidate needle with tag(s) 'inst-console' matched".
See these examples:
https://openqa.opensuse.org/tests/1847722#step/openqa_webui/3
https://openqa.opensuse.org/tests/1848261#step/openqa_webui/3
Acceptance Criteria
- AC 1: The tests pass without this error over a sufficiently large number of runs (check the current frequency of the problem)
Suggestion
- Try to reproduce the problem locally
- In the worst case increase the timeout
Updated by ilausuch over 3 years ago
Comparing a bad run with a good one at the same step (https://openqa.opensuse.org/tests/1847721#step/openqa_webui/1), I saw that the good run shows a console view instead of a graphical view.
Updated by okurz over 3 years ago
- Priority changed from Normal to High
- Target version set to Ready
Updated by mkittler over 3 years ago
The test is switching to tty3 and waits for it by looking for the needle inst-console. After the assert_screen fails, tty3 shows up. So was the SUT just too slow (slower than the timeout of 30 seconds)?
Updated by ilausuch over 3 years ago
- Subject changed from [Sporadic] containers: tests fail with "Test died: no candidate needle with tag(s) 'inst-console' matched" to [Sporadic] containers: tests fail with "Test died: no candidate needle with tag(s) 'inst-console' matched" size:M
- Description updated (diff)
Updated by ilausuch over 3 years ago
I found that so far (in one month) we have had 8 occurrences, more or less one per day.
Updated by ilausuch over 3 years ago
- Status changed from Workable to In Progress
The problem I see here is that send_key occasionally does not work. send_key is used to switch to the console: https://github.com/os-autoinst/os-autoinst-distri-openQA/blob/cd7288c4f14bb11ad24155f0a9777c29b2d563c8/tests/install/openqa_webui.pm#L77
[2021-07-19T16:39:25.588 CEST] [debug] no match: 29.0s, best candidate: gnome-desktop-20190509 (0.00)
[2021-07-19T16:39:26.617 CEST] [debug] >>> testapi::_handle_found_needle: found openqa-boot-menu-Tumbleweed-20190329, similarity 1.00 @ 65/11
[2021-07-19T16:39:26.618 CEST] [debug] /tests/install/boot.pm:6 called utils::wait_for_desktop -> lib/utils.pm:20 called testapi::send_key
[2021-07-19T16:39:26.618 CEST] [debug] <<< testapi::send_key(key="ret", wait_screen_change=0, do_wait=0)
[2021-07-19T16:39:26.888 CEST] [debug] /tests/install/boot.pm:6 called utils::wait_for_desktop -> lib/utils.pm:23 called testapi::assert_screen
[2021-07-19T16:39:26.888 CEST] [debug] <<< testapi::assert_screen(mustmatch="openqa-desktop", timeout=500)
[2021-07-19T16:39:27.490 CEST] [debug] no match: 499.4s, best candidate: gnome-desktop-20190509 (0.00)
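To compare good and bad runs it can help to pull the interesting events out of such a log automatically. The following is a minimal sketch in Python that only assumes the line layout visible in the excerpt above (`[timestamp timezone] [level] message`). The helper name `parse_line` and the dictionary keys are made up for illustration, and the number in front of `s` in a "no match" line is stored as-is without interpreting it.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: "[<timestamp> <tz>] [<level>] <message>"
LINE_RE = re.compile(r"^\[(?P<ts>\S+) (?P<tz>\S+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")

# "no match: 29.0s, best candidate: gnome-desktop-20190509 (0.00)"
NO_MATCH_RE = re.compile(
    r"^no match: (?P<seconds>[\d.]+)s, best candidate: "
    r"(?P<needle>\S+) \((?P<similarity>[\d.]+)\)$"
)


def parse_line(line):
    """Return a dict describing one log line, or None if it does not fit the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    entry = {
        "time": datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f"),
        "timezone": m["tz"],
        "level": m["level"],
        "message": m["msg"],
    }
    nm = NO_MATCH_RE.match(m["msg"])
    if nm is not None:
        # Keep the raw numbers; their exact meaning is not interpreted here.
        entry["no_match"] = {
            "seconds": float(nm["seconds"]),
            "needle": nm["needle"],
            "similarity": float(nm["similarity"]),
        }
    return entry


if __name__ == "__main__":
    sample = [
        '[2021-07-19T16:39:26.618 CEST] [debug] <<< testapi::send_key(key="ret", wait_screen_change=0, do_wait=0)',
        "[2021-07-19T16:39:27.490 CEST] [debug] no match: 499.4s, best candidate: gnome-desktop-20190509 (0.00)",
    ]
    for raw in sample:
        entry = parse_line(raw)
        kind = "no match" if "no_match" in entry else "other"
        print(entry["time"].time(), kind, entry["message"])
```

Filtering the parsed entries for `send_key` calls followed by a long run of "no match" entries with similarity 0.00 would point at runs where the key press had no visible effect.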
There are other, older cases:
https://progress.opensuse.org/issues/72898
https://progress.opensuse.org/issues/88436
https://progress.opensuse.org/issues/89197
I am preparing a solution based on the investigation.
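If a key press really is lost from time to time, one generic way to guard against it is to send the key again when the console does not show up within the timeout. Below is a minimal sketch of that idea in Python. It is not the os-autoinst testapi and not necessarily the solution being prepared here: `send_key` and `screen_matches` are hypothetical callables standing in for the real key press and needle check, and the default values are placeholders.

```python
import time
from typing import Callable


def wait_for_match(screen_matches: Callable[[], bool],
                   timeout: float, interval: float = 1.0) -> bool:
    """Poll screen_matches() until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if screen_matches():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def switch_console_with_retry(send_key: Callable[[], None],
                              screen_matches: Callable[[], bool],
                              attempts: int = 3,
                              timeout: float = 30.0,
                              interval: float = 1.0) -> bool:
    """Send the console-switch key; resend it if the console never appears."""
    for _ in range(attempts):
        send_key()
        if wait_for_match(screen_matches, timeout, interval):
            return True
    return False


if __name__ == "__main__":
    # Simulated SUT (hypothetical): the first key press is lost, the second one works.
    presses = {"count": 0}

    def fake_send_key() -> None:
        presses["count"] += 1

    def fake_screen_matches() -> bool:
        return presses["count"] >= 2

    ok = switch_console_with_retry(fake_send_key, fake_screen_matches,
                                   attempts=3, timeout=0.05, interval=0.01)
    print("console reached:", ok, "after", presses["count"], "key presses")
```

Compared with only raising the timeout, resending the key also covers the case where the SUT is fast enough but never received the key at all.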
Updated by openqa_review over 3 years ago
- Due date set to 2021-08-19
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ilausuch over 3 years ago
I want to prove that this solution is good enough by launching 40 tests.
If all of them are green, we can assume that the solution is OK (though without full guarantees). On the other hand, if one of them is red and fails at this point, we can say that 1 minute is enough to switch to the terminal screen and therefore the solution is not good and we have to find another one.
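How much an all-green batch proves depends on how often the unfixed test fails at this step. Here is a rough sketch of that calculation in Python, assuming the runs are independent and that the unfixed test fails with a fixed per-run probability `p_fail`. The ticket only gives an occurrence count, not a rate per run, so the values of `p_fail` below are purely illustrative.

```python
import math


def prob_all_green(p_fail: float, runs: int) -> float:
    """Chance that `runs` independent runs all pass although each fails with p_fail."""
    return (1.0 - p_fail) ** runs


def runs_needed(p_fail: float, confidence: float) -> int:
    """Smallest number of runs for which an unfixed bug shows up at least once
    with probability >= confidence, i.e. 1 - (1 - p_fail)**n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    # Illustrative failure rates only; the real per-run rate is not known from the ticket.
    for p_fail in (0.02, 0.05, 0.10):
        lucky = prob_all_green(p_fail, 40)
        needed = runs_needed(p_fail, 0.95)
        print(f"p_fail={p_fail:.2f}: P(40 green by luck)={lucky:.3f}, "
              f"runs for 95% confidence={needed}")
```

The lower the original failure rate, the more likely 40 green runs are by chance alone, which is why an all-green batch cannot give a full guarantee.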
Updated by ilausuch over 3 years ago
I launched these tests
1884704, 1884706 - 1884725, 1884733 - 1884753 (excluding 1884705)
Updated by ilausuch over 3 years ago
All tests passed this part of the test.
A few failed in other steps, e.g. https://openqa.opensuse.org/tests/1884716
This could be considered sufficient proof, then.
Updated by okurz over 3 years ago
- Status changed from In Progress to Resolved
Thanks. With this information we can regard the initial problem as something temporary that we do not want to fix right now. Although I assume the very same could come back and then we should look into extending the initial timeout.