action #104971 (closed)
[sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:M
Added by okurz almost 3 years ago. Updated almost 3 years ago.
Description
Observation
See https://github.com/os-autoinst/os-autoinst/runs/4833764360?check_suite_focus=true#step:3:624
I have also seen this locally. count_fail_ratio: Run: 47. Fails: 10. Fail ratio 21%
If I disable the line autotest::loadtest "tests/typing.pm"; in t/data/tests/main.pm then I get 0/50 fails.
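To illustrate, a minimal sketch of t/data/tests/main.pm with the typing module disabled; only the modules mentioned in this ticket are shown, and the real schedule may load additional ones:

use strict;
use warnings;
use autotest;

# autotest::loadtest "tests/typing.pm";    # disabled for the 0/50-fails run
autotest::loadtest "tests/reload_needles.pm";

1;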
Acceptance criteria
- AC1: typing.pm passes reliably w/ kvm
Suggestions
- DONE: Confirm if the test fails w/o kvm -> It doesn't reproduce on arm as per Marius' findings
- Is this a side effect of running kvm whenever it's available, which is why we didn't see this issue before?
- Disable parts of typing.pm and identify the problematic part
Files
os-autoinst-104971.log (86 KB), mkittler, 2022-01-24 15:46
video.ogv (3.81 MB), mkittler, 2022-01-25 16:29
Updated by livdywan almost 3 years ago
- Subject changed from [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" to [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by kraih almost 3 years ago
Tried to replicate it locally with a Tumbleweed aarch64 vm, but the test seems pretty stable here.
Updated by mkittler almost 3 years ago
- Assignee set to mkittler
@kraih That matches my own findings when running the test on ARM workers (see suggestions).
It is hard to tell what's going on with only the logs from the CI job. I'll try to reproduce it locally on x86_64 TW (using KVM). (I also ran the test locally while developing and haven't encountered any problems so far.)
Updated by mkittler almost 3 years ago
- File os-autoinst-104971.log os-autoinst-104971.log added
Looks like the failure rate here is 5 %:
## count_fail_ratio: Run: 20. Fails: 1. Fail ratio 5%
It also failed in the same way. I've attached the full log, which for comparison also includes the subsequent (successful) test run, since not all log messages that look like errors are part of this problem.
Updated by mkittler almost 3 years ago
So interestingly, the typing test itself doesn't fail here (even though omitting it apparently helps). Annoyingly, the test bails out on the first failing check, so we don't know what other symptoms there are. The test log shows the immediate failure within the reload_needles module:
[2022-01-24T15:51:07.799706+01:00] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'no-importa' matched
[2022-01-24T15:51:07.801392+01:00] [debug] ||| finished reload_needles tests (runtime: 31 s)
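For context, the failing check presumably boils down to a plain assert_screen on that tag; this is only a hypothetical reduction of tests/reload_needles.pm, and the actual module may do more around the assertion:

use base 'basetest';
use strict;
use warnings;
use testapi;

sub run {
    # dies with "no candidate needle with tag(s) 'no-importa' matched"
    # when the best candidate stays just below the match threshold
    assert_screen 'no-importa';
}

1;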
Updated by mkittler almost 3 years ago
- Status changed from Workable to In Progress
This is how the test fails:
[2022-01-24T15:50:37.186871+01:00] [debug] <<< testapi::assert_screen(mustmatch="no-importa", timeout=30)
[2022-01-24T15:50:37.512173+01:00] [debug] no match: 29.6s, best candidate: no-importa-v1 (0.00)
[2022-01-24T15:50:38.508090+01:00] [debug] no match: 28.6s, best candidate: no-importa-v1 (0.96)
…
[2022-01-24T15:51:06.581123+01:00] [debug] no match: 0.6s, best candidate: no-importa-v1 (0.96)
[2022-01-24T15:51:07.586327+01:00] [debug] no match: -0.4s, best candidate: no-importa-v1 (0.96)
[2022-01-24T15:51:07.761684+01:00] [debug] >>> testapi::_check_backend_response: match=no-importa timed out after 30 (assert_screen)
[2022-01-24T15:51:07.799706+01:00] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'no-importa' matched
[2022-01-24T15:51:07.801392+01:00] [debug] ||| finished reload_needles tests (runtime: 31 s)
It looks like the needle match is only slightly below the required threshold. For comparison, in the good run the similarity is 0.99:
[2022-01-24T15:53:46.090596+01:00] [debug] >>> testapi::_handle_found_needle: found no-importa-v1, similarity 0.99 @ 90/350
[2022-01-24T15:53:46.100801+01:00] [debug] git hash in /hdd/openqa-devel/repos/os-autoinst/t/data//tests/needles: 7ef72e209567425123d4311781f3024f52d88f3a
[2022-01-24T15:53:46.101036+01:00] [debug] init needles from /hdd/openqa-devel/repos/os-autoinst/t/data//tests/needles
[2022-01-24T15:53:46.102547+01:00] [debug] loaded 4 needles
[2022-01-24T15:53:46.103211+01:00] [debug] tests/reload_needles.pm:15 called testapi::assert_screen
[2022-01-24T15:53:46.103352+01:00] [debug] <<< testapi::assert_screen(mustmatch="no-importa", timeout=30)
[2022-01-24T15:53:47.078561+01:00] [debug] >>> testapi::_handle_found_needle: found no-importa-v2, similarity 1.00 @ 85/353
That's also not "perfect" for the first needle. Maybe the needle needs adjustment since it is just slightly off? Or does the screen sometimes look slightly different (for whatever reason)?
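If adjusting the needle turns out to be the way to go, the match level of the affected area can be relaxed in the needle's JSON definition. The coordinates and the value 90 below are purely illustrative and not taken from the actual no-importa needle; "match" lowers the required similarity for that area below the default:

{
    "tags": ["no-importa"],
    "area": [
        {
            "xpos": 90,
            "ypos": 350,
            "width": 100,
            "height": 20,
            "type": "match",
            "match": 90
        }
    ]
}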
Updated by mkittler almost 3 years ago
It is indeed the screen that looks slightly different. In fact, it looks very distorted, see the attached video.
Now the question is where the distortion comes from. Is that really the image the SUT is producing, or a processing error on our side or within QEMU?
So this is not a typing problem. Omitting the typing module just means there's no scrolling of the output, which avoids the screen distortion.
Updated by okurz almost 3 years ago
That we are able to reproduce VNC display corruption in something as simple as the Tiny Core-based os-autoinst tests is actually great news. This explains why "typing" makes a difference: without it there is simply less text on the screen and no up-scrolling, which is necessary to trigger the issue. https://bugzilla.suse.com/show_bug.cgi?id=1160707 is a bug for VNC console corruption. You might want to comment in there.
Updated by openqa_review almost 3 years ago
- Due date set to 2022-02-09
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz almost 3 years ago
As discussed, VNC might not even promise perfect quality. Please research whether the observed behaviour is to be expected. Maybe there are parameters we can tweak to prevent "lossy, fast rendering" and use something like "slower but safer" rendering without screen artifacts :)
Updated by mkittler almost 3 years ago
Looking at the standard, it doesn't look like VNC supports such high-level tweaks. However, we could tweak our client to re-request the entire screen via a non-incremental FramebufferUpdateRequest (https://datatracker.ietf.org/doc/html/rfc6143#section-7.5.3) in case we're waiting "too long" for a needle.
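A rough sketch of what such a forced refresh would look like on the wire, assuming an already established RFB session; $socket, $width and $height are placeholders and this is not os-autoinst's actual VNC client code:

# FramebufferUpdateRequest (RFC 6143, section 7.5.3): message-type 3,
# incremental = 0 asks the server to resend the complete framebuffer,
# followed by x, y, width and height as big-endian 16-bit integers.
sub request_full_framebuffer_update {
    my ($socket, $width, $height) = @_;
    syswrite($socket, pack('CCnnnn', 3, 0, 0, 0, $width, $height));
}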
Updated by okurz almost 3 years ago
- Related to action #105429: openQA's fullstack test fails in `shutdown` module added
Updated by mkittler almost 3 years ago
- Status changed from In Progress to Feedback
Waiting for feedback on the PR: https://github.com/os-autoinst/os-autoinst/pull/1939
Updated by mkittler almost 3 years ago
- Status changed from Feedback to Resolved
The PR has been merged. I had already checked before merging whether it makes a difference, and it does. So I suppose the issue can be resolved.