[sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:M
I have also seen this locally. count_fail_ratio: Run: 47. Fails: 10. Fail ratio 21%
If I disable the line autotest::loadtest "tests/typing.pm"; in t/data/tests/main.pm, I get 0/50 fails.
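For reference, this is roughly what a count_fail_ratio-style helper does: re-run the (possibly flaky) test a number of times and report the failure percentage. A minimal Python sketch (the real helper is a shell script; the command and exact output format here are assumptions):

```python
import subprocess

def count_fail_ratio(cmd, runs=50):
    # Re-run the test command and count non-zero exit codes as failures.
    fails = sum(subprocess.run(cmd).returncode != 0 for _ in range(runs))
    print(f"count_fail_ratio: Run: {runs}. Fails: {fails}. "
          f"Fail ratio {fails * 100 // runs}%")
    return fails

# Example with a command that always succeeds:
count_fail_ratio(["true"], runs=5)
```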
- AC1: typing.pm passes reliably w/ kvm
- DONE: Confirm if the test fails w/o kvm -> It doesn't reproduce on arm as per Marius' findings
- Is this a side-effect of running kvm whenever it's available, which would explain why we didn't see this issue before?
- Disable parts of typing.pm and identify the problematic part
- Subject changed from [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" to [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:M
- Description updated (diff)
- Status changed from New to Workable
- Assignee set to mkittler
kraih: That matches my own findings when running the test on ARM workers (see suggestions).
It is hard to tell what's going on with only the logs from the CI job. I'll try to reproduce it locally on x86_64 TW (using KVM). (I also ran the test locally while developing and hadn't encountered any problems so far.)
Looks like the failure rate here is 5 %:
## count_fail_ratio: Run: 20. Fails: 1. Fail ratio 5%
It also failed in the same way. I've attached the full log and, for comparison, also the log of the subsequent (successful) test run, since not all log messages that look like errors are part of this problem.
So interestingly, the typing test itself doesn't fail here (even though omitting it apparently helps). Annoyingly, the test bails out on the first failing check, so we don't know what other symptoms there are. The test log shows the immediate failure within the reload_needles module:
[2022-01-24T15:51:07.799706+01:00] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'no-importa' matched
[2022-01-24T15:51:07.801392+01:00] [debug] ||| finished reload_needles tests (runtime: 31 s)
- Status changed from Workable to In Progress
This is how the test fails:
[2022-01-24T15:50:37.186871+01:00] [debug] <<< testapi::assert_screen(mustmatch="no-importa", timeout=30)
[2022-01-24T15:50:37.512173+01:00] [debug] no match: 29.6s, best candidate: no-importa-v1 (0.00)
[2022-01-24T15:50:38.508090+01:00] [debug] no match: 28.6s, best candidate: no-importa-v1 (0.96)
…
[2022-01-24T15:51:06.581123+01:00] [debug] no match: 0.6s, best candidate: no-importa-v1 (0.96)
[2022-01-24T15:51:07.586327+01:00] [debug] no match: -0.4s, best candidate: no-importa-v1 (0.96)
[2022-01-24T15:51:07.761684+01:00] [debug] >>> testapi::_check_backend_response: match=no-importa timed out after 30 (assert_screen)
[2022-01-24T15:51:07.799706+01:00] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'no-importa' matched
[2022-01-24T15:51:07.801392+01:00] [debug] ||| finished reload_needles tests (runtime: 31 s)
It looks like the needle match is only slightly below the required level. For comparison, on the good run the similarity is 0.99:
[2022-01-24T15:53:46.090596+01:00] [debug] >>> testapi::_handle_found_needle: found no-importa-v1, similarity 0.99 @ 90/350
[2022-01-24T15:53:46.100801+01:00] [debug] git hash in /hdd/openqa-devel/repos/os-autoinst/t/data//tests/needles: 7ef72e209567425123d4311781f3024f52d88f3a
[2022-01-24T15:53:46.101036+01:00] [debug] init needles from /hdd/openqa-devel/repos/os-autoinst/t/data//tests/needles
[2022-01-24T15:53:46.102547+01:00] [debug] loaded 4 needles
[2022-01-24T15:53:46.103211+01:00] [debug] tests/reload_needles.pm:15 called testapi::assert_screen
[2022-01-24T15:53:46.103352+01:00] [debug] <<< testapi::assert_screen(mustmatch="no-importa", timeout=30)
[2022-01-24T15:53:47.078561+01:00] [debug] >>> testapi::_handle_found_needle: found no-importa-v2, similarity 1.00 @ 85/353
That's also not "perfect" for the first needle. Maybe the needle needs adjustment as it is just slightly off? Or the screen looks slightly different sometimes (for whatever reason)?
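For context, each needle area carries a match level against which the computed similarity is compared. A hedged sketch of that comparison (the 0.96 level and the rounding interpretation are assumptions for illustration, not taken from the logs; the real per-area level comes from the needle's JSON):

```python
# Sketch of needle-area matching as a similarity-vs-threshold check.
# ASSUMPTION: a 0.96 match level is used here for illustration only.
def area_matches(similarity: float, match_level: float = 0.96) -> bool:
    return similarity >= match_level

# The successful run logged a similarity of 0.99 -- comfortably above.
print(area_matches(0.99))

# The "0.96" printed for the failing candidate is rounded; a true value
# just below the threshold (e.g. a hypothetical 0.958) would still be
# displayed as 0.96 yet fail the check.
print(area_matches(0.958))
```

This would explain why a candidate that looks borderline-identical in the log can flip between matching and not matching across runs.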
The screen looking slightly different is indeed the case. In fact, it looks very distorted; see the attached video.
Now the question is where the distortion comes from. Is that really the image the SUT is producing or a processing error on our side or within QEMU?
So this is not a typing problem. Omitting the typing module just means there's no scrolling of the output and hence avoids the screen distortion.
That we are able to reproduce VNC display corruptions in something as simple as the tinycore-based os-autoinst tests is actually great news. This explains why the "typing" makes a difference: without it there is simply less text on the screen and no up-scrolling, which is necessary to trigger the issue. https://bugzilla.suse.com/show_bug.cgi?id=1160707 is a bug for VNC console corruption. You might want to comment in there.
As discussed, VNC might not even promise perfect quality. Please research whether the observed behaviour is to be expected. Maybe there are parameters we can tweak to avoid "lossy, fast rendering" and use something like "slower but safer" rendering without screen artifacts :)
Looking at the standard, it doesn't look like VNC supports such high-level tweaks. However, we could tweak our client to re-request the entire screen via https://datatracker.ietf.org/doc/html/rfc6143#section-7.5.3 in case we're waiting "too long" for a needle.
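Concretely, the message in question is a FramebufferUpdateRequest with the incremental flag cleared, which asks the server to resend the full contents of the requested rectangle rather than just the changes. A minimal sketch of its wire encoding per RFC 6143 §7.5.3 (the function name and usage are mine, not from the PR):

```python
import struct

def framebuffer_update_request(x: int, y: int, width: int, height: int,
                               incremental: bool = True) -> bytes:
    """Encode an RFC 6143 §7.5.3 FramebufferUpdateRequest.

    With incremental=False the server must send the complete contents
    of the rectangle, discarding any stale client-side state.
    """
    MESSAGE_TYPE = 3  # client-to-server message number per the RFC
    # Layout: U8 type, U8 incremental, U16 x, U16 y, U16 w, U16 h (big-endian)
    return struct.pack('>BBHHHH', MESSAGE_TYPE, int(incremental),
                       x, y, width, height)

# Request a full (non-incremental) refresh of a 1024x768 framebuffer:
msg = framebuffer_update_request(0, 0, 1024, 768, incremental=False)
```

Sending such a full refresh whenever a needle has been pending "too long" would force the server to repaint any corrupted region.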
- Status changed from In Progress to Feedback
Waiting for feedback on the PR: https://github.com/os-autoinst/os-autoinst/pull/1939