Project

General

Profile

action #104971

[sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:M

Added by okurz 4 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Concrete Bugs
Target version:
Start date:
2022-01-17
Due date:
2022-02-09
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

See https://github.com/os-autoinst/os-autoinst/runs/4833764360?check_suite_focus=true#step:3:624

I have also seen this locally. count_fail_ratio: Run: 47. Fails: 10. Fail ratio 21%

If I disable the line autotest::loadtest "tests/typing.pm"; in t/data/tests/main.pm then I get 0/50 fails.

Acceptance criteria

  • AC1: typing.pm passes reliably w/ kvm

Suggestions

  • DONE: Confirm if the test fails w/o kvm -> It does't reproduce on arm as per Marius' findings
  • Side-effect of running kvm whenever it's available which is why we didn't used to see this issue?
  • Disable parts of typing.pm and identify the problematic part
os-autoinst-104971.log (86 KB) os-autoinst-104971.log mkittler, 2022-01-24 15:46
video.ogv (3.81 MB) video.ogv mkittler, 2022-01-25 16:29

Related issues

Related to openQA Project - action #105429: openQA's fullstack test fails in `shutdown` moduleResolved2022-01-25

History

#1 Updated by cdywan 4 months ago

  • Subject changed from [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" to [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:M
  • Description updated (diff)
  • Status changed from New to Workable

#2 Updated by kraih 4 months ago

Tried to replicate it locally with a Tumbleweed aarch64 vm, but the test seems pretty stable here.

#3 Updated by mkittler 4 months ago

  • Assignee set to mkittler

kraih That matches my own findings when running the test on ARM workers (see suggestions).

It is hard to tell what's going on with only the logs from the CI job. I'll try to reproduce it locally on x86_64 TW (using KVM). (I also ran the test locally before while developing and didn't encounter problems so far.)

#4 Updated by mkittler 4 months ago

Looks like the failure rate here is 5 %:

## count_fail_ratio: Run: 20. Fails: 1. Fail ratio 5%

It also failed in the same way. I've attached the full log and for comparison also the subsequent (successful) test run as not all log messages which look like errors are part of this problem.

#5 Updated by mkittler 4 months ago

So interestingly, the typing test itself doesn't fail here (despite omitting it apparently helps). Annoyingly the tests bails out on the first failing check so we don't know what other symptoms there are. The test log shows the immediate failure within the reload_needles module:

[2022-01-24T15:51:07.799706+01:00] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'no-importa' matched
[2022-01-24T15:51:07.801392+01:00] [debug] ||| finished reload_needles tests (runtime: 31 s)

#6 Updated by mkittler 4 months ago

  • Status changed from Workable to In Progress

This is how the test fails:

[2022-01-24T15:50:37.186871+01:00] [debug] <<< testapi::assert_screen(mustmatch="no-importa", timeout=30)
[2022-01-24T15:50:37.512173+01:00] [debug] no match: 29.6s, best candidate: no-importa-v1 (0.00)
[2022-01-24T15:50:38.508090+01:00] [debug] no match: 28.6s, best candidate: no-importa-v1 (0.96)
…
[2022-01-24T15:51:06.581123+01:00] [debug] no match: 0.6s, best candidate: no-importa-v1 (0.96)
[2022-01-24T15:51:07.586327+01:00] [debug] no match: -0.4s, best candidate: no-importa-v1 (0.96)
[2022-01-24T15:51:07.761684+01:00] [debug] >>> testapi::_check_backend_response: match=no-importa timed out after 30 (assert_screen)
[2022-01-24T15:51:07.799706+01:00] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'no-importa' matched
[2022-01-24T15:51:07.801392+01:00] [debug] ||| finished reload_needles tests (runtime: 31 s)

It looks like the required needle matching is only slightly off. For comparison, on the good run the similarity is 0.99:

[2022-01-24T15:53:46.090596+01:00] [debug] >>> testapi::_handle_found_needle: found no-importa-v1, similarity 0.99 @ 90/350
[2022-01-24T15:53:46.100801+01:00] [debug] git hash in /hdd/openqa-devel/repos/os-autoinst/t/data//tests/needles: 7ef72e209567425123d4311781f3024f52d88f3a
[2022-01-24T15:53:46.101036+01:00] [debug] init needles from /hdd/openqa-devel/repos/os-autoinst/t/data//tests/needles
[2022-01-24T15:53:46.102547+01:00] [debug] loaded 4 needles
[2022-01-24T15:53:46.103211+01:00] [debug] tests/reload_needles.pm:15 called testapi::assert_screen
[2022-01-24T15:53:46.103352+01:00] [debug] <<< testapi::assert_screen(mustmatch="no-importa", timeout=30)
[2022-01-24T15:53:47.078561+01:00] [debug] >>> testapi::_handle_found_needle: found no-importa-v2, similarity 1.00 @ 85/353

That's also not "perfect" for the first needle. Maybe the needle needs adjustment as it is just slightly off? Or the screen looks slightly different sometimes (for whatever reason)?

#7 Updated by mkittler 4 months ago

The screen looks slightly different it is. In fact, it looks very distorted, see the attached video.

Now the question is where the distortion comes from. Is that really the image the SUT is producing or a processing error on our side or within QEMU?

So this is not a typing problem. Omitting the typing module just means there's no scrolling of the output and hence avoids the screen distortion.

#8 Updated by okurz 4 months ago

That we are able to reproduce VNC display corruptions in something as simple as the tinycore based os-autoinst tests is actually great news. This explains why the "typing" makes a difference as without it there is simply less text on the screen and no up-scrolling which is necessary to trigger the issue. https://bugzilla.suse.com/show_bug.cgi?id=1160707 is a bug for VNC console corruption. You might want to comment in there.

#9 Updated by openqa_review 4 months ago

  • Due date set to 2022-02-09

Setting due date based on mean cycle time of SUSE QE Tools

#10 Updated by okurz 4 months ago

As discussed VNC might not even promise perfect quality. Please research if the seen behaviour is something that is to be expected. Maybe there are parameters which we can tweak to prevent "lossy, fast rendering" and use something like "slower but safer" rendering without screen artifacts :)

#11 Updated by mkittler 4 months ago

Looking at the standard, it doesn't look like VNC supports such high-level tweaks. However, we could tweak our client to re-request the entire screen via https://datatracker.ietf.org/doc/html/rfc6143#section-7.5.3 in case we're waiting "too long" for a needle.

#12 Updated by okurz 4 months ago

  • Related to action #105429: openQA's fullstack test fails in `shutdown` module added

#13 Updated by mkittler 3 months ago

  • Status changed from In Progress to Feedback

#14 Updated by mkittler 3 months ago

  • Status changed from Feedback to Resolved

The PR has been merged. I've already checked before merging it whether it makes a difference and it does. So I suppose the issue can be resolved.

Also available in: Atom PDF