[sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:M
I have also seen this locally. count_fail_ratio: Run: 47. Fails: 10. Fail ratio 21%
If I disable the line autotest::loadtest "tests/typing.pm"; in t/data/tests/main.pm, I get 0/50 fails.
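For reference, this is roughly what a count_fail_ratio-style helper does: re-run the (possibly flaky) test a number of times and report the failure percentage. A minimal Python sketch (the real helper is a shell script; the command and exact output format here are assumptions):

```python
import subprocess

def count_fail_ratio(cmd, runs=50):
    # Re-run the test command and count non-zero exit codes as failures.
    fails = sum(subprocess.run(cmd).returncode != 0 for _ in range(runs))
    print(f"count_fail_ratio: Run: {runs}. Fails: {fails}. "
          f"Fail ratio {fails * 100 // runs}%")
    return fails

# Example with a command that always succeeds:
count_fail_ratio(["true"], runs=5)
```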
- AC1: typing.pm passes reliably w/ kvm
- DONE: Confirm if the test fails w/o kvm -> It doesn't reproduce on arm as per Marius' findings
- Is this a side-effect of running kvm whenever it's available, which would explain why we didn't see this issue before?
- Disable parts of typing.pm and identify the problematic part
- Subject changed from [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" to [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:M
- Description updated (diff)
- Status changed from New to Workable
- Assignee set to mkittler
kraih: That matches my own findings when running the test on ARM workers (see suggestions).
It is hard to tell what's going on with only the logs from the CI job. I'll try to reproduce it locally on x86_64 TW (using KVM). (I also ran the test locally while developing and hadn't encountered any problems so far.)
Looks like the failure rate here is 5 %:
## count_fail_ratio: Run: 20. Fails: 1. Fail ratio 5%
It also failed in the same way. I've attached the full log and, for comparison, also the log of the subsequent (successful) test run, since not all log messages that look like errors are part of this problem.
So interestingly, the typing test itself doesn't fail here (even though omitting it apparently helps). Annoyingly, the test bails out on the first failing check, so we don't know what other symptoms there are. The test log shows the immediate failure within the reload_needles module:
[2022-01-24T15:51:07.799706+01:00] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'no-importa' matched
[2022-01-24T15:51:07.801392+01:00] [debug] ||| finished reload_needles tests (runtime: 31 s)
- Status changed from Workable to In Progress
This is how the test fails:
[2022-01-24T15:50:37.186871+01:00] [debug] <<< testapi::assert_screen(mustmatch="no-importa", timeout=30)
[2022-01-24T15:50:37.512173+01:00] [debug] no match: 29.6s, best candidate: no-importa-v1 (0.00)
[2022-01-24T15:50:38.508090+01:00] [debug] no match: 28.6s, best candidate: no-importa-v1 (0.96)
…
[2022-01-24T15:51:06.581123+01:00] [debug] no match: 0.6s, best candidate: no-importa-v1 (0.96)
[2022-01-24T15:51:07.586327+01:00] [debug] no match: -0.4s, best candidate: no-importa-v1 (0.96)
[2022-01-24T15:51:07.761684+01:00] [debug] >>> testapi::_check_backend_response: match=no-importa timed out after 30 (assert_screen)
[2022-01-24T15:51:07.799706+01:00] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'no-importa' matched
[2022-01-24T15:51:07.801392+01:00] [debug] ||| finished reload_needles tests (runtime: 31 s)
It looks like the needle match is only slightly below the required level. For comparison, on the good run the similarity is 0.99:
[2022-01-24T15:53:46.090596+01:00] [debug] >>> testapi::_handle_found_needle: found no-importa-v1, similarity 0.99 @ 90/350
[2022-01-24T15:53:46.100801+01:00] [debug] git hash in /hdd/openqa-devel/repos/os-autoinst/t/data//tests/needles: 7ef72e209567425123d4311781f3024f52d88f3a
[2022-01-24T15:53:46.101036+01:00] [debug] init needles from /hdd/openqa-devel/repos/os-autoinst/t/data//tests/needles
[2022-01-24T15:53:46.102547+01:00] [debug] loaded 4 needles
[2022-01-24T15:53:46.103211+01:00] [debug] tests/reload_needles.pm:15 called testapi::assert_screen
[2022-01-24T15:53:46.103352+01:00] [debug] <<< testapi::assert_screen(mustmatch="no-importa", timeout=30)
[2022-01-24T15:53:47.078561+01:00] [debug] >>> testapi::_handle_found_needle: found no-importa-v2, similarity 1.00 @ 85/353
That's also not "perfect" for the first needle. Maybe the needle needs adjustment as it is just slightly off? Or the screen looks slightly different sometimes (for whatever reason)?
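For context, each needle area carries a match level against which the computed similarity is compared. A hedged sketch of that comparison (the 0.96 level and the rounding interpretation are assumptions for illustration, not taken from the logs; the real per-area level comes from the needle's JSON):

```python
# Sketch of needle-area matching as a similarity-vs-threshold check.
# ASSUMPTION: a 0.96 match level is used here for illustration only.
def area_matches(similarity: float, match_level: float = 0.96) -> bool:
    return similarity >= match_level

# The successful run logged a similarity of 0.99 -- comfortably above.
print(area_matches(0.99))

# The "0.96" printed for the failing candidate is rounded; a true value
# just below the threshold (e.g. a hypothetical 0.958) would still be
# displayed as 0.96 yet fail the check.
print(area_matches(0.958))
```

This would explain why a candidate that looks borderline-identical in the log can flip between matching and not matching across runs.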
The screen looking slightly different is indeed the case. In fact, it looks very distorted; see the attached video.
Now the question is where the distortion comes from. Is that really the image the SUT is producing or a processing error on our side or within QEMU?
So this is not a typing problem. Omitting the typing module just means there's no scrolling of the output and hence avoids the screen distortion.
That we are able to reproduce VNC display corruptions in something as simple as the tinycore-based os-autoinst tests is actually great news. This explains why the "typing" makes a difference: without it there is simply less text on the screen and no up-scrolling, which is necessary to trigger the issue. https://bugzilla.suse.com/show_bug.cgi?id=1160707 is a bug for VNC console corruption. You might want to comment in there.
As discussed, VNC might not even promise perfect quality. Please research whether the observed behaviour is to be expected. Maybe there are parameters we can tweak to avoid "lossy, fast rendering" and use something like "slower but safer" rendering without screen artifacts :)
Looking at the standard, it doesn't look like VNC supports such high-level tweaks. However, we could tweak our client to re-request the entire screen via https://datatracker.ietf.org/doc/html/rfc6143#section-7.5.3 in case we're waiting "too long" for a needle.
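Concretely, the message in question is a FramebufferUpdateRequest with the incremental flag cleared, which asks the server to resend the full contents of the requested rectangle rather than just the changes. A minimal sketch of its wire encoding per RFC 6143 §7.5.3 (the function name and usage are mine, not from the PR):

```python
import struct

def framebuffer_update_request(x: int, y: int, width: int, height: int,
                               incremental: bool = True) -> bytes:
    """Encode an RFC 6143 §7.5.3 FramebufferUpdateRequest.

    With incremental=False the server must send the complete contents
    of the rectangle, discarding any stale client-side state.
    """
    MESSAGE_TYPE = 3  # client-to-server message number per the RFC
    # Layout: U8 type, U8 incremental, U16 x, U16 y, U16 w, U16 h (big-endian)
    return struct.pack('>BBHHHH', MESSAGE_TYPE, int(incremental),
                       x, y, width, height)

# Request a full (non-incremental) refresh of a 1024x768 framebuffer:
msg = framebuffer_update_request(0, 0, 1024, 768, incremental=False)
```

Sending such a full refresh whenever a needle has been pending "too long" would force the server to repaint any corrupted region.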
- Status changed from In Progress to Feedback
Waiting for feedback on the PR: https://github.com/os-autoinst/os-autoinst/pull/1939