action #104971 (closed)
[sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:M
Added by okurz almost 3 years ago. Updated almost 3 years ago.
Description
Observation
See https://github.com/os-autoinst/os-autoinst/runs/4833764360?check_suite_focus=true#step:3:624
I have also seen this locally. count_fail_ratio: Run: 47. Fails: 10. Fail ratio 21%
If I disable the line autotest::loadtest "tests/typing.pm"; in t/data/tests/main.pm then I get 0/50 fails.
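To illustrate, a minimal sketch of t/data/tests/main.pm with the typing module disabled; only the modules mentioned in this ticket are shown, and the real schedule may load additional ones:

use strict;
use warnings;
use autotest;

# autotest::loadtest "tests/typing.pm";    # disabled for the 0/50-fails run
autotest::loadtest "tests/reload_needles.pm";

1;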
Acceptance criteria
- AC1: typing.pm passes reliably w/ kvm
Suggestions
- DONE: Confirm if the test fails w/o kvm -> It doesn't reproduce on arm as per Marius' findings
- Is this a side effect of running kvm whenever it's available, which is why we didn't see this issue before?
- Disable parts of typing.pm and identify the problematic part
Files
os-autoinst-104971.log (86 KB), mkittler, 2022-01-24 15:46
video.ogv (3.81 MB), mkittler, 2022-01-25 16:29
Updated by livdywan almost 3 years ago
- Subject changed from [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" to [sporadic] os-autoinst t/99-full-stack.t sporadically fails in "Result in testresults/result-reload_needles.json is ok" size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by kraih almost 3 years ago
Tried to replicate it locally with a Tumbleweed aarch64 vm, but the test seems pretty stable here.
Updated by mkittler almost 3 years ago
- Assignee set to mkittler
@kraih That matches my own findings when running the test on ARM workers (see suggestions).
It is hard to tell what's going on with only the logs from the CI job. I'll try to reproduce it locally on x86_64 TW (using KVM). (I also ran the test locally while developing and haven't encountered any problems so far.)
Updated by mkittler almost 3 years ago
- File os-autoinst-104971.log os-autoinst-104971.log added
Looks like the failure rate here is 5 %:
## count_fail_ratio: Run: 20. Fails: 1. Fail ratio 5%
It also failed in the same way. I've attached the full log, which for comparison also includes the subsequent (successful) test run, since not all log messages that look like errors are part of this problem.
Updated by mkittler almost 3 years ago
So interestingly, the typing test itself doesn't fail here (even though omitting it apparently helps). Annoyingly, the test bails out on the first failing check, so we don't know what other symptoms there are. The test log shows the immediate failure within the reload_needles module:
[2022-01-24T15:51:07.799706+01:00] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'no-importa' matched
[2022-01-24T15:51:07.801392+01:00] [debug] ||| finished reload_needles tests (runtime: 31 s)
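For context, the failing check presumably boils down to a plain assert_screen on that tag; this is only a hypothetical reduction of tests/reload_needles.pm, and the actual module may do more around the assertion:

use base 'basetest';
use strict;
use warnings;
use testapi;

sub run {
    # dies with "no candidate needle with tag(s) 'no-importa' matched"
    # when the best candidate stays just below the match threshold
    assert_screen 'no-importa';
}

1;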
Updated by mkittler almost 3 years ago
- Status changed from Workable to In Progress
This is how the test fails:
[2022-01-24T15:50:37.186871+01:00] [debug] <<< testapi::assert_screen(mustmatch="no-importa", timeout=30)
[2022-01-24T15:50:37.512173+01:00] [debug] no match: 29.6s, best candidate: no-importa-v1 (0.00)
[2022-01-24T15:50:38.508090+01:00] [debug] no match: 28.6s, best candidate: no-importa-v1 (0.96)
…
[2022-01-24T15:51:06.581123+01:00] [debug] no match: 0.6s, best candidate: no-importa-v1 (0.96)
[2022-01-24T15:51:07.586327+01:00] [debug] no match: -0.4s, best candidate: no-importa-v1 (0.96)
[2022-01-24T15:51:07.761684+01:00] [debug] >>> testapi::_check_backend_response: match=no-importa timed out after 30 (assert_screen)
[2022-01-24T15:51:07.799706+01:00] [info] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'no-importa' matched
[2022-01-24T15:51:07.801392+01:00] [debug] ||| finished reload_needles tests (runtime: 31 s)
It looks like the needle match is only slightly below the required threshold. For comparison, in the good run the similarity is 0.99:
[2022-01-24T15:53:46.090596+01:00] [debug] >>> testapi::_handle_found_needle: found no-importa-v1, similarity 0.99 @ 90/350
[2022-01-24T15:53:46.100801+01:00] [debug] git hash in /hdd/openqa-devel/repos/os-autoinst/t/data//tests/needles: 7ef72e209567425123d4311781f3024f52d88f3a
[2022-01-24T15:53:46.101036+01:00] [debug] init needles from /hdd/openqa-devel/repos/os-autoinst/t/data//tests/needles
[2022-01-24T15:53:46.102547+01:00] [debug] loaded 4 needles
[2022-01-24T15:53:46.103211+01:00] [debug] tests/reload_needles.pm:15 called testapi::assert_screen
[2022-01-24T15:53:46.103352+01:00] [debug] <<< testapi::assert_screen(mustmatch="no-importa", timeout=30)
[2022-01-24T15:53:47.078561+01:00] [debug] >>> testapi::_handle_found_needle: found no-importa-v2, similarity 1.00 @ 85/353
That's also not "perfect" for the first needle. Maybe the needle needs adjustment since it is just slightly off? Or does the screen sometimes look slightly different (for whatever reason)?
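If adjusting the needle turns out to be the way to go, the match level of the affected area can be relaxed in the needle's JSON definition. The coordinates and the value 90 below are purely illustrative and not taken from the actual no-importa needle; "match" lowers the required similarity for that area below the default:

{
    "tags": ["no-importa"],
    "area": [
        {
            "xpos": 90,
            "ypos": 350,
            "width": 100,
            "height": 20,
            "type": "match",
            "match": 90
        }
    ]
}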
Updated by mkittler almost 3 years ago
It is indeed the screen that looks slightly different. In fact, it looks very distorted, see the attached video.
Now the question is where the distortion comes from. Is that really the image the SUT is producing, or a processing error on our side or within QEMU?
So this is not a typing problem. Omitting the typing module just means there's no scrolling of the output, which avoids the screen distortion.
Updated by okurz almost 3 years ago
That we are able to reproduce VNC display corruption in something as simple as the Tiny Core-based os-autoinst tests is actually great news. This explains why "typing" makes a difference: without it there is simply less text on the screen and no up-scrolling, which is necessary to trigger the issue. https://bugzilla.suse.com/show_bug.cgi?id=1160707 is a bug for VNC console corruption. You might want to comment in there.
Updated by openqa_review almost 3 years ago
- Due date set to 2022-02-09
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz almost 3 years ago
As discussed, VNC might not even promise perfect quality. Please research whether the observed behaviour is to be expected. Maybe there are parameters we can tweak to prevent "lossy, fast rendering" and use something like "slower but safer" rendering without screen artifacts :)
Updated by mkittler almost 3 years ago
Looking at the standard, it doesn't look like VNC supports such high-level tweaks. However, we could tweak our client to re-request the entire screen via a non-incremental FramebufferUpdateRequest (https://datatracker.ietf.org/doc/html/rfc6143#section-7.5.3) in case we're waiting "too long" for a needle.
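A rough sketch of what such a forced refresh would look like on the wire, assuming an already established RFB session; $socket, $width and $height are placeholders and this is not os-autoinst's actual VNC client code:

# FramebufferUpdateRequest (RFC 6143, section 7.5.3): message-type 3,
# incremental = 0 asks the server to resend the complete framebuffer,
# followed by x, y, width and height as big-endian 16-bit integers.
sub request_full_framebuffer_update {
    my ($socket, $width, $height) = @_;
    syswrite($socket, pack('CCnnnn', 3, 0, 0, 0, $width, $height));
}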
Updated by okurz almost 3 years ago
- Related to action #105429: openQA's fullstack test fails in `shutdown` module added
Updated by mkittler almost 3 years ago
- Status changed from In Progress to Feedback
Waiting for feedback on the PR: https://github.com/os-autoinst/os-autoinst/pull/1939
Updated by mkittler almost 3 years ago
- Status changed from Feedback to Resolved
The PR has been merged. I had already checked before merging whether it makes a difference, and it does. So I suppose the issue can be resolved.