action #37907

closed

[sle][functional][hyperv][u] Various test failures on hyperv - stabilize

Added by zluo almost 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Urgent
Assignee: michalnowak
Category: Bugs in existing tests
Target version: -
Start date: 2018-06-27
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Many installation tests on the Hyper-V host fail. Performance on the Hyper-V host is extremely low; in my view it is not suitable for running openQA tests.

TIMEOUT_SCALE=3 is already set for the test. MAX_JOB_TIME=14400 seems to be too low.

Suggestion: exclude tests on the hyperv host.

Observation

openQA test in scenario sle-12-SP4-Server-DVD-x86_64-lvm+RAID1@svirt-hyperv-uefi fails in start_install

Reproducible

Fails since (at least) Build 0266 (current job)

Expected result

Last good: 0265 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues: 2 (0 open, 2 closed)

Related to openQA Tests - action #35044: [sle][functional][u][hyperv][easy] textmode@svirt-hyperv-uefi exceeds max job time - 3h (Resolved, okurz, 2018-04-16 to 2018-07-03)

Blocks openQA Tests - action #32458: [functional][hyperv][hard][u] test fails in consoletest_setup - switch from X11 to VT is not supported (Resolved, michalnowak)

#1

Updated by okurz almost 6 years ago

  • Assignee set to michalnowak

I don't think this is related to a "performance issue in general"; see the log in autoinst-log.txt:

[2018-06-26T15:37:10.0527 CEST] [debug] no match: 25230.6s
[2018-06-26T15:37:11.0396 CEST] [debug] no change: 25229.6s
[2018-06-26T15:37:12.0394 CEST] [debug] GET "/P__ZgQU_oeOurXFR/isotovideo/status"
[2018-06-26T15:37:12.0394 CEST] [debug] Routing to a callback
[2018-06-26T15:37:12.0395 CEST] [debug] no change: 25228.6s
[2018-06-26T15:37:12.0395 CEST] [debug] 200 OK (0.001047s, 955.110/s)
[2018-06-26T15:37:13.0396 CEST] [debug] no change: 25227.6s
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":48153"
      after 4595 requests (4595 known processed) with 0 events remaining.
[2018-06-26T15:37:13.0949 CEST] [debug] backend got TERM
[2018-06-26T15:37:13.0949 CEST] [debug] autotest received signal TERM, saving results of current test before exiting

So what I see there is that the test module is waiting for the test to continue, then the X server on the worker dies, the backend terminates, and the job stays dangling until it times out.
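The sequence described here can be read off the log mechanically. A minimal Python sketch, assuming only the line formats visible in the excerpt above; the event labels and the `timeline` helper are hypothetical and not part of openQA or os-autoinst:

```python
import re

# Trimmed excerpt of autoinst-log.txt from job 1786205, as quoted above.
AUTOINST_LOG = """\
[2018-06-26T15:37:10.0527 CEST] [debug] no match: 25230.6s
[2018-06-26T15:37:12.0394 CEST] [debug] GET "/P__ZgQU_oeOurXFR/isotovideo/status"
[2018-06-26T15:37:12.0395 CEST] [debug] no change: 25228.6s
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":48153"
      after 4595 requests (4595 known processed) with 0 events remaining.
[2018-06-26T15:37:13.0949 CEST] [debug] backend got TERM
[2018-06-26T15:37:13.0949 CEST] [debug] autotest received signal TERM, saving results of current test before exiting
"""

# Hypothetical classification rules: each label is tied to a substring seen in the log.
MARKERS = [
    ("screen wait", re.compile(r"\[debug\] no (?:match|change): [\d.]+s")),
    ("X server lost", re.compile(r"^XIO:\s+fatal IO error")),
    ("backend TERM", re.compile(r"backend got TERM")),
    ("autotest TERM", re.compile(r"autotest received signal TERM")),
]


def timeline(log: str) -> list[str]:
    """Return event labels in log order, collapsing consecutive repeats."""
    events: list[str] = []
    for line in log.splitlines():
        for label, pattern in MARKERS:
            if pattern.search(line):
                # Only record a label when it differs from the previous event.
                if not events or events[-1] != label:
                    events.append(label)
                break  # one label per line; unmatched lines (e.g. GET) are skipped
    return events


print(" -> ".join(timeline(AUTOINST_LOG)))
# prints: screen wait -> X server lost -> backend TERM -> autotest TERM
```

The ordering alone does not show cause and effect: it fits both the reading here and michalnowak's later view that the XIO error is merely a symptom of the backend being terminated.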

https://openqa.suse.de/tests/1786205/file/worker-log.txt only mentions:

[2018-06-26T12:37:12.0916 CEST] [info] 5139: WORKING 1786205
[2018-06-26T15:37:12.0900 CEST] [warn] max job time exceeded, aborting 01786205-sle-12-SP4-Server-DVD-x86_64-Build0266-lvm+RAID1@svirt-hyperv-uefi
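The job runtime can be checked by subtracting the two worker-log timestamps above. A minimal Python sketch, assuming only the log format visible in this excerpt; `parse_stamp` and `job_runtime_seconds` are hypothetical helpers, not openQA functions:

```python
import re
from datetime import datetime

# The two lines from worker-log.txt of job 1786205, as quoted above.
WORKER_LOG = """\
[2018-06-26T12:37:12.0916 CEST] [info] 5139: WORKING 1786205
[2018-06-26T15:37:12.0900 CEST] [warn] max job time exceeded, aborting 01786205-sle-12-SP4-Server-DVD-x86_64-Build0266-lvm+RAID1@svirt-hyperv-uefi
"""

# Matches the leading "[YYYY-MM-DDTHH:MM:SS.ffff TZ]" stamp of a line.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) \w+\]")


def parse_stamp(line: str) -> datetime:
    """Return the naive timestamp of a log line (both lines share the CEST zone)."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def job_runtime_seconds(log: str) -> float:
    """Seconds between the 'WORKING' line and the 'max job time exceeded' line."""
    start = end = None
    for line in log.splitlines():
        if " WORKING " in line:
            start = parse_stamp(line)
        elif "max job time exceeded" in line:
            end = parse_stamp(line)
    if start is None or end is None:
        raise ValueError("start or abort line missing")
    return (end - start).total_seconds()


runtime = job_runtime_seconds(WORKER_LOG)
print(f"{runtime:.1f} s = {runtime / 3600:.2f} h")
# prints: 10800.0 s = 3.00 h
```

The abort came after 10800 s (3 h), not after the 14400 s quoted in the description, so the limit enforced for this job does not appear to be that value; the sketch cannot tell where the 3 h limit came from.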

We now have a new hyperv machine, which gives us hope that tests will become more reliable and stable. Regardless, I strongly suggest ensuring that the incomplete is turned into a failure in this case. @michalnowak, can you provide some insight and ideas, e.g. should we still see these tests on the old machine or on the new one, and what should we do about the "incomplete"? Maybe check further logs on the worker?

#2

Updated by okurz almost 6 years ago

  • Subject changed from [sle][functional][hyperv][u] test stopped -- could be performance issue in general to [sle][functional][hyperv][u] Various test failures on hyperv - stabilize

@michalnowak, I suggest that we move all hyperv scenarios from SLE12SP4 to "test development" and stabilize the tests there. You will not be left alone with this: we have defined an "expert" for the "hyperv" domain, @oorlov, see https://wiki.microfocus.net/index.php/RD-OPS_QA/openQA_review#Functional.2Bstaging

So can you move the test scenarios from SLE12SP4 to "test development" and make sure all failures are labeled? QSF will then help to stabilize the scenarios before they move back. Or should we take this task?

#3

Updated by okurz almost 6 years ago

  • Priority changed from Normal to Urgent
#4

Updated by okurz almost 6 years ago

  • Blocks action #32458: [functional][hyperv][hard][u] test fails in consoletest_setup - switch from X11 to VT is not supported added
#5

Updated by okurz almost 6 years ago

  • Related to action #35044: [sle][functional][u][hyperv][easy] textmode@svirt-hyperv-uefi exceeds max job time - 3h added
#6

Updated by michalnowak almost 6 years ago

  • Status changed from New to In Progress

The job was terminated after exactly three hours, so I think it simply hit the job timeout.

XIO: fatal IO error 11 is just a sign of the backend being terminated.

I can't seek in the video, so I can't say for sure, but the log suggests the screen was changing before the termination.

So this was a performance issue with the old Flexo server.

However, a new Hyper-V server is already deployed to OSD, and experience shows that the performance gap (among other problems) is gone: e.g. lvm+RAID1@svirt-hyperv (https://openqa.suse.de/tests/1788719#next_previous) went down from 2h20m to 37m.
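For scale, the reported runtimes work out as follows. A small Python sketch using only the two durations stated in this comment; `to_minutes` is a hypothetical helper:

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'XhYm' duration into whole minutes."""
    return hours * 60 + minutes


old_runtime = to_minutes(2, 20)  # lvm+RAID1@svirt-hyperv before the new server, per the comment
new_runtime = to_minutes(0, 37)  # the same scenario on the new Hyper-V server

speedup = old_runtime / new_runtime
saved = 1 - new_runtime / old_runtime
print(f"{old_runtime} min -> {new_runtime} min: {speedup:.1f}x faster, {saved:.0%} less wall-clock time")
# prints: 140 min -> 37 min: 3.8x faster, 74% less wall-clock time
```

At 37 minutes, the scenario now also stays well below the three-hour limit at which job 1786205 was aborted.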

The overall stability now also matches the rest of OSD.

I believe the ticket is thus resolved.

#7

Updated by okurz almost 6 years ago

  • Status changed from In Progress to Resolved

Yes, I think so too. The latest build, 0271, also seems very stable: https://openqa.suse.de/tests/overview?distri=sle&version=12-SP4&build=0271&groupid=139&machine=svirt-hyperv&machine=svirt-hyperv-uefi, all green :)

Ok, so let's close and hope we are done. Thank you for your fast reaction.
