action #18016

[sles][migration][s390x] find proper way of handling image creation for migration on zKVM

Added by qmsu over 4 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Infrastructure
Target version: -
Start date: 2017-03-27
Due date:
% Done: 0%
Estimated time:
Difficulty:
Description

Observation

openQA test in scenario sle-12-SP3-Server-DVD-s390x-migration_zdup_offline_sle12sp2_allpatterns_zkvm@zkvm fails in setup_zdup.

The following error can be seen in the log https://openqa.suse.de/tests/838567/file/autoinst-log.txt

04:28:54.1894 Debug: /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/setup_zdup.pm:22 called opensusebasetest::wait_boot
04:28:54.1895 20233 <<< testapi::select_console(testapi_console='x11')
/usr/lib/os-autoinst/consoles/vnc_base.pm:57:{
'password' => 'nots3cr3t',
'hostname' => '10.161.145.3',
'port' => 5901
}
04:28:56.1957 20234 Error connecting to host : IO::Socket::INET: connect: No route to host
04:28:57.1970 20234 Error connecting to host : IO::Socket::INET: connect: No route to host
04:28:58.1982 20234 Error connecting to host : IO::Socket::INET: connect: No route to host
04:28:59.1997 20234 Error connecting to host : IO::Socket::INET: connect: No route to host
04:29:00.2010 20234 Error connecting to host : IO::Socket::INET: connect: No route to host
04:29:01.2022 20234 Error connecting to host : IO::Socket::INET: connect: No route to host
04:29:02.2034 20234 Error connecting to host : IO::Socket::INET: connect: No route to host
04:29:03.2047 20234 Error connecting to host : IO::Socket::INET: connect: No route to host
DIE Can't call method "blocking" on an undefined value at /usr/lib/os-autoinst/consoles/VNC.pm line 864.

Reproducible

Fails since (at least) Build 0297 (current job)

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest

dump.diff (35 KB), mgriessmeier, 2017-08-29 08:06

Related issues

Blocks openQA Tests - action #13216: [sles][functional][s390x] Run extratest on s390x (Resolved, 2017-03-02)

History

#1 Updated by okurz@suse.de over 4 years ago

mgriessmeier,

action #18016: [sles][migration][s390x] test fails in setup_zdup because
zkvm fails to select x11 console https://progress.opensuse.org/issues/18016
[…]
https://openqa.suse.de/tests/838567

would that require an additional reset_consoles here? isn't that the common
cause of "incomplete" when it should rather be caught in the test and fail
with a helpful error message?

mgriessmeier, can you try to put in a testapi::record_info call if this
happens and ensure there is no crash file written so that the test aborts with
failed and not incomplete
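
The idea of catching the unreachable console early and failing with a helpful message can be illustrated as follows. This is only a minimal sketch in Python, not os-autoinst code (which is Perl): the function names `wait_for_port` and `require_console` and the wording of the error message are hypothetical, and the default of 8 attempts merely mirrors the eight connection attempts visible in the log above.

```python
import socket
import time


def wait_for_port(host, port, attempts=8, delay=1.0, timeout=1.0,
                  connect=socket.create_connection):
    """Try a TCP connect up to `attempts` times; True on the first success."""
    for attempt in range(attempts):
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # e.g. "No route to host": wait and retry unless this was the last try
            if attempt < attempts - 1:
                time.sleep(delay)
        else:
            conn.close()
            return True
    return False


def require_console(host, port, **kwargs):
    """Raise a descriptive error instead of crashing later on a dead socket."""
    if not wait_for_port(host, port, **kwargs):
        raise RuntimeError(
            f"console at {host}:{port} is unreachable; "
            "was the image created on a worker with a different static IP?"
        )
```

The `connect` parameter exists only so that the probe can be exercised without a network; a real check would presumably live next to the VNC console setup.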

#2 Updated by mgriessmeier over 4 years ago

  • Status changed from New to In Progress
  • Assignee set to mgriessmeier

okurz@suse.de wrote:

mgriessmeier,

action #18016: [sles][migration][s390x] test fails in setup_zdup because
zkvm fails to select x11 console https://progress.opensuse.org/issues/18016
[…]
https://openqa.suse.de/tests/838567

The problem is that the image was not created on the right worker. I'm recreating it right now and will upload the new image afterwards.
For the future:

  • make sure that every job which creates an image with PUBLISH_HDD_1 runs on WORKER_CLASS=zkvm-image, and that the corresponding upgrade job also has WORKER_CLASS=zkvm-image

would that require an additional reset_consoles here? isn't that the common
cause of "incomplete" when it should rather be caught in the test and fail
with a helpful error message?

mgriessmeier, can you try to put in a testapi::record_info call if this
happens and ensure there is no crash file written so that the test aborts with
failed and not incomplete

I'll put it on my todo list

#3 Updated by mgriessmeier over 4 years ago

  • Re-added the image, now created by the correct worker
  • Changed all Migration tests in test-development to use WORKER_CLASS zkvm-image

Should be working with the next build.

#4 Updated by qmsu over 4 years ago

mgriessmeier

I see the changes, thanks.
I will check the results of the Migration tests in test-development on the next build to confirm that it works.

Actually, we need to prepare more s390x HDD images for the zdup_offline/online migration tests (i.e. sle12sp1+sdk, sle12sp1+ha+geo, ... sle12sp2+sdk, sle12sp2+ha+geo, etc).
So could you please send me the parameters you used when posting the job that created this sle12sp2 HDD image? Then I can generate all required images myself.

Thanks.

#5 Updated by mgriessmeier over 4 years ago

Hi,

So the general approach would be to clone the corresponding job (e.g. sle12sp1+sdk) from openqa.suse.de to openqa.suse.de and add the PUBLISH_HDD_1 variable

I did it like this:
/usr/share/openqa/script/clone_job.pl --host https://openqa.suse.de --from https://openqa.suse.de $JOB_ID INSTALLONLY=1 WORKER_CLASS=zkvm-image PUBLISH_HDD_1=$HDD_IMAGE_NAME _GROUP=0

NOTES:
Using INSTALLONLY=1 is enough for creating the image; there is no need for consoletests.
Using WORKER_CLASS=zkvm-image is mandatory because otherwise the IPs do not match (will hopefully be fixed in the future).
Using _GROUP=0 is highly recommended because it ensures that the creation job does not pollute any existing job group.
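
When several images are needed (as asked in #4), the invocation above can be assembled per image. The following is a minimal sketch and not part of openQA: the helper name `build_clone_command`, the job IDs and the image file names are hypothetical placeholders; only the script path, the host options and the settings are taken from the command above. It just builds and prints the command lines and does not run anything.

```python
CLONE_SCRIPT = "/usr/share/openqa/script/clone_job.pl"
OPENQA_HOST = "https://openqa.suse.de"


def build_clone_command(job_id, hdd_image_name):
    """Return the argument list that clones job_id and publishes hdd_image_name."""
    return [
        CLONE_SCRIPT,
        "--host", OPENQA_HOST,
        "--from", OPENQA_HOST,
        str(job_id),
        "INSTALLONLY=1",             # installation only, no consoletests
        "WORKER_CLASS=zkvm-image",   # dedicated worker, so the static IP matches
        f"PUBLISH_HDD_1={hdd_image_name}",
        "_GROUP=0",                  # keep the job out of existing job groups
    ]


# Hypothetical job IDs and image names, for illustration only.
IMAGES = {
    111111: "sle12sp1-sdk-s390x.qcow2",
    222222: "sle12sp2-sdk-s390x.qcow2",
}

if __name__ == "__main__":
    for job_id, image in IMAGES.items():
        print(" ".join(build_clone_command(job_id, image)))
```

Printing the commands first makes it easy to review them before pasting them into a shell.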

#6 Updated by qmsu over 4 years ago

Got it. Thanks.
I will try it.

#7 Updated by mgriessmeier over 4 years ago

  • Subject changed from [sles][migration][s390x] test fails in setup_zdup because zkvm fails to select x11 console to [sles][migration][s390x] find proper way of handling image creation for migration on zKVM
  • Category changed from Bugs in existing tests to Infrastructure

Changed the subject: the original failure was caused by this, and we should track the underlying problem.

For now, all zKVM guests use static IP addresses, which is why we need a dedicated worker class to ensure that a created image can be booted correctly.
This is bad in two ways:

  1. we can only run one migration at one time
  2. we need to ensure that the image is always created on the correct worker

Suggestion:
Use a proper DHCP setup on s390pb to avoid this issue
=> I already created a ticket to infra@suse.de for this:
https://infra.nue.suse.com/Ticket/Display.html?id=66714

Let's use this ticket for tracking this
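
Until DHCP is available, the second point above (images must be created and booted on the correct worker) could be checked mechanically. The following is a minimal sketch under stated assumptions: job settings are given as plain dictionaries, a job that boots a published image is assumed to reference it via an `HDD_1` setting, and the function name `misplaced_jobs` and the example job names are hypothetical.

```python
REQUIRED_CLASS = "zkvm-image"


def misplaced_jobs(jobs):
    """Return names of jobs that publish or boot an HDD image on another worker class."""
    bad = []
    for name, settings in jobs.items():
        # Assumption: PUBLISH_HDD_1 marks image creation, HDD_1 marks image use.
        touches_image = "PUBLISH_HDD_1" in settings or "HDD_1" in settings
        if touches_image and settings.get("WORKER_CLASS") != REQUIRED_CLASS:
            bad.append(name)
    return sorted(bad)


# Hypothetical job settings, for illustration only.
EXAMPLE_JOBS = {
    "create_image": {"PUBLISH_HDD_1": "sle12sp2.qcow2", "WORKER_CLASS": "zkvm-image"},
    "migration": {"HDD_1": "sle12sp2.qcow2", "WORKER_CLASS": "zkvm"},
    "textmode": {"WORKER_CLASS": "zkvm"},
}

if __name__ == "__main__":
    print(misplaced_jobs(EXAMPLE_JOBS))
```

Jobs that neither publish nor consume an image are ignored, so they remain free to run on any zKVM worker.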

#8 Updated by mgriessmeier over 4 years ago

  • Blocks action #13216: [sles][functional][s390x] Run extratest on s390x added

#9 Updated by mgriessmeier over 4 years ago

  • Assignee deleted (mgriessmeier)

Not working on this right now. Image creation works fine as long as the image gets created on the right worker, which everyone is handling well at the moment.

Unassigning for now; feel free to ask if you plan to work on this.

#10 Updated by okurz over 4 years ago

  • Status changed from In Progress to Feedback
  • Assignee set to mgriessmeier

https://infra.nue.suse.com/Ticket/Display.html?id=66714

I put a friendly bump in that infra ticket.

mgriessmeier, I assume you have no problem tracking that ticket in "Feedback" status now, as you also know gschlotter personally.

#11 Updated by okurz about 4 years ago

  • Status changed from Feedback to In Progress
  • Assignee changed from mgriessmeier to okurz

#12 Updated by okurz about 4 years ago

https://openqa.suse.de/tests/1081373#step/patch_before_migration/54 is running fine so far on o.s.d on a different worker, but this costs us 6 minutes of useless waiting :/

#13 Updated by okurz about 4 years ago

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3356 for the enhancement, merged, synced to osd.

verification job on osd triggered: https://openqa.suse.de/tests/1081424#live, waiting

EDIT: Failed because it couldn't find http://openqa.suse.de/assets/repo/SLE-12-SP3-SERVER-POOL-s390x-Build0473-Media1/ . It seems the repo has already been cleaned up. IMHO it is too much effort to make it work for SLE12SP3 right now. Let's just assume it works and continue, ok?

I guess the next step would be to get rid of the zkvm-images worker class in all job schedules.

#15 Updated by mgriessmeier about 4 years ago

okurz wrote:

mgriessmeier: Do we still need https://infra.nue.suse.com/SelfService/Display.html?id=66714 ?

nope - I commented in the ticket and suggested to close it

#16 Updated by okurz about 4 years ago

Waiting for riafarov and me to rework the templates for sle15 first; then we can adapt this step as well.

#17 Updated by mgriessmeier about 4 years ago

All occurrences of zkvm-image were replaced with the machine 'zkvm';
see the attached dump.diff.

See also the PR for removing the worker_class from workers.ini:
https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/51

#18 Updated by okurz about 4 years ago

  • Status changed from In Progress to Resolved

MR merged. I also did not find any leftover references to zkvm-image(s) in either os-autoinst or our tests, so we should be done here.

#19 Updated by pvorel over 1 year ago

I consider multiple calls of save_svirt_pty a bug because they slow down testing;
see the LTP tests on s390x (svirt backend):

https://openqa.suse.de/tests/3766791#step/boot_ltp/9

BTW: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9290
