action #9580

action #9570: Boot to Snapshot

Boot to snapshot after upgrade and then rollback

Added by RBrownSUSE over 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: New test
Start date: 2015-11-17
Due date:
% Done: 100%
Estimated time:
Difficulty:
Duration:

Description

A little more complicated than testing boot to snapshot, as we also want to check that we can roll back.

tasks

  • come up with a way to match SP1/GA needles (the rollback target) within the SP2 test (the rollback source)
  • @waitfor image generation jobs working again: fix ppc64le issues

Checklist

  • SLE

Related issues

Blocks openQA Project - action #13156: os-autoinst: Add support to easily switch VERSION during a test run (Resolved, 2016-08-11)

Copied to openQA Tests - action #12964: [opensuse][functional][u] Boot to snapshot after upgrade and then rollback (Workable)

History

#1 Updated by RBrownSUSE over 4 years ago

  • Assignee set to dmaiocchi

Assigning to Dario as he's volunteered

#2 Updated by RBrownSUSE over 4 years ago

  • Checklist set to [ ] SLE, [ ] TW, [ ] Leap
  • Target version changed from 154 to 162

#3 Updated by dmaiocchi over 4 years ago

  • % Done changed from 0 to 50

#4 Updated by dmaiocchi over 4 years ago

For this task I have so far written 2 tests:

1 boot_to_snapshot

2 snapper_rollback

This way everything is more scalable, and we test 2 different things. It also lets us test boot_to_snapshot without a migration scenario.

#5 Updated by dmaiocchi over 4 years ago

1) Done: boot_to_snapshot is merged. The test runs without problems in production.

2) snapper rollback -> WIP
2a) adapt grub_test to boot into the pre-migration snapshot if Upgrade is selected
2b) add rollback_test to main.pm and work on the .pm itself

#6 Updated by okurz over 4 years ago

  • Target version changed from 162 to Milestone 1

#7 Updated by dmaiocchi over 4 years ago

  • Status changed from New to Feedback
  • % Done changed from 50 to 90

PR created; now in the feedback phase.

#8 Updated by dmaiocchi over 4 years ago

todo: fix ppc needles

#9 Updated by RBrownSUSE about 4 years ago

  • Status changed from Feedback to In Progress
  • % Done changed from 90 to 70

Please document the changes you want to see made to the production system in order to test this.

In the call today you said 'just add Upgrade=1' but we have lots of tests with upgrade=1 already set and they are not running this test

http://openqa.suse.de/tests/342503

I have gone back to the pull request and cannot find any mention of what you want changed there either... I'm happy to set up whatever jobs you need, but I need to be told what you need.

#10 Updated by dmaiocchi about 4 years ago

Hi Richard,

so in main.pm we have:

    if ((snapper_is_applicable) && get_var("BOOT_TO_SNAPSHOT")) {
        loadtest "installation/boot_into_snapshot.pm";
        if (get_var("UPGRADE")) {
            loadtest "installation/snapper_rollback.pm";
        }
    }

To schedule the "snapper_rollback" test, both BOOT_TO_SNAPSHOT and UPGRADE need to be set to 1, and the snapper_is_applicable function has to return true:

    sub snapper_is_applicable() {
        my $fs = get_var("FILESYSTEM", 'btrfs');
        return ($fs eq "btrfs" && get_var("HDDSIZEGB", 10) > 10);
    }
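
To make the variable requirements concrete, here is a minimal standalone sketch of the condition quoted above. It is only an illustration: `get_var` and `loadtest` are stubbed here (in the real test distribution they are provided by the test framework), and the `%vars` hash and the `schedule_for` helper are hypothetical names that do not exist in main.pm.

```perl
use strict;
use warnings;

# Stand-ins for the real helpers, so that the sketch runs on its own.
my %vars;         # hypothetical settings store for this sketch
my @scheduled;    # records which modules loadtest() was asked to load

sub get_var {
    my ($name, $default) = @_;
    return defined $vars{$name} ? $vars{$name} : $default;
}

sub loadtest { push @scheduled, $_[0]; }

sub snapper_is_applicable {
    my $fs = get_var("FILESYSTEM", 'btrfs');
    return ($fs eq "btrfs" && get_var("HDDSIZEGB", 10) > 10);
}

# Hypothetical wrapper: apply the main.pm condition to a set of job settings.
sub schedule_for {
    my (%settings) = @_;
    %vars      = %settings;
    @scheduled = ();
    if (snapper_is_applicable() && get_var("BOOT_TO_SNAPSHOT")) {
        loadtest "installation/boot_into_snapshot.pm";
        if (get_var("UPGRADE")) {
            loadtest "installation/snapper_rollback.pm";
        }
    }
    return @scheduled;
}

my @only_boot = schedule_for(BOOT_TO_SNAPSHOT => 1, HDDSIZEGB => 20);
my @both      = schedule_for(BOOT_TO_SNAPSHOT => 1, UPGRADE => 1, HDDSIZEGB => 20);
# HDDSIZEGB is left unset here: the default of 10 is not greater than 10.
my @none      = schedule_for(BOOT_TO_SNAPSHOT => 1, UPGRADE => 1);

print scalar(@only_boot), " ", scalar(@both), " ", scalar(@none), "\n";
```

Note that, with the function as quoted, a job that leaves HDDSIZEGB at its default of 10 schedules neither module, because the comparison is strictly greater than 10.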

Best

#11 Updated by RBrownSUSE about 4 years ago

So are you proposing that I create about a dozen all new test cases for all the migration scenarios to also test boot to snapshot and rollback?

Or are you suggesting I add boot to snapshot testing to all the migration scenarios?

I'm still not sure what you're expecting me to do...

#12 Updated by dmaiocchi about 4 years ago

Well, when I got the task I did not know either on which releases/OSes it would be run.

My thinking was that boot into snapshot has to be tested on every version that performs a migration and supports btrfs & snapper,

and that the snapper rollback should be tested whenever a system with btrfs and snapper performs a migration.

I'm speaking here about the real test case.

I know that it will take some additional time to run; maybe 4-5 minutes will be added to the "production" testing in openqa. Well, that is another situation and another problem. Honestly i don't know even all the matrix scenarios for openqa.

We can add this maybe for one migration, sp1-> SP2 at begin and see, before adding this test to whole production

#13 Updated by RBrownSUSE about 4 years ago

Part of the task is an opportunity for you to define answers to all of those questions :)

> Honestly i don't know even all the matrix scenarios for openqa.

You have the SLE 12 PRD Document so you should know what is intended to be supported, you can see openqa.suse.de and all the source for os-autoinst-distri-opensuse so you can understand the current matrix. What more information do you require?

> We can add this maybe for one migration, sp1-> SP2 at begin and see, before adding this test to whole production

Sounds like a good idea, will start that way - but I'm still worried about the suggestion you don't have a full picture..so please answer my above question too :)

#14 Updated by dmaiocchi about 4 years ago

Please define a testing strategy; based on that I can then try to modify the test.

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/1270

Thank you in advance.

Scenarios that we are considering:

a) Do the rollback after migration, then continue testing the migrated system.

--> Problem: (the system before migration with the snapshot, backup system is not set)
--> Advantage: not really difficult to maintain.

b) Have a scenario after a regular migration test <- this approach takes longer, but has the added benefit of testing the migration of ga/sp1 to sp2 and fully validating the distribution, then rolling back that exact image and fully re-validating the distribution on the earlier version

[I didn't understand this. Could be like c) with a snapshot image]

--> Problem:
--> Benefits:

c) Create an image of a system with snapshots as backup, and do the distro-openqa testing on that.

d)....

--> DECISION:::

#15 Updated by maritawerner about 4 years ago

Hi, I will give input here once I find time.

#16 Updated by dmaiocchi about 4 years ago

i agree that a 'detailed test' is a nice solution.

a 'detailed test', boot to a snapshot of a migrated system and
rollback to before the migration and confirm the migrated system is a
valid sle12sp1 or sle12ga server using all the usual acceptance test
suites

i will list what is not clear for me:

test that the rollback functionality after migration works
properly, so that users can rollback to sle12sp1 or sle12ga after
upgrading to sle12sp2

a) so this mean that the boot_into_snapshot test and the rollback should be enabled after the X11 tests of the migrated system ?

If so, then we would have to rerun the grub2 test, because that is where the boot_into_snapshot step happens.

so this could be an example of a job of the matrix:

    Sles12-sp1.
    installation --> after that we are on Sles12-sp2
    console
    x11
    shutdown ..
    ++ duplicate/recall the grub test ..> enable the boot_into_snapshot # Sles12-sp1
    rollback
    console # for Sles12-sp1
    x11 # for sles12-sp1

Obviously I chose sp1, but it is only one example from the migration matrix.

b) or we do imagine that? Making images, job-groups, or?

thx in advance.

#17 Updated by RBrownSUSE about 4 years ago

> a) so this mean that the boot_into_snapshot test and the rollback should be enabled after the X11 tests of the migrated system ?

Yes, or have the 'rollback test' using a disk image of an already tested migrated system

> so this could be an example of a job of the matrix:
> Sles12-sp1.
> installation --> after that we are on Sles12-sp2
> console
> x11
> shutdown ..
> ++ duplicate/recall the grub test ..> enable the boot_into_snapshot # Sles12-sp1
> console # for Sles12-sp1
> x11

Yes, but why are you thinking that through in such detail? The main.pm takes care of most of that for you. All we need is a test that, after a machine has been migrated to sle12sp2, rolls it back to sle12sp1 or ga and then runs all the tests we'd normally run on sles.

That's the task: make a test that rolls a migrated system back to the version it had before it migrated...

> b) or we do imagine that?

Do we imagine what?

#18 Updated by dmaiocchi about 4 years ago

This was a typo, from my side, sorry.
b) or we do imagine that? --> or how do we imagine this task?

I'm thinking in detail now to avoid implementation errors later.
For example, I wrote the snapshot test assuming that it would run in the middle of a job.

BTW, I tried to load duplicate tests in main.pm for a job, and this is not working,

either with simple test cases
or with functions:

only an example:

    unless (load_applicationstests() || load_slenkins_tests()) {
    load_rescuecd_tests();
    load_consoletests();
    load_x11tests();
    load_consoletests();
    load_x11tests();

Can you double-check that? I'm not sure.

I would say that to make an image, is the better solution, and clean, but this would implicate more images to maintain, another test for each migration and so on.

We should make a decision on that, because if we use a "rolled-back" image, I have to code the logic differently than in the case where "we run all the tests sequentially".

#19 Updated by RBrownSUSE about 4 years ago

> only an example:
> unless (load_applicationstests() || load_slenkins_tests()) {
> load_rescuecd_tests();
> load_consoletests();
> load_x11tests();
> load_consoletests();
> load_x11tests();

It's never a good idea to repeat the same test modules within the same scenario. The WebUI can never handle it, so you get a very incomplete picture.

We've worked around that in the past by symlinking some tests so you effectively have two test modules with two different names but the same code. But that does not scale for this situation

But, I do not think that is a problem because:

> I would say that to make an image, is the better solution, and clean, but this would implicate more images to maintain, another test for each migration and so on.

I totally agree that creating images is a better solution

Images to maintain is not a problem - openQA takes care of that. We will just have it make the image as a result of each migration test. The Gru will tidy them up.

And the typical desire of 'run as much as possible in parallel' does not apply here - you cannot rollback until migration is completed. And if the migration doesn't complete the image will not be made, so the rollback test will automatically be cancelled.
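
As a rough illustration of that chaining, the two jobs could carry settings along these lines. This is a sketch under assumptions, not the production configuration: `PUBLISH_HDD_1`, `HDD_1` and `START_AFTER_TEST` are existing openQA settings, but the test suite names, the image file name and the `rollback_state` helper are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical settings for the migration job that publishes the disk image.
my %migration_job = (
    TEST          => 'migration_example',        # made-up test suite name
    PUBLISH_HDD_1 => 'migrated-system.qcow2',    # made-up image name
);

# Hypothetical settings for the rollback job that consumes that image.
my %rollback_job = (
    TEST             => 'rollback_example',          # made-up test suite name
    START_AFTER_TEST => $migration_job{TEST},        # chained after the migration job
    HDD_1            => $migration_job{PUBLISH_HDD_1},
    BOOT_TO_SNAPSHOT => 1,
);

# Simplified model of the outcome described above, not openQA scheduler code:
# the rollback job only runs if the migration job passed and published its image.
sub rollback_state {
    my ($migration_result) = @_;
    return $migration_result eq 'passed' ? 'scheduled' : 'cancelled';
}

printf "%s boots %s: %s\n", $rollback_job{TEST}, $rollback_job{HDD_1}, rollback_state('passed');
printf "%s after a failed migration: %s\n", $rollback_job{TEST}, rollback_state('failed');
```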

#20 Updated by dmaiocchi about 4 years ago

OK, thanks a lot for the feedback.
I just want to summarize to get a clear picture/test plan for myself (it will be detailed, because I want to track the tasks/subtasks of this task).

Admin side of openqa.suse.de:

  • After the migration job is done, make an image that contains the migrated system (sles12-sp1-sp2-migration.qcow2, for example)
  • Create a new job that will be called, for example, rollback-migration-12sp2-sp1 (there will be multiple such jobs)

From my side:

Testing/code side (sles12-sp1-sp2-migration.qcow2):

  • create a new vars.json file to adapt the test workflow (installation is not needed with sles12-sp1-sp2-migration.qcow2)
  • create a qcow
  • adapt main.pm so that the grub/boot_into_snapshot test runs at boot time of the qcow
  • rewrite the snapshot/snapper rollback test for this scenario
  • enable the snapper_rollback test only in this case
  • run the console and x11 tests afterwards
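
The code-side steps above could translate into a main.pm branch roughly like the following. It is only a sketch: `ROLLBACK_AFTER_MIGRATION`, `build_schedule` and `load_installation_placeholder` are hypothetical names, `get_var` and `loadtest` are stubbed so that the snippet runs on its own, and the console/x11 loaders are reduced to placeholders.

```perl
use strict;
use warnings;

# Stand-ins for the real helpers, so that the sketch runs on its own.
my %vars;
my @scheduled;

sub get_var {
    my ($name, $default) = @_;
    return defined $vars{$name} ? $vars{$name} : $default;
}

sub loadtest { push @scheduled, $_[0]; }

# Placeholders for the real loader functions in main.pm.
sub load_installation_placeholder { loadtest "installation (placeholder)"; }
sub load_consoletests             { loadtest "console (placeholder)"; }
sub load_x11tests                 { loadtest "x11 (placeholder)"; }

# Hypothetical helper: build the module list for one job.
sub build_schedule {
    my (%settings) = @_;
    %vars      = %settings;
    @scheduled = ();
    if (get_var("ROLLBACK_AFTER_MIGRATION")) {    # hypothetical variable
        # The job boots the migrated qcow, so no installation modules are loaded.
        loadtest "installation/boot_into_snapshot.pm";
        loadtest "installation/snapper_rollback.pm";
    }
    else {
        load_installation_placeholder();
    }
    # The usual validation runs afterwards in both cases.
    load_consoletests();
    load_x11tests();
    return @scheduled;
}

print join("\n", build_schedule(ROLLBACK_AFTER_MIGRATION => 1)), "\n";
```

Because the rollback job is a separate job with its own module list, the console and x11 modules appear only once per job, which avoids the duplicate-module problem discussed earlier.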

#21 Updated by dmaiocchi about 4 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 70 to 100

#22 Updated by dmaiocchi about 4 years ago

According to the latest results from production,

I see that the new needle matches only in a few cases.

Here, for example, the name sles-sp1 causes the string to be shifted to the right, so the word "update" is truncated.

https://openqa.suse.de/tests/382291/modules/grub_test_snapshot/steps/48

Open for suggestions about how we can fix that.

#23 Updated by RBrownSUSE about 4 years ago

Richard's Rule #1 of Needling

"Make the Needle as small as it needs to be"

Make your needle smaller, so it works when truncated, just like we talked about last week..

#24 Updated by okurz about 4 years ago

  • Description updated (diff)
  • Status changed from Feedback to In Progress
  • Assignee deleted (dmaiocchi)

"boot_to_snapshot" on x86_64 works pretty well, e.g. see https://openqa.suse.de/tests/396527
The issues we see on x86_64 are product issues. E.g. in sle-12-SP2-Server-DVD-x86_64-Build1537-rollback_migration_offline_sle12@64bit (https://openqa.suse.de/tests/396187) jobs fail because of an incomplete gnome menu, but also because we try to apply SP2 needles to the GA build, e.g. in the gnome_terminal test, which does not work.
ppc is currently not triggered at all because the job creating the image after migration fails. E.g. https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=1537&groupid=25 shows that the job is user_cancelled, as it was in many predecessors, so we cannot do much there right now.

#25 Updated by okurz almost 4 years ago

  • Checklist changed from [ ] SLE, [ ] TW, [ ] Leap to [x] SLE, [ ] TW, [ ] Leap
  • Priority changed from High to Low

Also works on ppc64le, albeit flaky: https://openqa.suse.de/tests/489879
Considered done for SLE.

Still to be enabled for TW+Leap.

#26 Updated by okurz almost 4 years ago

  • Copied to action #12964: [opensuse][functional][u] Boot to snapshot after upgrade and then rollback added

#27 Updated by okurz almost 4 years ago

  • Checklist changed from [x] SLE, [ ] TW, [ ] Leap to [x] SLE
  • Status changed from In Progress to Resolved
  • Assignee set to okurz
  • Priority changed from Low to High

openSUSE enablement tracked in #12964

#28 Updated by okurz almost 4 years ago

  • Blocks action #13156: os-autoinst: Add support to easily switch VERSION during a test run added
