action #9714

closed

ppc64le - btrfs disk images for migration are missing snapper

Added by RBrownSUSE over 8 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Bugs in existing tests
Start date: 2015-11-27
Due date:
% Done: 100%
Estimated time:
Difficulty:

Description

observation

scenarios migration_offline_sle12_ppc and migration_zdup_offline_sle are failing in step snapper_undochange because snapper can't find any snapshots, e.g. see https://openqa.suse.de/tests/462026#step/snapper_undochange/7

steps to reproduce

any run of migration_offline_sle12_ppc or migration_zdup_offline_sle fails, e.g. call clone_job.pl 462026 if you have a ppc64le worker available.

problem

The HDD images used are too small. zluo created another one, but it is still too small (10G). okurz tried to create an image using openQA, which is 20G and has snapshots, but it is not booting to the desktop now.

suggestion

zluo will create another image manually. Let's see if this one works.

further details

originally reported test results - not accessible anymore:
https://openqa.suse.de/tests/162428
https://openqa.suse.de/tests/162433


Checklist

  • SLE

Related issues 3 (0 open, 3 closed)

Has duplicate openQA Tests - action #12252: migration_offline_sle12_allpatterns_ppc@ppc64le fails with "out of disk space error" (Rejected, 2016-06-07)
Has duplicate openQA Project - action #13074: HDD image is missing snapper config - migration_offline_sle12@ppc64le test (Rejected, 2016-08-09)
Blocks openQA Tests - action #9712: ppc64le - SDK+allpatterns needs more space (Resolved, mkravec, 2015-11-27)

Actions #1

Updated by RBrownSUSE over 8 years ago

  • Checklist item changed from to [ ] SLE
  • Target version deleted (156)
Actions #2

Updated by RBrownSUSE about 8 years ago

  • Assignee set to dmaiocchi
Actions #3

Updated by RBrownSUSE about 8 years ago

  • Target version set to Milestone 2
Actions #4

Updated by dmaiocchi about 8 years ago

Richard, the links are broken now. Do you have new ones?

By the way, I tried to run clone_job.pl for ppc64 and it wasn't working. Is there a way to simulate ppc jobs locally, or do I need a ppc machine? Thanks.

Actions #5

Updated by RBrownSUSE about 8 years ago

dmaiocchi wrote:

Richard, the links are broken now. Do you have new ones?

any job for migration_offline_sle12_ppc or migration_offline_sle12_allpatterns_ppc on any build is a good example :)

https://openqa.suse.de/tests/280729
https://openqa.suse.de/tests/280730

By the way, I tried to run clone_job.pl for ppc64 and it wasn't working. Is there a way to simulate ppc jobs locally, or do I need a ppc machine? Thanks.

In THEORY you should be able to get qemu to run qemu-ppc with a ppc64le CPU without needing a ppc machine...never tried it..but with a suitably creative vars.json you should be able to make isotovideo do it on an intel machine

we did do it for aarch64, with the aarch64-emu machine, with the following vars:

BIOS=qemu-uefi-aarch64.bin
CDMODEL=virtio-blk-device
HDDMODEL=virtio-blk-device
QEMU=aarch64
QEMUCPU=cortex-a57
QEMUMACHINE=virt
QEMU_NO_KVM=1
TIMEOUT_SCALE=4
WORKER_CLASS=qemu_x86_64

Something similar for power should work, IN THEORY
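
A minimal, untested sketch of what such a variable set might look like for power. The variable names come from this ticket (the aarch64 list above and the ppc64le job settings used later in the thread), but the values QEMU=ppc64 and QEMUCPU=POWER8 are assumptions that nobody here has verified, and the function names are hypothetical:

```shell
# Print a candidate variable set for emulating ppc64le on an Intel worker.
# ASSUMPTION: QEMU=ppc64 and QEMUCPU=POWER8 are guesses by analogy with the
# aarch64-emu machine; OFW and SERIALDEV are copied from the ppc64le job
# settings quoted later in this ticket. None of this has been tested.
print_ppc64le_emu_vars() {
    cat <<'EOF'
QEMU=ppc64
QEMUCPU=POWER8
OFW=1
SERIALDEV=hvc0
QEMU_NO_KVM=1
TIMEOUT_SCALE=4
WORKER_CLASS=qemu_x86_64
EOF
}

# Succeed only if the ppc64 system emulator is installed on this host.
have_ppc64_qemu() {
    command -v qemu-system-ppc64 >/dev/null 2>&1
}
```

If `have_ppc64_qemu` succeeds, `qemu-system-ppc64 -cpu help` lists the CPU models the local QEMU build actually offers, which is the place to check the QEMUCPU guess.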

Actions #6

Updated by dmaiocchi about 8 years ago

ok i will try this out

Actions #7

Updated by dmaiocchi about 8 years ago

ok isn't that fixed richard? i see yast_snapper test.

what do you mean by missing snapper? missing a snapper test, or missing the package itself?

Actions #8

Updated by dmaiocchi about 8 years ago

OK, I got it. I cannot find any image on openqa.suse.de that matches GA.

is it ok if i use this image?

SLE-12-Server-ppc64le-GM-allpatterns.qcow2
Actions #9

Updated by dmaiocchi about 8 years ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 50

As we discussed, here is the vars.json that should work. I haven't tested it because I don't have a bare-metal Power machine ( :( )

We can take the gold master for sle-12, run the installation and create the qcow image from it, so we can install it.

git clone https://github.com/MalloZup/generate_image.git

If I had bare-metal PowerPC machines in orthos, things would be easier, and we could simply schedule an installation job that we could reuse for this in future.

Actions #10

Updated by dmaiocchi about 8 years ago

  • Assignee deleted (dmaiocchi)
  • % Done changed from 50 to 0
Actions #11

Updated by okurz almost 8 years ago

  • Related to action #9712: ppc64le - SDK+allpatterns needs more space added
Actions #12

Updated by okurz almost 8 years ago

  • Status changed from Feedback to In Progress
  • Priority changed from High to Urgent

most recent examples of fails: https://openqa.suse.de/tests/408684 and https://openqa.suse.de/tests/408686

how can new images be created and how can they be put on openqa for assets? See #9714#note-5 for ideas how to run ppc64le on local machine but this does not answer the second part of the question. Maybe openqa can simply do the installation for us with an "on-demand" test_suite? E.g. https://openqa.suse.de/tests/408642 is the test_suite sles12_gnome_create_hdd@ppc64le and already creating disk images for SP2 so can we simply adapt this for GM?

Actions #13

Updated by mgriessmeier almost 8 years ago

  • Assignee set to zluo

@zluo: Can you maybe create the requested image, as a ppc64 expert?
I can then help you with uploading it to o.s.d.

Actions #14

Updated by zluo almost 8 years ago

will provide disk image to you...

Actions #16

Updated by okurz almost 8 years ago

  • Assignee changed from zluo to okurz

checking image by zluo, adapt name if necessary, upload, register, test.

I think I should be able to test this by uploading the image file (whatever the name) and manually triggering a one-shot test job with an explicit name, overriding the variables as necessary, e.g.

as geekotest@openqa

cd /var/lib/openqa/factory/hdd
wget http://10.162.2.8/sles12.gm.all-patterns.brtfs.snapper.qcow2 -O SLE-12-Server-ppc64le-GM-gnome_with_snapper.qcow2
cd /var/lib/openqa/factory/iso
/usr/share/openqa/script/client isos post --params SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.6.json HDD_1=SLE-12-Server-ppc64le-GM-gnome_with_snapper.qcow2 TEST=migration_offline_sle12_ppc BUILD=1651a

why SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.6.json? I checked SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.?.json: There are …5… and …6…. …5… is for HA so I chose 6.

Scheduled as https://openqa.suse.de/tests/463859

Job can be cleaned afterwards with

client jobs/463859 delete
Actions #17

Updated by okurz almost 8 years ago

zluo, the image you provided is only 10GB and fails when installing packages during upgrade https://openqa.suse.de/tests/463859#step/install_and_reboot/4, probably because it runs out of disk space.
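
The size of an image can be checked before a job is scheduled with it. A minimal sketch, assuming `qemu-img` is installed where the image is stored; the function names are hypothetical, and the virtual size only bounds the disk, it says nothing about free space inside the filesystem:

```shell
# Extract the virtual size in bytes from `qemu-img info` text on stdin.
# Relies only on the "(N bytes)" part of the "virtual size:" line.
virtual_size_bytes() {
    sed -n 's/^virtual size:.*(\([0-9][0-9]*\) bytes).*$/\1/p'
}

# Succeed if image $1 has a virtual size of at least $2 GiB.
image_at_least_gib() {
    local bytes
    bytes="$(qemu-img info "$1" | virtual_size_bytes)"
    [ -n "$bytes" ] && [ "$bytes" -ge $(( $2 * 1024 * 1024 * 1024 )) ]
}
```

For example, `image_at_least_gib SLE-12-Server-ppc64le-GM-gnome_with_snapper.qcow2 20` would fail for the 10GB image discussed here.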

I am more or less wandering blindly here but called the following to create a SLE12SP0 installation. maybe this is creating what we want:

openqa_client_osd jobs post DISTRI=sle VERSION=12 FLAVOR=Server-DVD ARCH=ppc64le BUILD=okurz_poo9714 \
ISO=SLE-12-Server-DVD-ppc64le-GM-DVD1.iso INSTALLONLY=1 QEMU_COMPRESS_QCOW2=1 \
PUBLISH_HDD_1=SLES-12-GM-gnome-ppc64le_snapper_20g.qcow2 TEST=create_gm_ppc_image \
MACHINE=ppc64le WORKER_CLASS=qemu_ppc64le HDDSIZEGB=20 MAX_JOB_TIME=86400 TIMEOUT_SCALE=10

--> https://openqa.suse.de/tests/463876

Actions #18

Updated by okurz almost 8 years ago

Failed because "cirrus" is not available as VGA. Retriggered with

openqa_client_osd jobs post DISTRI=sle VERSION=12 FLAVOR=Server-DVD ARCH=ppc64le BACKEND=qemu \
NOVIDEO=1 OFW=1 QEMUCPU=host SERIALDEV=hvc0 BUILD=okurz_poo9714 \
ISO=SLE-12-Server-DVD-ppc64le-GM-DVD1.iso INSTALLONLY=1 QEMU_COMPRESS_QCOW2=1 \
PUBLISH_HDD_1=SLES-12-GM-gnome-ppc64le_snapper_20g.qcow2 TEST=create_gm_ppc_image \
MACHINE=ppc64le WORKER_CLASS=qemu_ppc64le HDDSIZEGB=20 MAX_JOB_TIME=86400 TIMEOUT_SCALE=10

--> t#463883

EDIT: Completed now. Let's see what happens if I clone sle-12-SP2-Server-DVD-ppc64le-Build1649-migration_offline_sle12_ppc@ppc64le but with the new HDD_1

openqa_clone_job_osd 462022 HDD_1=SLES-12-GM-gnome-ppc64le_snapper_20g.qcow2

--> t#463890

Let's see if it can find snapshots or if the harddisk of the SP0 image is still too small or snapshots won't be enabled by default at all.

EDIT: t#463890 failed with timeout on first boot. I suspect either a sporadic performance problem or network configuration problem. retriggered with TIMEOUT_SCALE

openqa_clone_job_osd 463890 TIMEOUT_SCALE=10

--> t#463947

EDIT: cleaned temporary builds

$ openqa_client_osd jobs/463859 delete
{ result => 1 }
$ openqa_client_osd jobs/463724 delete
{ result => 1 }

Result of t#463947 is still that it can't boot into a desktop session although switching to terminal and logging in works ok. I checked logs but couldn't find anything obvious. Anyone else has an idea?

Actions #19

Updated by okurz almost 8 years ago

[08/07/2016 11:21:03] <okurz> [08/07/2016 11:06:13] <okurz> Who wants to take a look at https://openqa.suse.de/tests/464628#live (30 minutes left) and tell me how I can further debug the bootup process? It's stuck there
[08/07/2016 11:32:17] <okurz> ok, fine. any further ideas how to follow on? I would think switching to text console and evaluate why display manager does not show up. the logs show that gdm is running at least
[08/07/2016 12:49:09] <okurz> at least we know now better that "vncviewer" is easier to use to also switch terminals and such
[08/07/2016 13:03:34] <okurz> that is what my test did! It's just not booting but I actually checked in 464665 that there are snapshots now. I will reconduct it and try to record a video
openqa_clone_job_osd 464665 NOVIDEO=0

then connected to instance when it stalled with

vncviewer -Shared malbec.arch:91

changed to text console (press F8 in vncviewer, select ctrl and alt in the menu, exit the menu, press F2), logged in as root and called some commands for debugging. As it looks, at least there are snapshots now as visible in the video
https://openqa.suse.de/tests/464666/file/video.ogv at 1:47.
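
The snapshot check on the text console can be scripted roughly as follows. This is a sketch under assumptions: it counts rows of `snapper list` output by looking for the type keywords single, pre and post, and the function name is hypothetical. snapper typically also lists an entry 0 for the current system, so a count of 1 may still mean that no real snapshot exists:

```shell
# Count snapshot rows in `snapper list` output read from stdin.
# ASSUMPTION: every snapshot row contains one of the lowercase type
# keywords single, pre or post as a whole word, while the header and
# separator lines do not.
count_snapshot_rows() {
    grep -Ecw 'single|pre|post' || true
}
```

On the system under test this would be called as `snapper list | count_snapshot_rows`.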

So, what now?

Actions #20

Updated by okurz almost 8 years ago

  • Description updated (diff)

updated description with current state as of 2016-07-08

Actions #23

Updated by okurz almost 8 years ago

Make sure not to lose what we already observed with the failing test cases to support Feature 317784: Implement a check and warning to make sure we don't start a Service Pack migration with insufficient disk space

Actions #24

Updated by okurz almost 8 years ago

Investigated https://openqa.suse.de/tests/464665/file/first_boot-journal.log with the help of rbrown. What I see is

Jul 08 06:15:01 susetest systemd[1]: Started User Manager for UID 0.
Jul 08 06:15:24 susetest dbus[1455]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
Jul 08 06:15:24 susetest systemd[1]: Starting Cleanup of Temporary Directories...
Jul 08 06:15:24 susetest dbus[1455]: [system] Successfully activated service 'org.opensuse.Snapper'
Jul 08 06:15:24 susetest systemd[1]: Started Cleanup of Temporary Directories.
Jul 08 06:16:05 susetest kernel: BTRFS info (device vda3): relocating block group 9156165632 flags 1
Jul 08 06:16:05 susetest kernel: BTRFS info (device vda3): found 1960 extents
Jul 08 06:16:06 susetest kernel: BTRFS info (device vda3): found 1958 extents
Jul 08 06:16:06 susetest kernel: BTRFS info (device vda3): relocating block group 10364125184 flags 1
…

Our idea is that some btrfs cleanup is pending and takes so long that systemd terminates the wait, which prevents a proper X startup. I will try to either boot into single-user mode first and see if it settles down before the first boot, or give the vanilla GA session more time before saving the image (after a grace period for the btrfs cleanup).
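
The grace period could be implemented by waiting until the kernel stops logging relocation messages like the ones above. A minimal sketch, assuming the message format shown in the journal excerpt; the function names and the default polling interval are hypothetical:

```shell
# Count btrfs block group relocation messages in journal text on stdin.
count_relocations() {
    grep -c 'BTRFS info (device [^)]*): relocating block group' || true
}

# Poll the kernel log of the current boot until two consecutive checks
# see the same number of relocation messages, i.e. no new ones appeared.
# $1 is the polling interval in seconds (default 60, an arbitrary choice).
wait_for_btrfs_idle() {
    local interval="${1:-60}" previous=-1 current
    while :; do
        current="$(journalctl -k -b --no-pager | count_relocations)"
        [ "$current" = "$previous" ] && break
        previous="$current"
        sleep "$interval"
    done
}
```

Calling `wait_for_btrfs_idle 120` in the installed system before it is shut down would delay saving the image until the log has been quiet for one interval. It does not prove that the cleanup is the cause of the failed desktop startup.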

Actions #25

Updated by okurz almost 8 years ago

  • Assignee deleted (okurz)

won't be able to do this in time. If someone with easier access to power can work on this, this would be great.

Actions #26

Updated by okurz over 7 years ago

  • Has duplicate action #12252: migration_offline_sle12_allpatterns_ppc@ppc64le fails with "out of disk space error" added
Actions #27

Updated by okurz over 7 years ago

  • Related to deleted (action #9712: ppc64le - SDK+allpatterns needs more space)
Actions #28

Updated by okurz over 7 years ago

  • Blocks action #9712: ppc64le - SDK+allpatterns needs more space added
Actions #29

Updated by mkravec over 7 years ago

  • Assignee set to mkravec
Actions #30

Updated by mkravec over 7 years ago

I scheduled poo_9714 test in osd:
medium: sle-12-Server-DVD-ppc64le
group: Test Development: SLE 12 SP2
parameters:
DESKTOP=gnome
HDDSIZEGB=20
INSTALLONLY=1
PUBLISH_HDD_1=poo_9714.qcow2

Actions #31

Updated by okurz over 7 years ago

  • Has duplicate action #13074: HDD image is missing snapper config - migration_offline_sle12@ppc64le test added
Actions #32

Updated by okurz over 7 years ago

hi mkravec, did you read the previous comments? What did you change so that your installation could yield a usable system?

Actions #33

Updated by mkravec over 7 years ago

I read them - nothing changed, I just have to start somewhere..

Actions #34

Updated by mkravec over 7 years ago

I reproduced https://openqa.suse.de/tests/510844#step/first_boot/3 on a ppc VM. It does not look like an openQA issue.
I compared the working 10GB image we currently have with the one I produced; packages and autoinst.xml are the same.

https://bugzilla.suse.com/show_bug.cgi?id=993253

Actions #35

Updated by mkravec over 7 years ago

SLES-12 needs to be patched before the upgrade to SP2, otherwise the issues mentioned above occur.

Actions #36

Updated by mkravec over 7 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

How to build images manually:

  • Create VM with 20G disk
  • Install SLES-12 & register & update during installation
    • For 40G image (allpatterns) do not create a separate xfs /home partition
  • Boot into SLES 12
  • Unregister (SUSEConnect -d)
  • Remove /etc/udev/rules.d/70-persistent-net.rules
  • Power off
  • Shrink image (qemu-img convert -O qcow2 sles12.qcow2 sles12-small.qcow2)
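
The command-line parts of the steps above can be sketched as shell functions. The installation itself stays manual. The function names and the QEMU_IMG override variable are hypothetical; the commands and file names are the ones named in the list:

```shell
# Create an empty qcow2 disk for the installation VM.
# $1 is the image file, $2 the size (default 20G; use 40G for allpatterns).
create_disk() {
    "${QEMU_IMG:-qemu-img}" create -f qcow2 "$1" "${2:-20G}"
}

# Run inside the installed SLES 12 guest before powering it off:
# unregister the system and drop the persistent network device rules.
cleanup_guest() {
    SUSEConnect -d
    rm -f /etc/udev/rules.d/70-persistent-net.rules
}

# Rewrite the powered-off image $1 into $2 so unused space is dropped.
shrink_image() {
    "${QEMU_IMG:-qemu-img}" convert -O qcow2 "$1" "$2"
}
```

A run would be `create_disk sles12.qcow2`, then the installation and `cleanup_guest` in the guest, then `shrink_image sles12.qcow2 sles12-small.qcow2` on the host.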

https://openqa.suse.de/tests/515973
https://openqa.suse.de/tests/517340

Old image backed up into sle-12-Server-DVD-ppc64le-gnome-10g-unused.qcow2. I will delete it next week if everything is fine.

Actions #37

Updated by mkravec over 7 years ago

  • Status changed from Resolved to Feedback

20G images are also needed for:

  • migration_offline_sle12_allpatterns@ppc64le
  • migration_offline_sle12sp1_allpatterns@ppc64le

Should I wait until "Low disk space" bsc#987813 is resolved?

Actions #38

Updated by okurz over 7 years ago

wow, you achieved what I failed to do.

Thanks for providing instructions.

Just for the record: How did you run the virtual machine? qemu-ppc64le locally or on a ppc64le worker?

Regarding the "low disk space" warning: There is https://progress.opensuse.org/issues/11438 to implement an actual test for this. For now we have the test implicitly, because the jobs fail when they run out of disk space. What about just preserving the "too small" images for test_suites which have that in mind, e.g.

  • use the new, bigger drive for test_suite migration_offline_sle12_allpatterns
  • create a new test_suite migration_offline_sle12_allpatterns_too_low_space which uses the old, too small disk

does this make sense?

Actions #39

Updated by RBrownSUSE over 7 years ago

makes sense to me..can we do it?

great work Martin!

Actions #40

Updated by mkravec over 7 years ago

  • Status changed from Feedback to In Progress
  • % Done changed from 100 to 70

I used VM on ppc server (borrowed from dgutu). And yes, it makes sense Oliver, I will create additional test.

Actions #41

Updated by mkravec over 7 years ago

  • % Done changed from 70 to 90

Images for allpatterns created:
SP0: https://openqa.suse.de/tests/522167
SP1: https://openqa.suse.de/tests/522159

Original images replaced by bigger ones (comment#36), and backed up:
sle-12-Server-DVD-ppc64le-allpatterns.qcow2 -> sle-12-Server-DVD-ppc64le-allpatterns-small.qcow2 (used for low_space test as proposed by Oliver)
sle-12-SP1-Server-DVD-ppc64le-allpatterns.qcow2 -> sle-12-SP1-Server-DVD-ppc64le-allpatterns-unused.qcow2

The problem with the old allpatterns images was a separate /home partition taking 20G. I removed it, and 40G of disk space is now enough.

Actions #42

Updated by mkravec over 7 years ago

  • % Done changed from 90 to 100
Actions #43

Updated by mkravec over 7 years ago

  • Status changed from In Progress to Resolved