action #9714
Closed
ppc64le - btrfs disk images for migration are missing snapper
Added by RBrownSUSE about 9 years ago. Updated over 8 years ago.
100%
Description
observation
scenarios migration_offline_sle12_ppc and migration_zdup_offline_sle are failing in step snapper_undochange because snapper can't find any snapshots, e.g. see https://openqa.suse.de/tests/462026#step/snapper_undochange/7
steps to reproduce
Any run of migration_offline_sle12_ppc or migration_zdup_offline_sle fails, e.g. call clone_job.pl 462026 if you have a ppc64le worker available.
problem
The HDD images used are too small. zluo created another one, but at 10G it is still too small. okurz created a 20G image using openQA; it has snapshots but currently does not boot to the desktop.
suggestion
zluo will create another image manually. Let's see if this one works.
further details
originally reported test results - not accessible anymore:
https://openqa.suse.de/tests/162428
https://openqa.suse.de/tests/162433
Updated by RBrownSUSE almost 9 years ago
- Checklist item changed from to [ ] SLE
- Target version deleted (156)
Updated by dmaiocchi almost 9 years ago
Richard, the links are broken. Do you have new ones?
By the way, I tried to run clonejob.pl for ppc64 and it wasn't working. Is there a way to simulate ppc jobs locally, or do I need a ppc machine? Thanks.
Updated by RBrownSUSE almost 9 years ago
dmaiocchi wrote:
Richard, the links are broken. Do you have new ones?
any job for migration_offline_sle12_ppc or migration_offline_sle12_allpatterns_ppc on any build is a good example :)
https://openqa.suse.de/tests/280729
https://openqa.suse.de/tests/280730
By the way, I tried to run clonejob.pl for ppc64 and it wasn't working. Is there a way to simulate ppc jobs locally, or do I need a ppc machine? Thanks.
In THEORY you should be able to get qemu to run qemu-ppc with a ppc64le CPU without needing a ppc machine. I have never tried it, but with a suitably creative vars.json you should be able to make isotovideo do it on an Intel machine.
we did do it for aarch64, with the aarch64-emu machine, with the following vars:
BIOS=qemu-uefi-aarch64.bin
CDMODEL=virtio-blk-device
HDDMODEL=virtio-blk-device
QEMU=aarch64
QEMUCPU=cortex-a57
QEMUMACHINE=virt
QEMU_NO_KVM=1
TIMEOUT_SCALE=4
WORKER_CLASS=qemu_x86_64
Something similar for power should work, IN THEORY
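A minimal sketch of what such a vars.json might look like for POWER, by analogy with the aarch64 values above. The function name is hypothetical, and the QEMU binary suffix (ppc64), the machine type (pseries) and the CPU model (POWER8) are assumptions that have not been tested here. OFW and SERIALDEV are the values used for the ppc64le jobs later in this ticket.

```shell
# Hypothetical helper: print a vars.json for running a ppc64le job under
# emulation on an x86_64 worker. Values marked "assumption" are untested
# guesses modelled on the aarch64-emu settings.
write_ppc64le_emu_vars() {
    cat <<'EOF'
{
    "ARCH": "ppc64le",
    "BACKEND": "qemu",
    "QEMU": "ppc64",
    "QEMUCPU": "POWER8",
    "QEMUMACHINE": "pseries",
    "QEMU_NO_KVM": "1",
    "OFW": "1",
    "SERIALDEV": "hvc0",
    "TIMEOUT_SCALE": "4",
    "WORKER_CLASS": "qemu_x86_64"
}
EOF
}
# QEMU, QEMUCPU, QEMUMACHINE: assumption (qemu-system-ppc64, pseries, POWER8)
# TIMEOUT_SCALE, WORKER_CLASS: copied from the aarch64-emu example
```

Redirect the output into the pool directory to try it, e.g. `write_ppc64le_emu_vars > vars.json`, and expect to adjust the values.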
Updated by dmaiocchi almost 9 years ago
OK, isn't that fixed, Richard? I see a yast_snapper test.
What do you mean by missing snapper? Missing a snapper test, or missing the package itself?
Updated by dmaiocchi almost 9 years ago
OK, I got it. I cannot find any image on openqa.suse.de that matches GA.
Is it OK if I use this image?
SLE-12-Server-ppc64le-GM-allpatterns.qcow2
Updated by dmaiocchi almost 9 years ago
- Status changed from New to Feedback
- % Done changed from 0 to 50
As we discussed, here is the vars.json that should work. I haven't tested it because I don't have a bare-metal Power machine ( :( )
We can take the gold master for SLE 12, run the installation, and create the qcow image from it.
git clone https://github.com/MalloZup/generate_image.git
If I had bare-metal PowerPC machines in orthos, this would be easier: we could simply schedule an installation job that we could reuse for this in the future.
Updated by dmaiocchi over 8 years ago
- Assignee deleted (dmaiocchi)
- % Done changed from 50 to 0
Updated by okurz over 8 years ago
- Related to action #9712: ppc64le - SDK+allpatterns needs more space added
Updated by okurz over 8 years ago
- Status changed from Feedback to In Progress
- Priority changed from High to Urgent
most recent examples of fails: https://openqa.suse.de/tests/408684 and https://openqa.suse.de/tests/408686
how can new images be created and how can they be put on openqa for assets? See #9714#note-5 for ideas how to run ppc64le on local machine but this does not answer the second part of the question. Maybe openqa can simply do the installation for us with an "on-demand" test_suite? E.g. https://openqa.suse.de/tests/408642 is the test_suite sles12_gnome_create_hdd@ppc64le and already creating disk images for SP2 so can we simply adapt this for GM?
Updated by mgriessmeier over 8 years ago
- Assignee set to zluo
@zluo: Can you maybe create the requested image, as a ppc64 expert?
I can then help you with uploading it to o.s.d.
Updated by zluo over 8 years ago
Updated by okurz over 8 years ago
- Assignee changed from zluo to okurz
checking image by zluo, adapt name if necessary, upload, register, test.
I think I should be able to test this by uploading the image file (whatever the name) and manually trigger a test job with explicit name as one-shot overriding the variables as necessary, e.g.
as geekotest@openqa
cd /var/lib/openqa/factory/hdd
wget http://10.162.2.8/sles12.gm.all-patterns.brtfs.snapper.qcow2 -O SLE-12-Server-ppc64le-GM-gnome_with_snapper.qcow2
cd /var/lib/openqa/factory/iso
/usr/share/openqa/script/client isos post --params SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.6.json HDD_1=SLE-12-Server-ppc64le-GM-gnome_with_snapper.qcow2 TEST=migration_offline_sle12_ppc BUILD=1651a
Why SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.6.json? I checked SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.?.json: there are …5… and …6…. …5… is for HA, so I chose 6.
Scheduled as https://openqa.suse.de/tests/463859
Job can be cleaned afterwards with
client jobs/463859 delete
Updated by okurz over 8 years ago
zluo, the image you provided is only 10GB and fails when installing packages during upgrade https://openqa.suse.de/tests/463859#step/install_and_reboot/4, probably because it runs out of disk space.
I am more or less wandering blindly here but called the following to create a SLE12SP0 installation. maybe this is creating what we want:
openqa_client_osd jobs post DISTRI=sle VERSION=12 FLAVOR=Server-DVD ARCH=ppc64le BUILD=okurz_poo9714 \
ISO=SLE-12-Server-DVD-ppc64le-GM-DVD1.iso INSTALLONLY=1 QEMU_COMPRESS_QCOW2=1 \
PUBLISH_HDD_1=SLES-12-GM-gnome-ppc64le_snapper_20g.qcow2 TEST=create_gm_ppc_image \
MACHINE=ppc64le WORKER_CLASS=qemu_ppc64le HDDSIZEGB=20 MAX_JOB_TIME=86400 TIMEOUT_SCALE=10
Updated by okurz over 8 years ago
Failed because "cirrus" is not available as VGA. Triggered again with
openqa_client_osd jobs post DISTRI=sle VERSION=12 FLAVOR=Server-DVD ARCH=ppc64le BACKEND=qemu \
NOVIDEO=1 OFW=1 QEMUCPU=host SERIALDEV=hvc0 BUILD=okurz_poo9714 \
ISO=SLE-12-Server-DVD-ppc64le-GM-DVD1.iso INSTALLONLY=1 QEMU_COMPRESS_QCOW2=1 \
PUBLISH_HDD_1=SLES-12-GM-gnome-ppc64le_snapper_20g.qcow2 TEST=create_gm_ppc_image \
MACHINE=ppc64le WORKER_CLASS=qemu_ppc64le HDDSIZEGB=20 MAX_JOB_TIME=86400 TIMEOUT_SCALE=10
--> t#463883
EDIT: Completed now. Let's see what happens if I clone sle-12-SP2-Server-DVD-ppc64le-Build1649-migration_offline_sle12_ppc@ppc64le but with the new HDD_1
openqa_clone_job_osd 462022 HDD_1=SLES-12-GM-gnome-ppc64le_snapper_20g.qcow2
--> t#463890
Let's see if it can find snapshots or if the harddisk of the SP0 image is still too small or snapshots won't be enabled by default at all.
EDIT: t#463890 failed with timeout on first boot. I suspect either a sporadic performance problem or network configuration problem. retriggered with TIMEOUT_SCALE
openqa_clone_job_osd 463890 TIMEOUT_SCALE=10
--> t#463947
EDIT: cleaned temporary builds
$ openqa_client_osd jobs/463859 delete
{ result => 1 }
$ openqa_client_osd jobs/463724 delete
{ result => 1 }
The result of t#463947 is still that it can't boot into a desktop session, although switching to a terminal and logging in works fine. I checked the logs but couldn't find anything obvious. Does anyone else have an idea?
Updated by okurz over 8 years ago
[08/07/2016 11:21:03] <okurz> [08/07/2016 11:06:13] <okurz> Who wants to take a look at https://openqa.suse.de/tests/464628#live (30 minutes left) and tell me how I can further debug the bootup process? It's stuck there
[08/07/2016 11:32:17] <okurz> ok, fine. any further ideas how to follow on? I would think switching to text console and evaluate why display manager does not show up. the logs show that gdm is running at least
[08/07/2016 12:49:09] <okurz> at least we know now better that "vncviewer" is easier to use to also switch terminals and such
[08/07/2016 13:03:34] <okurz> that is what my test did! It's just not booting but I actually checked in 464665 that there are snapshots now. I will reconduct it and try to record a video
openqa_clone_job_osd 464665 NOVIDEO=0
then connected to instance when it stalled with
vncviewer -Shared malbec.arch:91
changed to text console (press F8 in vncviewer, select ctrl and alt in the menu, exit the menu, press F2), logged in as root and called some commands for debugging. As it looks, at least there are snapshots now as visible in the video
https://openqa.suse.de/tests/464666/file/video.ogv at 1:47.
So, what now?
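The exact commands run on the text console are not recorded here. A sketch of the kind of checks that fit this situation, assuming a systemd-based SLE 12 guest where the display manager runs as display-manager.service; the function name is hypothetical and none of this is taken from the original session:

```shell
# Hypothetical helper: collect a few facts about a guest that boots but
# never shows the desktop. Run as root on the text console.
collect_boot_debug() {
    echo "== snapshots =="
    snapper list                                 # does snapper see any snapshots?
    echo "== display manager =="
    systemctl status display-manager.service     # is gdm (or another DM) running?
    journalctl -b -u display-manager.service --no-pager | tail -n 50
    echo "== pending systemd jobs =="
    systemctl list-jobs                          # units still queued or waiting
}
# Example: collect_boot_debug > /tmp/boot-debug.txt 2>&1
```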
Updated by okurz over 8 years ago
- Description updated (diff)
updated description with current state as of 2016-07-08
Updated by okurz over 8 years ago
used my experiences from here to write https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Tips-for-test-development-and-issue-investigation
Updated by okurz over 8 years ago
Make sure not to lose what we already observed with the failing test cases to support Feature 317784: Implement a check and warning to make sure we don't start a Service Pack migration with insufficient disk space
Updated by okurz over 8 years ago
Investigated https://openqa.suse.de/tests/464665/file/first_boot-journal.log with the help of rbrown. What I see is
Jul 08 06:15:01 susetest systemd[1]: Started User Manager for UID 0.
Jul 08 06:15:24 susetest dbus[1455]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
Jul 08 06:15:24 susetest systemd[1]: Starting Cleanup of Temporary Directories...
Jul 08 06:15:24 susetest dbus[1455]: [system] Successfully activated service 'org.opensuse.Snapper'
Jul 08 06:15:24 susetest systemd[1]: Started Cleanup of Temporary Directories.
Jul 08 06:16:05 susetest kernel: BTRFS info (device vda3): relocating block group 9156165632 flags 1
Jul 08 06:16:05 susetest kernel: BTRFS info (device vda3): found 1960 extents
Jul 08 06:16:06 susetest kernel: BTRFS info (device vda3): found 1958 extents
Jul 08 06:16:06 susetest kernel: BTRFS info (device vda3): relocating block group 10364125184 flags 1
…
Our idea is that some pending btrfs cleanup takes so long that systemd gives up waiting, which prevents a proper X startup. I will try to either boot into single-user mode first and see if it settles down before the first regular boot, or give the vanilla GA session more time before saving the image (a grace period for the btrfs cleanup).
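The grace period could be sketched as a polling loop, assuming the "relocating block group" messages come from a btrfs balance. The function name and the default poll count and delay are hypothetical. `btrfs balance status` exits with 0 when no balance is running and non-zero while one is in progress (or on errors, which this sketch treats the same way).

```shell
# Wait until no btrfs balance is running on the given mount point.
# Usage: wait_for_btrfs_idle [mountpoint] [max_polls] [delay_seconds]
# Returns 0 once the filesystem is idle, 1 if it never settled in time.
wait_for_btrfs_idle() {
    mountpoint=${1:-/}
    tries=${2:-60}     # hypothetical default: poll up to 60 times
    delay=${3:-10}     # hypothetical default: 10 seconds between polls
    i=0
    while [ "$i" -lt "$tries" ]; do
        # exit status 0 means "no balance in progress"
        if btrfs balance status "$mountpoint" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

Calling `wait_for_btrfs_idle / && poweroff` inside the guest before the image is saved would be one way to apply it.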
Updated by okurz over 8 years ago
- Assignee deleted (okurz)
won't be able to do this in time. If someone with easier access to power can work on this, this would be great.
Updated by okurz over 8 years ago
- Has duplicate action #12252: migration_offline_sle12_allpatterns_ppc@ppc64le fails with "out of disk space error" added
Updated by okurz over 8 years ago
- Related to deleted (action #9712: ppc64le - SDK+allpatterns needs more space)
Updated by okurz over 8 years ago
- Blocks action #9712: ppc64le - SDK+allpatterns needs more space added
Updated by mkravec over 8 years ago
I scheduled poo_9714 test in osd:
medium: sle-12-Server-DVD-ppc64le
group: Test Development: SLE 12 SP2
parameters:
DESKTOP=gnome
HDDSIZEGB=20
INSTALLONLY=1
PUBLISH_HDD_1=poo_9714.qcow2
Updated by okurz over 8 years ago
- Has duplicate action #13074: HDD image is missing snapper config - migration_offline_sle12@ppc64le test added
Updated by okurz over 8 years ago
hi mkravec, did you read the previous comments? What did you change so that your installation could yield a usable system?
Updated by mkravec over 8 years ago
I read them - nothing changed, I just have to start somewhere..
Updated by mkravec over 8 years ago
I reproduced https://openqa.suse.de/tests/510844#step/first_boot/3 on a ppc VM. It does not look like an openQA issue.
I compared working 10GB image we currently have with the one I produced, packages and autoinst.xml are the same.
Updated by mkravec over 8 years ago
SLES-12 needs to be patched before the upgrade to SP2, otherwise the issues mentioned above occur.
Updated by mkravec over 8 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
How to build images manually:
- Create VM with 20G disk
- Install SLES-12 & register & update during installation
- For 40G image (allpatterns) do not create a separate xfs /home partition
- Boot into SLES 12
- Unregister (SUSEConnect -d)
- Remove /etc/udev/rules.d/70-persistent-net.rules
- Power off
- Shrink image (qemu-img convert -O qcow2 sles12.qcow2 sles12-small.qcow2)
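The host-side parts of the steps above can be sketched as a dry-run script. The `run` wrapper and the `RUN` switch are hypothetical conveniences, and creating the disk with `qemu-img create` is an assumption about how the VM disk was set up; the file names follow the list above. The installation itself and the in-guest steps remain manual.

```shell
# Print each command by default; set RUN=1 to actually execute them.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Create the 20G disk for the VM (assumption: plain qcow2 file).
run qemu-img create -f qcow2 sles12.qcow2 20G

# Manual: install SLES 12 on that disk, register and update during
# installation, then boot into SLES 12 and run inside the guest:
#   SUSEConnect -d
#   rm /etc/udev/rules.d/70-persistent-net.rules
#   poweroff

# Shrink the image once the VM is powered off.
run qemu-img convert -O qcow2 sles12.qcow2 sles12-small.qcow2
```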
https://openqa.suse.de/tests/515973
https://openqa.suse.de/tests/517340
Old image backed up into sle-12-Server-DVD-ppc64le-gnome-10g-unused.qcow2. I will delete it next week if everything is fine.
Updated by mkravec over 8 years ago
- Status changed from Resolved to Feedback
20G images are also needed for:
- migration_offline_sle12_allpatterns@ppc64le
- migration_offline_sle12sp1_allpatterns@ppc64le
Should I wait until "Low disk space" bsc#987813 is resolved?
Updated by okurz over 8 years ago
wow, you achieved what I failed to do.
Thanks for providing instructions.
Just for the record: How did you run the virtual machine? qemu-ppc64le locally or on a ppc64le worker?
Regarding the "low disk space" warning: there is https://progress.opensuse.org/issues/11438 to implement an actual test for this. For now we only test this implicitly, because the jobs fail when they run out of disk space. What about just preserving the "too small" images for test_suites which have that in mind, e.g.
- use the new, bigger drive for test_suite migration_offline_sle12_allpatterns
- create a new test_suite migration_offline_sle12_allpatterns_too_low_space which uses the old, too small disk
Does this make sense?
Updated by RBrownSUSE over 8 years ago
makes sense to me..can we do it?
great work Martin!
Updated by mkravec over 8 years ago
- Status changed from Feedback to In Progress
- % Done changed from 100 to 70
I used VM on ppc server (borrowed from dgutu). And yes, it makes sense Oliver, I will create additional test.
Updated by mkravec over 8 years ago
- % Done changed from 70 to 90
Images for allpatterns created:
SP0: https://openqa.suse.de/tests/522167
SP1: https://openqa.suse.de/tests/522159
Original images replaced by bigger ones (comment#36), and backed up:
sle-12-Server-DVD-ppc64le-allpatterns.qcow2 -> sle-12-Server-DVD-ppc64le-allpatterns-small.qcow2 (used for low_space test as proposed by Oliver)
sle-12-SP1-Server-DVD-ppc64le-allpatterns.qcow2 -> sle-12-SP1-Server-DVD-ppc64le-allpatterns-unused.qcow2
The problem with the old allpatterns images was a separate /home partition taking 20G. I removed it, and 40G of disk space is now enough.
Updated by mkravec over 8 years ago
- Status changed from In Progress to Resolved