action #45326
closed
[sle][functional][u][s390x[kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when instantiating the virtual machine
Description
Observation
openQA test in scenario sle-15-SP1-Installer-DVD-s390x-qa_userspace_apache2_mod_perl@s390x-kvm-sle12 fails in bootloader_zkvm
Reproducible
Fails since (at least) Build 128.1 (current job)
Expected result
Last good: 126.1 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by oorlov about 6 years ago
Just for the statistics: there is more than one such failure.
Please follow the link to see all the 'bootloader_zkvm' modules that failed due to the same issue in build 128.1: https://openqa.suse.de/tests/overview?arch=&failed_modules=bootloader_zkvm&distri=sle&version=15-SP1&build=128.1&groupid=110#
Updated by okurz almost 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: create_hdd_pcm_azure
https://openqa.suse.de/tests/2347236
Updated by okurz almost 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: qa_userspace_apache2_mod_perl
https://openqa.suse.de/tests/2376767
Updated by okurz almost 6 years ago
- Subject changed from [functional][u][s390x[kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when instantiating the virtual machine to [sle][functional][u][s390x[kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when instantiating the virtual machine
- Status changed from New to Feedback
- Assignee set to mgriessmeier
- Priority changed from Normal to Low
- Target version changed from Milestone 25+ to Milestone 23
From duplicate ticket #46337 from mgriessmeier:
the actual issue is here (reading logs helps ;) ):
[2019-01-17T13:10:13.451 CET] [debug] Command executed: virsh define /var/lib/libvirt/images/openQA-SUT-3.xml
[2019-01-17T13:10:13.588 CET] [debug] Command's stderr:
error: Failed to define domain from /var/lib/libvirt/images/openQA-SUT-3.xml
error: cannot fork child process: Cannot allocate memory
Nick and I already took a look: the libvirtd process was consuming a lot of memory after running for 130 days. We should track this somehow.
For now, memory usage went back to normal after restarting libvirtd.
Keeping this on Feedback with low priority, for tracking.
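One possible way to track the libvirtd memory usage mentioned above is to read the resident set size from /proc. This is only a minimal sketch: the function names are hypothetical, the PID of libvirtd has to be supplied by the caller, and nothing here is taken from the actual monitoring setup.

```python
import re
from typing import Optional


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (reported in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current RSS of a process; returns None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as status_file:
            return parse_vm_rss_kib(status_file.read())
    except OSError:
        # Process is gone or /proc is not readable.
        return None
```

Logging the returned value periodically would show whether libvirtd grows over its uptime.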
Updated by okurz almost 6 years ago
- Has duplicate action #46337: [sle][functional][u] test fails in bootloader_zkvm - cannot allocate memory added
Updated by mgriessmeier almost 6 years ago
- Status changed from Feedback to Resolved
didn't see this anymore after the memory was increased on s390p8
Updated by mgriessmeier almost 6 years ago
- Status changed from Resolved to In Progress
- Assignee deleted (mgriessmeier)
- Priority changed from Low to Normal
happening now on our sle15 kvm machines aka s390p7 hypervisor
https://openqa.suse.de/tests/2494174/modules/bootloader_zkvm/steps/22
Updated by mgriessmeier almost 6 years ago
- Status changed from In Progress to Workable
Updated by SLindoMansilla almost 6 years ago
- Status changed from Workable to In Progress
- Assignee set to SLindoMansilla
Updated by mgriessmeier almost 6 years ago
Please keep in mind that restarting libvirtd during tests kills the running machines =)
So maybe a cronjob on the host itself?
This might also be helpful:
http://openqa-monitoring.qa.suse.de:3000/d/AFjd939ik/s390-lpar?panelId=4&fullscreen&orgId=1
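Since a restart can take running machines down with it, a cronjob on the host could first check whether any domain is running. The following is a rough sketch under that assumption; the function names are made up for illustration, and it relies on `virsh list --name --state-running` printing one domain name per line.

```python
import subprocess
from typing import List


def parse_running_domains(virsh_output: str) -> List[str]:
    """Turn the output of `virsh list --name --state-running` into a list of names."""
    return [line.strip() for line in virsh_output.splitlines() if line.strip()]


def running_domains() -> List[str]:
    """Ask virsh for running domains (needs virsh on the host)."""
    result = subprocess.run(
        ["virsh", "list", "--name", "--state-running"],
        capture_output=True, text=True, check=True,
    )
    return parse_running_domains(result.stdout)


def restart_is_safe(domains: List[str]) -> bool:
    """Assumption: a restart is only considered safe when no domain is running."""
    return not domains
```

A cronjob would call `restart_is_safe(running_domains())` and skip the restart while tests are in progress.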
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 23 to Milestone 24
moving to M24
Updated by SLindoMansilla over 5 years ago
- Target version changed from Milestone 24 to Milestone 25
Updated by SLindoMansilla over 5 years ago
- Status changed from In Progress to Feedback
Updated by SLindoMansilla over 5 years ago
- Status changed from Feedback to In Progress
Implementing the same calls for post_fail_hook
Updated by coolgw over 5 years ago
If you check the following group, you will see a lot of failures in bootloader_zkvm:
https://openqa.suse.de/tests/overview?distri=sle&version=12-SP5&build=0198&groupid=234
I picked a few of the failed cases:
https://openqa.suse.de/tests/2986646#step/bootloader_zkvm#1/14
https://openqa.suse.de/tests/2982142#step/bootloader_zkvm/28
https://openqa.suse.de/tests/2986656#step/bootloader_zkvm/1
https://openqa.suse.de/tests/2986643#step/bootloader_zkvm#1/1
https://openqa.suse.de/tests/2986644#step/bootloader_zkvm#1/14
Updated by SLindoMansilla over 5 years ago
There are different causes for all of those failures.
Some of them fail for the cause targeted in this ticket, "cannot allocate memory", which I can see happening on openqaworker5. However, I cannot find any more recent failures, so maybe someone already executed systemctl restart libvirtd?
If "cannot allocate memory" happens again, someone with access to that worker has to restart the service.
Updated by coolgw over 5 years ago
The issue was triggered again, and I found that it happened on openqaworker5 again. Could we just add a script on this machine that does the reboot operation once the memory issue happens? That way we can work around this.
openqaworker5:6
https://openqa.suse.de/tests/3018758#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018757#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018759#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018753#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018751#step/bootloader_zkvm/6
Updated by SLindoMansilla over 5 years ago
There were some complaints that this would cause running jobs to fail. But they should be restarted, right?
I will prepare a PR for that.
Updated by SLindoMansilla over 5 years ago
PR to generate a file with stderr output of virsh command: https://github.com/os-autoinst/os-autoinst/pull/1173
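Once the stderr output of the virsh command is available as text, this failure can be told apart from the other bootloader_zkvm failures by looking for the error string quoted earlier in this ticket. A minimal sketch, with a hypothetical function name and no claim about how the PR itself does it:

```python
def is_memory_allocation_failure(stderr_text: str) -> bool:
    """Return True if the text contains the 'Cannot allocate memory' error."""
    # Case-insensitive match, since the ticket shows both spellings.
    return "cannot allocate memory" in stderr_text.lower()


# Example input taken from the log excerpt quoted in this ticket.
virsh_stderr = (
    "error: Failed to define domain from /var/lib/libvirt/images/openQA-SUT-3.xml\n"
    "error: cannot fork child process: Cannot allocate memory\n"
)
print(is_memory_allocation_failure(virsh_stderr))
```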
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 25 to Milestone 26
Updated by mgriessmeier over 5 years ago
- Priority changed from Normal to High
happens more and more - we need to take care of this
Updated by coolgw over 5 years ago
Updated by SLindoMansilla over 5 years ago
Updated by szarate over 5 years ago
- Related to action #54248: [functional][u] test fails in boot_to_desktop: SUT is booting from the wrong medium added
Updated by okurz over 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: offline_sles12sp4_media_sdk_def_full
https://openqa.suse.de/tests/3167151
Updated by okurz over 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: online_sles12sp4_pscc_base_all_minimal_zypp
https://openqa.suse.de/tests/3251956
Updated by coolgw over 5 years ago
@SLindoMansilla @szarate @okurz I saw that the PR is already merged. Does the automatic restart work now, or does some part still need to be done?
Updated by SLindoMansilla over 5 years ago
- Status changed from In Progress to Workable
coolgw wrote:
@SLindoMansilla @szarate @okurz I saw that the PR is already merged. Does the automatic restart work now, or does some part still need to be done?
For the automatic restart, we need:
- An agreement to apply the same approach for the same machines
- Create salt states to automatically deploy that task (bash script, cronjob, systemd timer, etc.)
This will take a long time. foursixnine is working on something until we get to that point. He will automatically collect worker data to show statistics and/or send notifications (including for insufficient memory), so that a human can react before a QA reviewer takes a look at the job.
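The "not enough memory" notification could be driven by a simple threshold on the host's available memory. The sketch below is an illustration only: the function names and the threshold are assumptions, not what foursixnine is implementing.

```python
import re
from typing import Optional


def parse_mem_available_kib(meminfo_text: str) -> Optional[int]:
    """Extract MemAvailable (reported in kB) from /proc/meminfo text."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def should_notify(mem_available_kib: Optional[int], threshold_kib: int) -> bool:
    """Notify only when a value was read and it is below the (hypothetical) threshold."""
    return mem_available_kib is not None and mem_available_kib < threshold_kib
```

A cronjob or systemd timer could read /proc/meminfo, pass the text to `parse_mem_available_kib`, and send a notification when `should_notify` returns True.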
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 26 to Milestone 27
Updated by SLindoMansilla over 5 years ago
- Status changed from Workable to In Progress
Updated by SLindoMansilla over 5 years ago
Updated by SLindoMansilla over 5 years ago
Preparing the package: https://build.opensuse.org/package/show/devel:openSUSE:QA:QSF/auto-restart-libvirtd
Updated by SLindoMansilla over 5 years ago
Preparing the salt state: https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/157
Updated by okurz over 5 years ago
- Has duplicate action #56165: qemu-system-s390x: cannot set up guest memory 's390.ram': Cannot allocate memory added
Updated by SLindoMansilla over 5 years ago
- Priority changed from Urgent to High
What needs to be done?
- Find a general GitHub project to host the source code: DONE -> https://github.com/openSUSE/auto-restart-libvirtd
- Find a way to restart the affected failed jobs
Updated by SLindoMansilla over 5 years ago
Salt state was merged: https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/157
Updated readme was merged: https://github.com/openSUSE/auto-restart-libvirtd/pull/1/files
Updated by SLindoMansilla over 5 years ago
- Status changed from In Progress to Workable
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 27 to Milestone 28
Updated by okurz about 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: offline_sles12sp2_ltss_rmt_sdk_all_full_s390x
https://openqa.suse.de/tests/3373611
Updated by mgriessmeier almost 5 years ago
- Target version changed from Milestone 28 to Milestone 31
Updated by okurz almost 5 years ago
Please see https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/264 where I suggest removing the auto-restart-libvirtd part again, as I am convinced it is completely ineffective in its current form for multiple reasons. Unless the package is used elsewhere, I also suggest removing https://build.opensuse.org/project/show/devel:openSUSE:QA:QSF again to save resources.
Updated by SLindoMansilla almost 5 years ago
Package deleted from OBS: https://build.opensuse.org/project/show/devel:openSUSE:QA:QSF
Updated by SLindoMansilla almost 5 years ago
- Status changed from Workable to Resolved
Resolved:works-for-me
No new evidence of libvirtd with memory allocation problems.