action #45326

[sle][functional][u][s390x][kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when instantiating the virtual machine

Added by okurz almost 3 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 31
Start date: 2018-12-19
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP1-Installer-DVD-s390x-qa_userspace_apache2_mod_perl@s390x-kvm-sle12 fails in bootloader_zkvm

Reproducible

Fails since (at least) Build 128.1 (current job)

Expected result

Last good: 126.1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Tests - action #54248: [functional][u] test fails in boot_to_desktop: SUT is booting from the wrong medium (Resolved, 2019-07-15)

Has duplicate openQA Tests - action #46337: [sle][functional][u] test fails in bootloader_zkvm - cannot allocate memory (Rejected, 2019-01-17)

Has duplicate openQA Infrastructure - action #56165: qemu-system-s390x: cannot set up guest memory 's390.ram': Cannot allocate memory (Rejected, 2019-08-30)

History

#1 Updated by oorlov almost 3 years ago

Just for the statistics: there is more than one such failure.

Please follow the link to see all 'bootloader_zkvm' modules that failed due to the same issue in build 128.1: https://openqa.suse.de/tests/overview?arch=&failed_modules=bootloader_zkvm&distri=sle&version=15-SP1&build=128.1&groupid=110#

#2 Updated by okurz almost 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: create_hdd_pcm_azure
https://openqa.suse.de/tests/2347236

#3 Updated by okurz almost 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: qa_userspace_apache2_mod_perl
https://openqa.suse.de/tests/2376767

#4 Updated by okurz over 2 years ago

  • Subject changed from [functional][u][s390x][kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when instantiating the virtual machine to [sle][functional][u][s390x][kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when instantiating the virtual machine
  • Status changed from New to Feedback
  • Assignee set to mgriessmeier
  • Priority changed from Normal to Low
  • Target version changed from Milestone 25+ to Milestone 23

From duplicate ticket #46337 from mgriessmeier:

the actual issue is here (reading logs helps ;) ):

[2019-01-17T13:10:13.451 CET] [debug] Command executed: virsh  define /var/lib/libvirt/images/openQA-SUT-3.xml
[2019-01-17T13:10:13.588 CET] [debug] Command's stderr:
error: Failed to define domain from /var/lib/libvirt/images/openQA-SUT-3.xml
error: cannot fork child process: Cannot allocate memory

Nick and I already took a look: the libvirtd process was consuming a lot of memory after running for 130 days. We should track this somehow.
For now, memory usage normalized after restarting libvirtd.

keeping on feedback - low, for tracking
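
One way to track the memory usage mentioned above is to read the resident set size of the libvirtd process from /proc. The following is only a minimal sketch, not something taken from this ticket: the function names are hypothetical, the sample text is made up for illustration, and how the PID is obtained is left open.

```python
import re


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status content, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vmrss_kb(fh.read())


# Illustrative sample, not real data from the affected hypervisor.
sample = "Name:\tlibvirtd\nVmPeak:\t 2100000 kB\nVmRSS:\t 1843200 kB\nThreads:\t17\n"
print(parse_vmrss_kb(sample))
```

Logging this value periodically would show whether the process grows steadily over weeks of uptime.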

#5 Updated by okurz over 2 years ago

  • Has duplicate action #46337: [sle][functional][u] test fails in bootloader_zkvm - cannot allocate memory added

#6 Updated by mgriessmeier over 2 years ago

  • Status changed from Feedback to Resolved

didn't see this anymore after the memory was increased on s390p8

#7 Updated by mgriessmeier over 2 years ago

  • Status changed from Resolved to In Progress
  • Assignee deleted (mgriessmeier)
  • Priority changed from Low to Normal

happening now on our sle15 kvm machines aka s390p7 hypervisor
https://openqa.suse.de/tests/2494174/modules/bootloader_zkvm/steps/22

#8 Updated by mgriessmeier over 2 years ago

  • Status changed from In Progress to Workable

#9 Updated by SLindoMansilla over 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to SLindoMansilla

#10 Updated by mgriessmeier over 2 years ago

Please keep in mind that restarting libvirtd during tests kills running machines =)
So maybe a cronjob on the host itself?

This might also be helpful:
http://openqa-monitoring.qa.suse.de:3000/d/AFjd939ik/s390-lpar?panelId=4&fullscreen&orgId=1
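
A cronjob like the one suggested above could first check that no machines are running before it restarts the service. This is a hedged sketch under the assumption that `virsh list --state-running --name` prints one running domain name per line; the function names are hypothetical and this is not the script that was eventually deployed.

```python
import subprocess


def parse_running_domains(virsh_output):
    """Return the non-empty lines of `virsh list --name` output as domain names."""
    return [line.strip() for line in virsh_output.splitlines() if line.strip()]


def restart_is_safe(virsh_output):
    """A restart is considered safe only when no domain is running."""
    return not parse_running_domains(virsh_output)


def restart_libvirtd_if_idle():
    """Restart libvirtd only if virsh reports no running domains."""
    result = subprocess.run(
        ["virsh", "list", "--state-running", "--name"],
        capture_output=True, text=True, check=True,
    )
    if restart_is_safe(result.stdout):
        subprocess.run(["systemctl", "restart", "libvirtd"], check=True)
        return True
    return False
```

Note that a job could still start between the check and the restart, so this narrows the risk of killing a running machine but does not remove it.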

#11 Updated by mgriessmeier over 2 years ago

  • Target version changed from Milestone 23 to Milestone 24

moving to M24

#12 Updated by SLindoMansilla over 2 years ago

  • Target version changed from Milestone 24 to Milestone 25

#13 Updated by SLindoMansilla over 2 years ago

  • Status changed from In Progress to Feedback

#14 Updated by SLindoMansilla over 2 years ago

  • Status changed from Feedback to In Progress

Implementing the same calls for post_fail_hook

#16 Updated by SLindoMansilla over 2 years ago

There are different causes for all of those failures.

Some of them fail for the cause targeted in this ticket, "cannot allocate memory", which I can see happening on openqaworker5. However, I cannot find any more recent failures, so maybe someone has already executed systemctl restart libvirtd?

If "cannot allocate memory" happens again, someone with access to that worker has to restart the service.

#17 Updated by coolgw over 2 years ago

The issue was triggered again, and I found it happening on openqaworker5 again. Could we add a script on this machine that performs a reboot once the memory issue happens, so that we can work around this?
openqaworker5:6

https://openqa.suse.de/tests/3018758#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018757#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018759#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018753#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018751#step/bootloader_zkvm/6

#18 Updated by SLindoMansilla over 2 years ago

There were some complaints that this would cause running jobs to fail. But they should be restarted, right?
I will prepare a PR for that.

#19 Updated by SLindoMansilla over 2 years ago

PR to generate a file with stderr output of virsh command: https://github.com/os-autoinst/os-autoinst/pull/1173
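
Once the stderr output of the virsh command is available as text, the failure targeted in this ticket can be told apart from other causes with a simple check. This is an illustrative sketch only: the function name is hypothetical, and the file name and format produced by the PR are not assumed here, so the function takes the text directly. The sample is the stderr quoted earlier in this ticket.

```python
MEMORY_ERROR = "Cannot allocate memory"


def is_memory_allocation_failure(stderr_text):
    """Return True if the captured stderr contains the memory allocation error."""
    return MEMORY_ERROR.lower() in stderr_text.lower()


# stderr as quoted in comment #4 of this ticket.
stderr_sample = (
    "error: Failed to define domain from /var/lib/libvirt/images/openQA-SUT-3.xml\n"
    "error: cannot fork child process: Cannot allocate memory\n"
)
print(is_memory_allocation_failure(stderr_sample))
```

The comparison is case-insensitive because the ticket subjects spell the message both as "Cannot allocate memory" and "cannot allocate memory".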

#20 Updated by mgriessmeier over 2 years ago

  • Target version changed from Milestone 25 to Milestone 26

#21 Updated by mgriessmeier over 2 years ago

  • Priority changed from Normal to High

happens more and more - we need to take care of this

#24 Updated by mgriessmeier over 2 years ago

  • Priority changed from High to Urgent

#26 Updated by szarate about 2 years ago

  • Related to action #54248: [functional][u] test fails in boot_to_desktop: SUT is booting from the wrong medium added

#27 Updated by okurz about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp4_media_sdk_def_full
https://openqa.suse.de/tests/3167151

#28 Updated by okurz about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: online_sles12sp4_pscc_base_all_minimal_zypp
https://openqa.suse.de/tests/3251956

#29 Updated by coolgw about 2 years ago

@SLindoMansilla @szarate @okurz I saw that the PR is already merged. Can the automatic restart work now, or does some part still need to be done?

#30 Updated by SLindoMansilla about 2 years ago

  • Status changed from In Progress to Workable

coolgw wrote:

@SLindoMansilla @szarate @okurz I saw that the PR is already merged. Can the automatic restart work now, or does some part still need to be done?

For the automatic restart, we need:

  1. An agreement to apply the same approach to the same machines
  2. Salt states to automatically deploy that task (bash script, cronjob, systemd timer, etc.)

This will take a long time. foursixnine is working on something until we get to that point. He will automatically collect worker data to show statistics and/or send notifications (including for insufficient memory), so that a human can react before a QA reviewer takes a look at the job.
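
A notification for insufficient memory could be derived from /proc/meminfo on the worker. The sketch below is an assumption about how such a check might look, not a description of what foursixnine implemented: the function names, the threshold, and the sample values are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name to integer value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def low_memory_warning(meminfo_text, min_available_kb):
    """Return a warning message if MemAvailable is below the threshold, else None."""
    available = parse_meminfo(meminfo_text).get("MemAvailable")
    if available is None or available >= min_available_kb:
        return None
    return f"low memory: {available} kB available, threshold {min_available_kb} kB"


# Illustrative sample, not real data from a worker.
meminfo_sample = (
    "MemTotal:       16384000 kB\n"
    "MemFree:          204800 kB\n"
    "MemAvailable:     512000 kB\n"
)
print(low_memory_warning(meminfo_sample, 1048576))
```

Returning a message instead of sending it keeps the check independent of the notification channel, which this ticket does not specify.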

#31 Updated by mgriessmeier about 2 years ago

  • Target version changed from Milestone 26 to Milestone 27

#32 Updated by SLindoMansilla about 2 years ago

  • Status changed from Workable to In Progress

#36 Updated by okurz about 2 years ago

  • Has duplicate action #56165: qemu-system-s390x: cannot set up guest memory 's390.ram': Cannot allocate memory added

#37 Updated by SLindoMansilla about 2 years ago

  • Priority changed from Urgent to High

What needs to be done?

#39 Updated by SLindoMansilla about 2 years ago

  • Status changed from In Progress to Workable

#40 Updated by mgriessmeier about 2 years ago

  • Target version changed from Milestone 27 to Milestone 28

#41 Updated by okurz about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp2_ltss_rmt_sdk_all_full_s390x
https://openqa.suse.de/tests/3373611

#42 Updated by mgriessmeier almost 2 years ago

  • Target version changed from Milestone 28 to Milestone 31

#43 Updated by okurz over 1 year ago

Please see https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/264 where I suggest removing the auto-restart-libvirtd part again, as I am convinced it is completely ineffective in its current form for multiple reasons. Unless the package is used elsewhere, I also suggest removing https://build.opensuse.org/project/show/devel:openSUSE:QA:QSF again to save resources.

#45 Updated by SLindoMansilla over 1 year ago

  • Status changed from Workable to Resolved

Resolved:works-for-me
No new evidence of libvirtd with memory allocation problems.
