action #45326

closed

[sle][functional][u][s390x][kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when instantiating the virtual machine

Added by okurz about 6 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: High
Category: Bugs in existing tests
Target version: SUSE QA (private) - Milestone 31
Start date: 2018-12-19
% Done: 0%

Description

Observation

openQA test in scenario sle-15-SP1-Installer-DVD-s390x-qa_userspace_apache2_mod_perl@s390x-kvm-sle12 fails in bootloader_zkvm.

Reproducible

Fails since (at least) Build 128.1 (current job)

Expected result

Last good: 126.1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues: 3 (0 open, 3 closed)

Related to openQA Tests (public) - action #54248: [functional][u] test fails in boot_to_desktop: SUT is booting from the wrong medium (Resolved, szarate, 2019-07-15)

Has duplicate openQA Tests (public) - action #46337: [sle][functional][u] test fails in bootloader_zkvm - cannot allocate memory (Rejected, mgriessmeier, 2019-01-17)

Has duplicate openQA Infrastructure (public) - action #56165: qemu-system-s390x: cannot set up guest memory 's390.ram': Cannot allocate memory (Rejected, okurz, 2019-08-30)

#1

Updated by oorlov about 6 years ago

Just for the statistics: there is more than one such failure.

Please follow the link to see all 'bootloader_zkvm' modules that failed due to the same issue in build 128.1: https://openqa.suse.de/tests/overview?arch=&failed_modules=bootloader_zkvm&distri=sle&version=15-SP1&build=128.1&groupid=110#

#2

Updated by okurz almost 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: create_hdd_pcm_azure
https://openqa.suse.de/tests/2347236

#3

Updated by okurz almost 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: qa_userspace_apache2_mod_perl
https://openqa.suse.de/tests/2376767

#4

Updated by okurz almost 6 years ago

  • Subject changed from [functional][u][s390x][kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when instantiating the virtual machine to [sle][functional][u][s390x][kvm] test fails in bootloader_zkvm - "Cannot allocate memory" when instantiating the virtual machine
  • Status changed from New to Feedback
  • Assignee set to mgriessmeier
  • Priority changed from Normal to Low
  • Target version changed from Milestone 25+ to Milestone 23

From duplicate ticket #46337, by mgriessmeier:

The actual issue is here (reading logs helps ;) ):

[2019-01-17T13:10:13.451 CET] [debug] Command executed: virsh  define /var/lib/libvirt/images/openQA-SUT-3.xml
[2019-01-17T13:10:13.588 CET] [debug] Command's stderr:
error: Failed to define domain from /var/lib/libvirt/images/openQA-SUT-3.xml
error: cannot fork child process: Cannot allocate memory

Nick and I already took a look: the libvirtd process is consuming a lot of memory, it had been running for 130 days, and we should track this somehow.
For now, though, the memory usage normalized after restarting libvirtd.

Keeping this in Feedback with Low priority, for tracking.
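To "track this somehow", one option is to sample libvirtd's resident memory periodically. The following is a minimal sketch, assuming a Linux host with /proc: it finds libvirtd by its process name in /proc/<pid>/comm and reads VmRSS from /proc/<pid>/status. The 2 GiB threshold and all function names are hypothetical, not taken from this ticket.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical alert threshold; not a value taken from this ticket.
RSS_LIMIT_KB = 2 * 1024 * 1024  # 2 GiB, expressed in kB


def parse_vmrss_kb(status_text: str) -> int | None:
    """Return VmRSS (resident memory, in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  123456 kB".
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def find_pids(name: str) -> list[int]:
    """Return the PIDs whose /proc/<pid>/comm equals `name`."""
    proc = Path("/proc")
    if not proc.is_dir():  # not a Linux host
        return []
    pids = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            if (entry / "comm").read_text().strip() == name:
                pids.append(int(entry.name))
        except OSError:
            continue  # the process exited while we were scanning
    return pids


def libvirtd_rss_report(limit_kb: int = RSS_LIMIT_KB) -> list[tuple[int, int, bool]]:
    """Return (pid, rss_kb, over_limit) for every running libvirtd process."""
    report = []
    for pid in find_pids("libvirtd"):
        try:
            rss_kb = parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())
        except OSError:
            continue
        if rss_kb is not None:
            report.append((pid, rss_kb, rss_kb > limit_kb))
    return report


if __name__ == "__main__":
    for pid, rss_kb, over in libvirtd_rss_report():
        state = "OVER LIMIT" if over else "ok"
        print(f"libvirtd pid={pid} rss={rss_kb} kB ({state})")
```

Run from cron or a monitoring agent, the printed values can be graphed over time, which would show slow growth like the one observed after 130 days of uptime long before fork() starts failing.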

#5

Updated by okurz almost 6 years ago

  • Has duplicate action #46337: [sle][functional][u] test fails in bootloader_zkvm - cannot allocate memory added
#6

Updated by mgriessmeier almost 6 years ago

  • Status changed from Feedback to Resolved

Haven't seen this anymore since the memory was increased on s390p8.

#7

Updated by mgriessmeier almost 6 years ago

  • Status changed from Resolved to In Progress
  • Assignee deleted (mgriessmeier)
  • Priority changed from Low to Normal

Happening now on our SLE 15 KVM machines, aka the s390p7 hypervisor:
https://openqa.suse.de/tests/2494174/modules/bootloader_zkvm/steps/22

#8

Updated by mgriessmeier almost 6 years ago

  • Status changed from In Progress to Workable
#9

Updated by SLindoMansilla almost 6 years ago

  • Status changed from Workable to In Progress
  • Assignee set to SLindoMansilla
#10

Updated by mgriessmeier almost 6 years ago

Please keep in mind that restarting libvirtd during tests kills running machines =)
So maybe a cronjob on the host itself?

What might also be helpful is:
http://openqa-monitoring.qa.suse.de:3000/d/AFjd939ik/s390-lpar?panelId=4&fullscreen&orgId=1
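Because a restart kills running machines, a host-side cronjob would have to restart libvirtd only while no domain is running. A minimal sketch of that guard, assuming the script runs as root on the hypervisor with virsh and systemctl available (so virsh talks to the default system connection); `virsh list --name` prints the names of running domains, one per line. The memory value is expected to come from a separate measurement, and the function names and threshold parameter are hypothetical.

```python
from __future__ import annotations

import subprocess


def running_domains(virsh_output: str) -> list[str]:
    """Turn `virsh list --name` output into a list of running domain names."""
    # virsh prints one name per line and ends with an empty line.
    return [line.strip() for line in virsh_output.splitlines() if line.strip()]


def should_restart(rss_kb: int, limit_kb: int, domains: list[str]) -> bool:
    """Restart only if libvirtd is over its limit AND no running VM would be killed."""
    return rss_kb > limit_kb and not domains


def restart_libvirtd_if_idle(rss_kb: int, limit_kb: int) -> bool:
    """Restart libvirtd when it is safe to do so; return True if a restart happened."""
    listing = subprocess.run(
        ["virsh", "list", "--name"],
        capture_output=True, text=True, check=True,
    ).stdout
    if should_restart(rss_kb, limit_kb, running_domains(listing)):
        subprocess.run(["systemctl", "restart", "libvirtd"], check=True)
        return True
    return False  # not over the limit, or VMs are running: retry on the next run
```

The trade-off of this design is that a worker that is permanently busy never gets its restart; it only avoids killing jobs, so it would still need the monitoring mentioned above as a fallback.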

#11

Updated by mgriessmeier almost 6 years ago

  • Target version changed from Milestone 23 to Milestone 24

moving to M24

#12

Updated by SLindoMansilla over 5 years ago

  • Target version changed from Milestone 24 to Milestone 25
#13

Updated by SLindoMansilla over 5 years ago

  • Status changed from In Progress to Feedback
#14

Updated by SLindoMansilla over 5 years ago

  • Status changed from Feedback to In Progress

Implementing the same calls for post_fail_hook

#16

Updated by SLindoMansilla over 5 years ago

There are different causes for all of those failures.

Some of them fail for the cause targeted in this ticket, "cannot allocate memory", which I can see happening on openqaworker5. However, I cannot find any more such failures, so maybe someone has already run "systemctl restart libvirtd"?

If "cannot allocate memory" happens again, someone with access to that worker has to restart the service.

#17

Updated by coolgw over 5 years ago

The issue was triggered again, and I found it happening on openqaworker5 again (openqaworker5:6). Could we just add a script on this machine that does a reboot once the memory issue happens? That way we can work around this.

https://openqa.suse.de/tests/3018758#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018757#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018759#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018753#step/bootloader_zkvm/6
https://openqa.suse.de/tests/3018751#step/bootloader_zkvm/6

#18

Updated by SLindoMansilla over 5 years ago

There were some complaints that this would cause running jobs to fail. But they should be restarted, right?
I will prepare a PR for that.

#19

Updated by SLindoMansilla over 5 years ago

PR to write the stderr output of the virsh command to a file: https://github.com/os-autoinst/os-autoinst/pull/1173

#20

Updated by mgriessmeier over 5 years ago

  • Target version changed from Milestone 25 to Milestone 26
#21

Updated by mgriessmeier over 5 years ago

  • Priority changed from Normal to High

This happens more and more; we need to take care of this.

#24

Updated by mgriessmeier over 5 years ago

  • Priority changed from High to Urgent
#26

Updated by szarate over 5 years ago

  • Related to action #54248: [functional][u] test fails in boot_to_desktop: SUT is booting from the wrong medium added
#27

Updated by okurz over 5 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp4_media_sdk_def_full
https://openqa.suse.de/tests/3167151

#28

Updated by okurz over 5 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: online_sles12sp4_pscc_base_all_minimal_zypp
https://openqa.suse.de/tests/3251956

#29

Updated by coolgw over 5 years ago

@SLindoMansilla @szarate @okurz I saw the PR has already been merged. Does the automatic restart work now, or does some part still need to be done?

#30

Updated by SLindoMansilla over 5 years ago

  • Status changed from In Progress to Workable

coolgw wrote:

@SLindoMansilla @szarate @okurz I saw the PR has already been merged. Does the automatic restart work now, or does some part still need to be done?

For the automatic restart, we need:

  1. An agreement to apply the same approach for the same machines
  2. Create salt states to automatically deploy that task (bash script, cronjob, systemd timer, etc.)

This will take a long time. Meanwhile, foursixnine is working on something: he will automatically collect worker data to show statistics and/or send notifications (including when there is not enough memory), so that a human can react before a QA reviewer takes a look at the job.
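A minimal sketch of the kind of "not enough memory" check such data collection could include, assuming a Linux worker: it parses MemAvailable from /proc/meminfo and flags the worker when it drops below a threshold. The 4 GiB threshold and the function names are hypothetical, and how the notification is actually delivered is left out.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical threshold below which a human should be notified.
MIN_AVAILABLE_KB = 4 * 1024 * 1024  # 4 GiB, expressed in kB


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value}.

    Most values are in kB; a few counters such as HugePages_* are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def low_memory(meminfo: dict[str, int], min_available_kb: int = MIN_AVAILABLE_KB) -> bool:
    """Return True if MemAvailable is below the threshold."""
    return meminfo["MemAvailable"] < min_available_kb


if __name__ == "__main__":
    meminfo_path = Path("/proc/meminfo")
    if meminfo_path.exists():
        info = parse_meminfo(meminfo_path.read_text())
        if low_memory(info):
            print(f"WARNING: only {info['MemAvailable']} kB available")
        else:
            print(f"ok: {info['MemAvailable']} kB available")
```

MemAvailable is used rather than MemFree because it includes reclaimable page cache, so it is a closer estimate of how much memory a new qemu process could still get.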

Actions #31

Updated by mgriessmeier over 5 years ago

  • Target version changed from Milestone 26 to Milestone 27
Actions #32

Updated by SLindoMansilla over 5 years ago

  • Status changed from Workable to In Progress
#36

Updated by okurz over 5 years ago

  • Has duplicate action #56165: qemu-system-s390x: cannot set up guest memory 's390.ram': Cannot allocate memory added
#37

Updated by SLindoMansilla over 5 years ago

  • Priority changed from Urgent to High

What needs to be done?

#39

Updated by SLindoMansilla over 5 years ago

  • Status changed from In Progress to Workable
#40

Updated by mgriessmeier over 5 years ago

  • Target version changed from Milestone 27 to Milestone 28
#41

Updated by okurz over 5 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp2_ltss_rmt_sdk_all_full_s390x
https://openqa.suse.de/tests/3373611

#42

Updated by mgriessmeier almost 5 years ago

  • Target version changed from Milestone 28 to Milestone 31
#43

Updated by okurz almost 5 years ago

Please see https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/264 where I suggest removing the auto-restart-libvirtd part again, as I am convinced it is completely ineffective in its current form for multiple reasons. Unless the package is used elsewhere, I also suggest removing https://build.opensuse.org/project/show/devel:openSUSE:QA:QSF again to save resources.

#45

Updated by SLindoMansilla almost 5 years ago

  • Status changed from Workable to Resolved

Resolved:works-for-me
No new evidence of libvirtd with memory allocation problems.
