action #61967

closed

jobs incomplete with auto_review:"qemu-system-ppc64: Failed to allocate KVM HPT of order.*: Cannot allocate memory"

Added by mkittler about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version: -
Start date: 2020-01-09
Due date:
% Done: 0%
Estimated time:

Description

autoinst-log.txt:

[2020-01-09T12:14:30.361 UTC] [debug] running /usr/bin/qemu-img create -f qcow2 /var/lib/openqa/pool/1/raid/hd0 30G
[2020-01-09T12:14:30.411 UTC] [debug] Formatting '/var/lib/openqa/pool/1/raid/hd0', fmt=qcow2 size=32212254720 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2020-01-09T12:14:30.412 UTC] [debug] running /usr/bin/qemu-img create -f qcow2 -b /var/lib/openqa/pool/1/openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20200108-Media.iso /var/lib/openqa/pool/1/raid/cd0-overlay0 4008833024
[2020-01-09T12:14:30.447 UTC] [debug] Formatting '/var/lib/openqa/pool/1/raid/cd0-overlay0', fmt=qcow2 size=4008833024 backing_file=/var/lib/openqa/pool/1/openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20200108-Media.iso cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2020-01-09T12:14:30.447 UTC] [debug] starting: /usr/bin/qemu-system-ppc64 -g 1024x768 -vga std -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -soundhw hda -global isa-fdc.driveA= -m 4096 -machine usb=off -cpu host -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -boot once=d -device nec-usb-xhci -device usb-tablet -device usb-kbd -smp 1,threads=1 -enable-kvm -no-shutdown -vnc :91,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-file,filename=/var/lib/openqa/pool/1/raid/hd0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0,file=hd0-file,cache.no-flush=on -device virtio-blk,id=hd0-device,drive=hd0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/1/raid/cd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0

...

[2020-01-09T12:12:03.836 UTC] [debug] Backend process died, backend errors are reported below in the following lines:
can't open qmp at /usr/lib/os-autoinst/OpenQA/Qemu/Proc.pm line 399.

[2020-01-09T12:12:03.837 UTC] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-01-09T12:12:03.837 UTC] [debug] flushing frames
[2020-01-09T12:12:03.839 UTC] [debug] QEMU: QEMU emulator version 3.1.1.1 (openSUSE Leap 15.1)
[2020-01-09T12:12:03.839 UTC] [debug] QEMU: Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers
[2020-01-09T12:12:03.839 UTC] [debug] QEMU: Unknown host!
[2020-01-09T12:12:03.839 UTC] [debug] QEMU: Unknown host!
[2020-01-09T12:12:03.839 UTC] [debug] QEMU: Unknown host!
[2020-01-09T12:12:03.839 UTC] [debug] QEMU: qemu-system-ppc64: Failed to allocate KVM HPT of order 25 (try smaller maxmem?): Cannot allocate memory
[2020-01-09T12:12:03.840 UTC] [debug] sending magic and exit

e.g. https://openqa.opensuse.org/tests/1138082


Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #63718: incomplete reason with just "quit"/"died" could provide more information (Resolved, mkittler, 2020-02-21)

Actions #1

Updated by okurz about 4 years ago

The VMs are configured with 4 GB of RAM, which is reasonable. power8 has been up for 33 days and currently has 7 of 128 GB RAM in use, so there is plenty of free memory. Checking journalctl --since=2020-01-08 for anything obvious, I see an error from salt-minion, which tries to connect to the master on o3, which isn't active. Disabled salt-minion on power8 for now: systemctl disable --now salt-minion.

I see multiple blocks like these:

Jan 09 12:14:27 power8 worker[100859]: [info] 148977: WORKING 1138082
Jan 09 12:14:28 power8 worker[100863]: [info] Test schedule has changed, reloading test_order.json
Jan 09 12:14:30 power8 kernel: alloc_contig_range: 25082 callbacks suppressed
Jan 09 12:14:30 power8 kernel: alloc_contig_range: [1e5800, 1e5a00) PFNs busy
Jan 09 12:14:30 power8 kernel: alloc_contig_range: [1e5c00, 1e5e00) PFNs busy
Jan 09 12:14:30 power8 kernel: alloc_contig_range: [1e6200, 1e6400) PFNs busy
Jan 09 12:14:30 power8 kernel: alloc_contig_range: [1e6204, 1e6404) PFNs busy
Jan 09 12:14:30 power8 kernel: alloc_contig_range: [1e6208, 1e6408) PFNs busy
Jan 09 12:14:30 power8 kernel: alloc_contig_range: [1e620c, 1e640c) PFNs busy
Jan 09 12:14:30 power8 kernel: alloc_contig_range: [1e6210, 1e6410) PFNs busy
Jan 09 12:14:30 power8 kernel: alloc_contig_range: [1e6214, 1e6414) PFNs busy
Jan 09 12:14:30 power8 kernel: alloc_contig_range: [1e6218, 1e6418) PFNs busy
Jan 09 12:14:30 power8 kernel: alloc_contig_range: [1e621c, 1e641c) PFNs busy
Jan 09 12:14:30 power8 kernel: cma: cma_alloc: alloc failed, req-size: 512 pages, ret: -16
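
For reference, the numbers in the QEMU error and in the kernel log above appear to describe the same allocation. The following is only a sketch of that arithmetic: it assumes the HPT "order" is the base-2 logarithm of the table size in bytes and that the host kernel uses 64 KiB pages (neither is stated in this ticket). The variable names are made up for illustration.

```python
import errno

# Values taken from the logs in this ticket.
GUEST_RAM_MIB = 4096      # "-m 4096" on the QEMU command line
HPT_ORDER = 25            # "Failed to allocate KVM HPT of order 25"
CMA_REQ_PAGES = 512       # "cma_alloc: alloc failed, req-size: 512 pages"
CMA_RET = -16             # "ret: -16"

# Assumption: 64 KiB host page size (not confirmed in the ticket).
PAGE_SIZE = 64 * 1024

# Assumption: order is log2 of the HPT size in bytes.
hpt_bytes = 1 << HPT_ORDER
hpt_mib = hpt_bytes // (1024 * 1024)
hpt_pages = hpt_bytes // PAGE_SIZE

# Ratio between guest RAM and the requested HPT size.
ram_to_hpt_ratio = (GUEST_RAM_MIB * 1024 * 1024) // hpt_bytes

# Symbolic name of the error code returned by cma_alloc.
cma_error = errno.errorcode[-CMA_RET]

print(f"HPT size: {hpt_mib} MiB = {hpt_pages} pages of {PAGE_SIZE // 1024} KiB")
print(f"guest RAM / HPT size = {ram_to_hpt_ratio}")
print(f"cma_alloc error: {cma_error}")
```

Under these assumptions an order-25 HPT is 32 MiB, which is exactly the 512 contiguous pages the CMA allocator failed to provide. Return code -16 corresponds to EBUSY, which fits the "PFNs busy" messages: the failure looks like a lack of contiguous memory in the CMA region rather than a lack of free RAM overall.
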
Actions #2

Updated by okurz about 4 years ago

  • Category set to Regressions/Crashes
  • Status changed from New to Rejected
  • Assignee set to okurz

The problem was not observed or reported again. https://openqa.opensuse.org/tests/11524270 and the previous jobs in the same scenario have not shown any problem since that one specific day. No actions planned.

Actions #3

Updated by okurz about 4 years ago

  • Status changed from Rejected to New
  • Assignee deleted (okurz)
Actions #4

Updated by okurz about 4 years ago

  • Subject changed from jobs incomplete with auto_review:"qemu-system-ppc64: Failed to allocate KVM HPT of order 25 .*: Cannot allocate memory" to jobs incomplete with auto_review:"qemu-system-ppc64: Failed to allocate KVM HPT of order.*: Cannot allocate memory"
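
The effect of broadening the pattern can be checked quickly. This is a minimal sketch that assumes the auto_review expression is applied as a regular-expression search against log lines; Python's `re.search` is used here as a stand-in for whatever tooling actually does the matching. The "order 26" line is hypothetical and was not seen in this ticket.

```python
import re

# The two auto_review expressions from the subject change above.
OLD_PATTERN = r"qemu-system-ppc64: Failed to allocate KVM HPT of order 25 .*: Cannot allocate memory"
NEW_PATTERN = r"qemu-system-ppc64: Failed to allocate KVM HPT of order.*: Cannot allocate memory"

# Log line as seen in autoinst-log.txt (timestamp prefix omitted).
seen_line = (
    "QEMU: qemu-system-ppc64: Failed to allocate KVM HPT of order 25 "
    "(try smaller maxmem?): Cannot allocate memory"
)

# Hypothetical line with a different order, e.g. for a VM with more RAM.
hypothetical_line = seen_line.replace("order 25", "order 26")


def matches(pattern: str, line: str) -> bool:
    """Return True if the pattern is found anywhere in the line."""
    return re.search(pattern, line) is not None


for name, pattern in (("old", OLD_PATTERN), ("new", NEW_PATTERN)):
    print(name, matches(pattern, seen_line), matches(pattern, hypothetical_line))
```

Both expressions match the line from this ticket, but only the new one would also catch the same failure for a different HPT order.
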
Actions #6

Updated by okurz about 4 years ago

  • Related to action #63718: incomplete reason with just "quit"/"died" could provide more information added
Actions #7

Updated by okurz about 4 years ago

  • Status changed from New to Resolved
  • Assignee set to okurz

The problem does not happen reproducibly, and the machine seems to be fine in general. In principle, this is most likely an error caused by test maintainers exceeding the available resources on the openQA worker machines. However, the user feedback will be improved in #63718.
