action #167908

open

coordination #154768: [saga][epic][ux] State-of-art user experience for openQA

coordination #166556: [epic] Improved test reviewer user experience - Restart filtered jobs from /tests/overview

[tools][xen] openQA didn't show clear error messages when starting a VM failed due to "Cannot allocate memory"

Added by rfan1 2 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Regressions/Crashes
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

The Xen host unreal6 has only 16 GB of physical memory, of which dom0 occupies 3 GB. At the same time, 8 workers are assigned to this host, so sometimes a VM cannot be started because there is not enough free memory.
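
The memory budget can be checked with a rough calculation. The sketch below uses only the figures stated above (16 GB total, 3 GB for dom0, 8 workers). The guest sizes it loops over are hypothetical examples, and hypervisor overhead is ignored, so the real limits are somewhat lower.

```python
# Figures from this ticket: 16 GB physical memory, 3 GB for dom0, 8 workers.
HOST_MEMORY_GB = 16
DOM0_MEMORY_GB = 3
WORKER_COUNT = 8


def max_concurrent_guests(guest_memory_gb: float) -> int:
    """Return how many guests of the given size fit next to dom0.

    Assumption: all memory not used by dom0 is available to guests.
    """
    available_gb = HOST_MEMORY_GB - DOM0_MEMORY_GB
    return int(available_gb // guest_memory_gb)


if __name__ == "__main__":
    available = HOST_MEMORY_GB - DOM0_MEMORY_GB
    print(f"memory left for guests: {available} GB")
    print(f"average share per worker: {available / WORKER_COUNT} GB")
    # Guest sizes below are hypothetical examples, not values from the ticket.
    for size_gb in (1, 2, 4):
        fit = max_concurrent_guests(size_gb)
        print(f"{size_gb} GB guests that fit at once: {fit} of {WORKER_COUNT} workers")
```

Under these assumptions, 8 workers can only all run at the same time if each guest uses about 1.6 GB or less.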

However, the failed openQA job does not show a clear error message: https://openqa.suse.de/tests/15403428#step/bootloader_svirt/38

It only reports that virsh start failed: 1

The autoinst log contains the error messages below, but they are still not clear enough:

[2024-09-23T14:40:15.620628Z] [debug] [pid:9354] Using existing SSH connection (key:hostname=unreal6.qe.nue2.suse.org,username=root,port=22)
[2024-09-23T14:40:16.199266Z] [debug] [pid:9354] [run_ssh_cmd(virsh  start openQA-SUT-3 2> >(tee /tmp/os-autoinst-openQA-SUT-3-stderr.log >&2))] stdout:


[2024-09-23T14:40:16.199401Z] [debug] [pid:9354] [run_ssh_cmd(virsh  start openQA-SUT-3 2> >(tee /tmp/os-autoinst-openQA-SUT-3-stderr.log >&2))] stderr:
  error: Failed to start domain 'openQA-SUT-3'
  error: internal error: libxenlight failed to create new domain 'openQA-SUT-3'

The logs on the Xen host contain clearer error messages, such as:

libxl-driver.log:19269:2024-10-05 02:07:28.986+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19280:2024-10-05 02:18:19.610+0000: xc: panic: xg_dom_x86.c:1316: meminit_pv: failed to allocate 0x80000 pages: Internal error
libxl-driver.log:19282:2024-10-05 02:18:19.610+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19351:2024-10-05 06:27:36.180+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19355:2024-10-05 06:27:38.275+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19361:2024-10-05 06:27:44.722+0000: xc: panic: xg_dom_x86.c:1316: meminit_pv: failed to allocate 0x100000 pages: Internal error
libxl-driver.log:19363:2024-10-05 06:27:44.722+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19367:2024-10-05 06:27:45.212+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19370:2024-10-05 06:29:19.841+0000: xc: panic: xg_dom_x86.c:1316: meminit_pv: failed to allocate 0x100000 pages: Internal error
libxl-driver.log:19372:2024-10-05 06:29:19.841+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19376:2024-10-05 06:30:09.525+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19380:2024-10-05 06:30:09.653+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19384:2024-10-05 06:30:09.847+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19387:2024-10-05 06:30:09.854+0000: libxl: libxl_create.c:720:libxl__domain_make: domain creation fail: Cannot allocate memory
libxl-driver.log:19390:2024-10-05 06:32:16.124+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19400:2024-10-05 06:48:48.446+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19408:2024-10-05 06:51:05.700+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19446:2024-10-05 08:52:23.451+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19450:2024-10-05 08:52:47.094+0000: libxl: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init failed: Device or resource busy
libxl-driver.log:19453:2024-10-05 08:52:47.101+0000: libxl: libxl_create.c:720:libxl__domain_make: domain creation fail: Cannot allocate memory

So it would be great if openQA could report clear information when starting a VM fails.
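
As an illustration of what a clearer report could be built from, here is a minimal sketch that turns lines like the ones in the libxl-driver.log excerpt above into a readable hint. It is not how os-autoinst or openQA handle this today. The function name `explain_libxl_line` is hypothetical, the matched substrings are copied from the excerpt, and the 4 KiB page size is an assumption.

```python
import re
from typing import Optional

PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages on x86

# Substrings copied from the libxl-driver.log excerpt in this ticket.
ALLOC_RE = re.compile(r"meminit_pv: failed to allocate (0x[0-9a-fA-F]+) pages")
OOM_MARKERS = ("Cannot allocate memory", "xc_dom_boot_mem_init failed")


def explain_libxl_line(line: str) -> Optional[str]:
    """Return a readable hint for a libxl log line, or None if it is unrelated."""
    match = ALLOC_RE.search(line)
    if match:
        # The page count is logged in hexadecimal, e.g. 0x80000.
        pages = int(match.group(1), 16)
        gib = pages * PAGE_SIZE_BYTES / 2**30
        return f"Xen could not allocate {gib:g} GiB of guest memory"
    if any(marker in line for marker in OOM_MARKERS):
        return "Xen host is likely out of free memory for new domains"
    return None
```

With a 4 KiB page size, the failed allocations of 0x80000 and 0x100000 pages in the excerpt correspond to 2 GiB and 4 GiB guests.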

Observation

openQA test in scenario sle-15-SP7-Online-x86_64-create_hdd_gnome@svirt-xen-hvm fails in
bootloader_svirt

Test suite description

image creation job used as parent for other jobs testing based on existing installation. To be used as START_AFTER_TEST=create_hdd_gnome

Reproducible

Fails since (at least) Build 18.1 (current job)

Expected result

Last good: 14.1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 2 (0 open, 2 closed)

Copied from openQA Tests (public) - action #167317: [qe-core][sle15sp7]sporadic issue to start a xen vm - Resolved - rfan1 - 2024-09-25

Copied to openQA Tests (public) - action #167956: [tools][xen][unreal6] increase phy memory size for xen host unreal6 (or reduce worker instances) - Resolved - okurz

Actions #1

Updated by rfan1 2 months ago

  • Copied from action #167317: [qe-core][sle15sp7]sporadic issue to start a xen vm added
Actions #2

Updated by rfan1 2 months ago

  • Copied to action #167956: [tools][xen][unreal6] increase phy memory size for xen host unreal6 (or reduce worker instances) added
Actions #3

Updated by okurz 2 months ago

  • Project changed from openQA Tests (public) to openQA Project (public)
  • Category changed from Bugs in existing tests to Regressions/Crashes
  • Target version set to Ready
  • Parent task set to #166556
Actions #4

Updated by okurz 2 months ago

  • Target version changed from Ready to future