action #19012

closed

[tools] test gic-version=3 and its=off on the caivum thunderx

Added by okurz almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Feature requests
Target version:
Start date: 2017-05-08
Due date:
% Done: 0%
Estimated time:
Description

Observation

The openQA test in scenario sle-12-SP3-Server-DVD-aarch64-cryptlvm+activate_existing+import_users@aarch64 ends as incomplete: after
boot_encrypt it fails while trying to save a memory dump. Log file content:

21:45:35.5310 8746 >>> testapi::wait_screen_change: timed out
21:45:35.5313 8746 Save memory dump to debug bootup problems, e.g. for bsc#1005313
21:45:35.5315 Debug: /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/first_boot.pm:72 called testapi::save_memory_dump
21:45:35.5319 8746 <<< testapi::save_memory_dump(filename='first_boot')
21:45:35.5320 8746 Trying to save machine state
21:45:35.5335 8748 Migrating the machine.
21:45:35.5342 8748 EVENT {"event":"NIC_RX_FILTER_CHANGED","data":{"path":"/machine/peripheral-anon/device[1]/virtio-backend"},"timestamp":{"microseconds":321504,"seconds":1494019343}}
21:45:35.5346 8748 EVENT {"data":{"server":{"host":"0.0.0.0","service":"5991","websocket":false,"auth":"none","family":"ipv4"},"client":{"host":"127.0.0.1","service":"37174","websocket":false,"family":"ipv4"}},"timestamp":{"microseconds":644277,"seconds":1494019422},"event":"VNC_DISCONNECTED"}
21:45:35.5348 8748 EVENT {"timestamp":{"microseconds":645124,"seconds":1494019422},"data":{"server":{"host":"0.0.0.0","service":"5991","websocket":false,"family":"ipv4","auth":"none"},"client":{"family":"ipv4","websocket":false,"service":"37968","host":"127.0.0.1"}},"event":"VNC_CONNECTED"}
21:45:35.5350 8748 EVENT {"timestamp":{"seconds":1494019422,"microseconds":656703},"data":{"server":{"family":"ipv4","auth":"none","websocket":false,"service":"5991","host":"0.0.0.0"},"client":{"service":"37968","host":"127.0.0.1","family":"ipv4","websocket":false}},"event":"VNC_INITIALIZED"}
21:45:35.5353 8748 EVENT {"timestamp":{"microseconds":958315,"seconds":1494020298},"data":{"server":{"websocket":false,"family":"ipv4","auth":"none","host":"0.0.0.0","service":"5991"},"client":{"family":"ipv4","websocket":false,"service":"37968","host":"127.0.0.1"}},"event":"VNC_DISCONNECTED"}
21:45:35.5355 8748 EVENT {"data":{"client":{"host":"127.0.0.1","service":"41462","websocket":false,"family":"ipv4"},"server":{"host":"0.0.0.0","service":"5991","websocket":false,"family":"ipv4","auth":"none"}},"timestamp":{"seconds":1494020298,"microseconds":958863},"event":"VNC_CONNECTED"}
21:45:35.5357 8748 EVENT {"event":"VNC_INITIALIZED","timestamp":{"seconds":1494020298,"microseconds":965817},"data":{"server":{"auth":"none","family":"ipv4","websocket":false,"service":"5991","host":"0.0.0.0"},"client":{"host":"127.0.0.1","service":"41462","websocket":false,"family":"ipv4"}}}
21:45:35.5360 8748 EVENT {"event":"VNC_DISCONNECTED","timestamp":{"microseconds":326005,"seconds":1494020345},"data":{"server":{"service":"5991","host":"0.0.0.0","auth":"none","family":"ipv4","websocket":false},"client":{"family":"ipv4","websocket":false,"service":"41462","host":"127.0.0.1"}}}
21:45:35.5362 8748 EVENT {"data":{"client":{"websocket":false,"family":"ipv4","host":"127.0.0.1","service":"41772"},"server":{"family":"ipv4","auth":"none","websocket":false,"service":"5991","host":"0.0.0.0"}},"timestamp":{"microseconds":326765,"seconds":1494020345},"event":"VNC_CONNECTED"}
21:45:35.5364 8748 EVENT {"data":{"client":{"websocket":false,"family":"ipv4","host":"127.0.0.1","service":"41772"},"server":{"family":"ipv4","auth":"none","websocket":false,"service":"5991","host":"0.0.0.0"}},"timestamp":{"microseconds":342342,"seconds":1494020345},"event":"VNC_INITIALIZED"}
21:45:35.5367 8748 EVENT {"event":"VNC_DISCONNECTED","data":{"client":{"host":"127.0.0.1","service":"41772","websocket":false,"family":"ipv4"},"server":{"auth":"none","family":"ipv4","websocket":false,"service":"5991","host":"0.0.0.0"}},"timestamp":{"seconds":1494020493,"microseconds":811602}}
21:45:35.5369 8748 EVENT {"event":"VNC_CONNECTED","data":{"client":{"family":"ipv4","websocket":false,"service":"42626","host":"127.0.0.1"},"server":{"websocket":false,"auth":"none","family":"ipv4","host":"0.0.0.0","service":"5991"}},"timestamp":{"microseconds":812345,"seconds":1494020493}}
21:45:35.5371 8748 EVENT {"event":"VNC_INITIALIZED","timestamp":{"microseconds":823717,"seconds":1494020493},"data":{"server":{"service":"5991","host":"0.0.0.0","auth":"none","family":"ipv4","websocket":false},"client":{"service":"42626","host":"127.0.0.1","family":"ipv4","websocket":false}}}
21:45:35.5373 8748 EVENT {"timestamp":{"microseconds":762036,"seconds":1494020636},"event":"RESET"}
DIE Migration failed: desc: State blocked by non-migratable device 'arm_gicv3_its', class: GenericError, stopped at /usr/lib/os-autoinst/backend/qemu.pm line 174.

 at /usr/lib/os-autoinst/backend/baseclass.pm line 73.
    backend::baseclass::die_handler('Migration failed: desc: State blocked by non-migratable devic...') called at /usr/lib/os-autoinst/backend/qemu.pm line 174
    backend::qemu::save_memory_dump('backend::qemu=HASH(0x323730a8)', 'HASH(0x328a90e8)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 68
    backend::baseclass::handle_command('backend::qemu=HASH(0x323730a8)', 'HASH(0x328ad208)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 422
    backend::baseclass::check_socket('backend::qemu=HASH(0x323730a8)', 'IO::Handle=GLOB(0x322dcdb0)') called at /usr/lib/os-autoinst/backend/qemu.pm line 1018
    backend::qemu::check_socket('backend::qemu=HASH(0x323730a8)', 'IO::Handle=GLOB(0x322dcdb0)', 0) called at /usr/lib/os-autoinst/backend/baseclass.pm line 203
    eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 151
    backend::baseclass::run_capture_loop('backend::qemu=HASH(0x323730a8)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 122
    backend::baseclass::run('backend::qemu=HASH(0x323730a8)', 6, 9) called at /usr/lib/os-autoinst/backend/driver.pm line 85
    backend::driver::start('backend::driver=HASH(0x2e6a5cc8)') called at /usr/lib/os-autoinst/backend/driver.pm line 48
    backend::driver::new('backend::driver', 'qemu') called at /usr/bin/isotovideo line 206
    main::init_backend() called at /usr/bin/isotovideo line 271
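
The device that blocks the migration can be extracted from the DIE line mechanically. A minimal sketch in Python, assuming the message always has the `desc: ..., class: ...` shape seen in the log above; the function name `parse_migration_failure` is hypothetical and not part of os-autoinst:

```python
import re

# Patterns for the DIE line format seen in the log above. This is an
# assumption: other qemu versions may word the message differently.
_FAILURE_RE = re.compile(
    r"Migration failed: desc: (?P<desc>.*?), class: (?P<cls>\w+)"
)
_DEVICE_RE = re.compile(r"non-migratable device '(?P<device>[^']+)'")


def parse_migration_failure(line):
    """Return (desc, error_class, device), or None if the line does not match.

    device is None when the description names no non-migratable device.
    """
    match = _FAILURE_RE.search(line)
    if match is None:
        return None
    device = _DEVICE_RE.search(match.group("desc"))
    return (
        match.group("desc"),
        match.group("cls"),
        device.group("device") if device else None,
    )
```

Applied to the DIE line above, this yields the error class `GenericError` and the device `arm_gicv3_its`.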

Reproducible

  • Make test fail when trying to boot using "first_boot"
  • openQA should try to save the memory dump
  • Observe the error

Expected result

Last good: probably some months ago on overdrive2, which was still running the old qemu version

Further details

Always latest result in this scenario: latest

Actions #1

Updated by okurz almost 7 years ago

  • Project changed from openQA Tests to openQA Project
  • Subject changed from aarch64 fails to save memory dump because of "non-migratable device 'arm_gicv3_its'" to [tools][aarch64] fails to save memory dump because of "non-migratable device 'arm_gicv3_its'"
  • Category changed from Infrastructure to 132

Actions #2

Updated by szarate almost 7 years ago

  • Subject changed from [tools][aarch64] fails to save memory dump because of "non-migratable device 'arm_gicv3_its'" to [tools][aarch64] test gic-version=3 and its=off on the caivum thunderx
  • Target version set to Milestone 7

overdrive2 should not run jobs for sp3.

This job ran on openqaworker-arm-2. Jobs on that worker need to run with its=off; the worker has the new patches that should remove the RCU stalls that currently prevent us from using the ThunderX machines as workers.
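
For illustration, the two settings named in the subject can be combined into a single qemu `-machine` value. This is only a sketch, assuming they are passed as properties of the aarch64 `virt` machine type; how openqaworker-arm-2 is actually configured is not shown in this ticket, and the helper name `machine_argument` is hypothetical:

```python
def machine_argument(gic_version=3, its=False):
    """Build the value for qemu's -machine option on the aarch64 'virt' board.

    Assumption: gic-version and its are set as machine properties.
    """
    its_value = "on" if its else "off"
    return "virt,gic-version={},its={}".format(gic_version, its_value)


print(machine_argument())  # virt,gic-version=3,its=off
```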

Actions #4

Updated by RBrownSUSE almost 7 years ago

  • Assignee set to nicksinger

Nick's looking at it as part of his aarch64 investigations

Actions #5

Updated by nicksinger almost 7 years ago

  • Subject changed from [tools][aarch64] test gic-version=3 and its=off on the caivum thunderx to [tools] save_memory_dump fails if some qemu-devices are not migratable

I'm somewhat confused by this ticket.
The title says to test gic-version=3 and its=off, but the whole description is about a non-migratable device which only exists because of the parameter "gic-version=3" (the device in question is called "arm_gicv3_its"…).
Testing gic-version=3 and its=off is now covered by poo#17740 (new bullet point).

I'll change this ticket to be much more general because I think it covers an interesting point (which is related neither to aarch64 nor to the Caviums): openQA's memory dump routine fails if some qemu device is not migratable, which can happen from time to time (as in this case).

Actions #6

Updated by nicksinger almost 7 years ago

  • Assignee deleted (nicksinger)

Actions #7

Updated by okurz almost 7 years ago

When you read #19012#note-2, that should resolve your confusion. szarate updated the subject line. I am wondering whether there is a ticket which is blocking this one?

Actions #12

Updated by nicksinger almost 7 years ago

Reading note-2 just enables me to point the finger at santi but does not explain his weighty reasons ;)
Which ticket exactly should block this one? poo#17740 (and its changes) only causes this problem on aarch64 (so one could argue that this is raised and blocked by poo#17740). But as I said in my previous comment, I think that this issue is much more generic and should be caught regardless of the architecture.

Actions #13

Updated by EDiGiacinto almost 7 years ago

Which version of qemu was used? The QMP answer suggests that migration with VGICv3 on aarch64 is not yet supported by this qemu/kernel version; if qemu is 2.8, that is probably the case.

In the current implementation [1] if qmp returns an error we let the test die. As a mitigation we could decide to not let the test stop and continue without snapshotting capabilities.
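
The two behaviours described here (die on error versus continue without snapshots) can be sketched as follows. This is a Python illustration, not the os-autoinst code at [1]; it assumes the usual QMP error reply shape `{"error": {"class": ..., "desc": ...}}`, and the function name `handle_migrate_response` and its `fatal` switch are hypothetical:

```python
def handle_migrate_response(response, fatal=True):
    """Decide what to do with a QMP reply to a migrate command.

    Returns True if the reply carries no error, False if snapshots should
    be disabled. With fatal=True (mirroring the current behaviour of letting
    the test die) an error reply raises RuntimeError instead.
    """
    error = response.get("error")
    if error is None:
        return True
    message = "Migration failed: desc: {}, class: {}".format(
        error.get("desc", "unknown"), error.get("class", "unknown")
    )
    if fatal:
        raise RuntimeError(message)
    # Mitigation: keep the test running, but without snapshot support.
    print("WARNING: {}; continuing without snapshots".format(message))
    return False
```

With `fatal=False` the caller would have to remember the returned value and skip any later snapshot or memory dump requests for that job.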

VGICv3 migration support for aarch64 in the Linux kernel seems to have been updated lately with a new round of patches [2], but VGICv3 save/restore support in qemu should only be available in versions >=2.9 [3], and in the Linux kernel from 4.8 onwards [4].

[1] https://github.com/os-autoinst/os-autoinst/blob/master/backend/qemu.pm#L177
[2] https://www.spinics.net/lists/arm-kernel/msg558046.html
[3] https://github.com/qemu/qemu/commit/b28f9db1a7ce4d537ce2fae6fbce5e5e37dc265b
[4] https://lkml.org/lkml/2016/10/6/450
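
As a rough guide, the version thresholds mentioned above (qemu >= 2.9, kernel >= 4.8) can be checked like this. A minimal sketch in Python with hypothetical helper names; it compares upstream version numbers only, so a distribution kernel that carries backported patches may behave differently from what the check predicts:

```python
import re

MIN_QEMU = (2, 9)    # threshold taken from [3] above
MIN_KERNEL = (4, 8)  # threshold taken from [4] above


def parse_version(text):
    """Turn a string such as '2.9.0' or '4.4.73-5-default' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised version: {!r}".format(text))
    return (int(match.group(1)), int(match.group(2)))


def gicv3_migration_expected(qemu_version, kernel_version):
    """True if both versions meet the thresholds quoted in the comment above."""
    return (parse_version(qemu_version) >= MIN_QEMU
            and parse_version(kernel_version) >= MIN_KERNEL)
```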

Actions #14

Updated by szarate almost 7 years ago

  • Status changed from New to In Progress

Actions #15

Updated by szarate almost 7 years ago

  • Subject changed from [tools] save_memory_dump fails if some qemu-devices are not migratable to [tools] test gic-version=3 and its=off on the caivum thunderx
  • Status changed from In Progress to Resolved

This issue was specifically about the problems with gicv3 and its migration support in qemu (migration was not possible), so we tested with qemu 2.9 (from SP3) on the Caviums, which are the only systems in our openQA infrastructure that actually require this because of the RCU stalls.

As Ettore is pointing out, we already die when the snapshot can't be created, and I don't see any reason to actually try to recover from that situation.

The tests were specifically meant to ensure that this variation of the environment would still allow the ThunderX to run tests. It did: we could run tests and take snapshots with ITS disabled, regardless of the status of the tests.

So for now I'm marking this as solved :) Running tests with GICv3 and ITS off allows us to create snapshots on qemu 2.9.
