action #176319
closedcoordination #176337: [saga][epic] Stable os-autoinst backends with stable command execution (no mistyping)
coordination #176340: [epic] Stable qemu backend with no unexpected mistyping
testapi power function call "off" needs to be handled gracefully by os-autoinst size:S
Description
Observation
See #175060. TinyCore, which we use for testing within os-autoinst, does not support the ACPI poweroff command, so we tried power('off'), which sends the QMP command "quit". That, however, tears down the complete qemu stack, and os-autoinst then fails like this:
[2025-01-29T12:25:20.107196+01:00] [debug] [pid:32470] <<< testapi::power(action="off")
[2025-01-29T12:25:20.108408+01:00] [debug] [pid:32485] EVENT {"data":{"guest":false,"reason":"host-qmp-quit"},"event":"SHUTDOWN","timestamp":{"microseconds":108236,"seconds":1738149920}}
[2025-01-29T12:25:20.142808+01:00] [debug] [pid:32470] tests/shutdown.pm:10 called testapi::assert_shutdown
[2025-01-29T12:25:20.143067+01:00] [debug] [pid:32470] <<< testapi::check_shutdown(timeout=90)
[2025-01-29T12:25:20.146514+01:00] [info] [pid:32485] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
Can't syswrite(IO::Socket::UNIX=GLOB(0x55b38b999888), <BUFFER>): Broken pipe at backend/qemu.pm line 1130
…
[2025-01-29T12:25:22.249780+01:00] [warn] [pid:32460] !!! OpenQA::Isotovideo::Runner::_read_response: THERE IS NOTHING TO READ 18 4 3
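A minimal sketch of how the SHUTDOWN event from the log above could be told apart from a guest-initiated shutdown. The function name `classify_shutdown` and its return labels are made up for illustration and are not part of os-autoinst; only the event payload is taken from the log.

```python
import json


def classify_shutdown(event_json: str) -> str:
    """Classify a QMP event by who initiated the shutdown (labels are hypothetical)."""
    event = json.loads(event_json)
    if event.get("event") != "SHUTDOWN":
        return "not-a-shutdown"
    data = event.get("data", {})
    # "guest": false means the shutdown was requested from the host side;
    # the reason "host-qmp-quit" is what the log shows after power('off').
    if not data.get("guest", False) and data.get("reason") == "host-qmp-quit":
        return "host-quit"
    return "guest-shutdown" if data.get("guest") else "host-other"


# Event payload copied from the log line above.
LOG_EVENT = (
    '{"data":{"guest":false,"reason":"host-qmp-quit"},"event":"SHUTDOWN",'
    '"timestamp":{"microseconds":108236,"seconds":1738149920}}'
)

if __name__ == "__main__":
    print(classify_shutdown(LOG_EVENT))
```

With such a distinction a caller could decide that a QMP socket closing right after a "host-quit" is expected rather than a backend crash.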
Acceptance criteria
- AC1: All supported testapi::power methods should be usable within os-autoinst test modules without causing isotovideo to crash when just checking for shutdown
Suggestions
- Try to fix the usage of power('off'). If that is not possible because 'off' actually sends "quit" over QMP and then tears down the complete qemu, consider removing the implementation
Out of scope
Any other testapi method.
Updated by okurz about 2 months ago
- Copied from action #175060: [sporadic] [Workflow] Failed: os-autoinst/openQA on master / test (7dc9d82) size:M added
Updated by okurz about 2 months ago
- Subject changed from testapi power function seems to not work as expected to testapi power function call "off" needs to be handled gracefully by os-autoinst
- Description updated (diff)
- Target version changed from Ready to Tools - Next
Updated by okurz about 2 months ago
- Target version changed from Tools - Next to future
Updated by okurz about 2 months ago
- Target version changed from future to Ready
- Parent task set to #176340
Updated by tinita about 1 month ago
- Subject changed from testapi power function call "off" needs to be handled gracefully by os-autoinst to testapi power function call "off" needs to be handled gracefully by os-autoinst size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz about 1 month ago
- Category changed from Regressions/Crashes to Feature requests
Updated by ybonatakis about 1 month ago
I took a look today. I got lost, but here is where I ended up:
Run an instance:
qemu-system-x86_64 \
-enable-kvm \
-cdrom t/data/Core-7.2.iso \
-m 1024 \
-cpu host \
-smp 2 \
-serial stdio \
-qmp tcp:0:4444,server,nowait
In another console I connect to QMP and run the following (output included):
nc localhost 4444
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 9}, "package": "openSUSE Tumbleweed"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute": "query-status"}
{"return": {"status": "running", "running": true}}
{ "execute": "send-key",
"arguments": { "keys": [ { "type": "qcode", "data": "ctrl" },
{ "type": "qcode", "data": "alt" },
{ "type": "qcode", "data": "delete" } ] } }
{"return": {}}
{"timestamp": {"seconds": 1738601139, "microseconds": 880024}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1738601139, "microseconds": 885425}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
#VM gets boot console
{"execute": "query-status"}
{"return": {"status": "running", "running": true}}
# and after a while boots again
Then I found I can get the available commands[0].
{ "execute": "query-commands" }
{"return": [{"name": "device_add"}, {"name": "cxl-release-dynamic-capacity"}, {"name": "cxl-add-dynamic-capacity"}, {"name": "cxl-inject-correctable-error"}, {"name": "cxl-inject-uncorrectable-errors"}, {"name": "cxl-inject-poison"}, {"name": "cxl-inject-memory-module-event"}, {"name": "cxl-inject-dram-event"}, {"name": "cxl-inject-general-media-event"}, {"name": "query-cryptodev"}, {"name": "x-query-virtio-queue-element"}, {"name": "x-query-virtio-vhost-queue-status"}, {"name": "x-query-virtio-queue-status"}, {"name": "x-query-virtio-status"}, {"name": "x-query-virtio"}, {"name": "query-stats-schemas"}, {"name": "query-stats"}, {"name": "query-pci"}, {"name": "query-acpi-ospm-status"}, {"name": "query-audiodevs"}, {"name": "xen-event-inject"}, {"name": "xen-event-list"}, {"name": "query-sgx-capabilities"}, {"name": "query-sgx"}, {"name": "query-sev-attestation-report"}, {"name": "sev-inject-launch-secret"}, {"name": "query-sev-capabilities"}, {"name": "query-sev-launch-measure"}, {"name": "query-sev"}, {"name": "rtc-reset-reinjection"}, {"name": "query-command-line-options"}, {"name": "query-fdsets"}, {"name": "remove-fd"}, {"name": "add-fd"}, {"name": "closefd"}, {"name": "getfd"}, {"name": "human-monitor-command"}, {"name": "x-exit-preconfig"}, {"name": "cont"}, {"name": "stop"}, {"name": "query-iothreads"}, {"name": "query-name"}, {"name": "add_client"}, {"name": "query-yank"}, {"name": "yank"}, {"name": "replay-seek"}, {"name": "replay-delete-break"}, {"name": "replay-break"}, {"name": "query-replay"}, {"name": "query-cpu-definitions"}, {"name": "query-cpu-model-expansion"}, {"name": "x-query-interrupt-controllers"}, {"name": "dumpdtb"}, {"name": "x-query-usb"}, {"name": "x-query-roms"}, {"name": "x-query-ramblock"}, {"name": "x-query-opcount"}, {"name": "x-query-numa"}, {"name": "x-query-jit"}, {"name": "x-query-irq"}, {"name": "query-memory-devices"}, {"name": "query-memory-size-summary"}, {"name": "query-hv-balloon-status-report"}, {"name": 
"query-balloon"}, {"name": "balloon"}, {"name": "set-numa-node"}, {"name": "query-hotpluggable-cpus"}, {"name": "query-memdev"}, {"name": "pmemsave"}, {"name": "memsave"}, {"name": "query-kvm"}, {"name": "inject-nmi"}, {"name": "system_wakeup"}, {"name": "system_powerdown"}, {"name": "system_reset"}, {"name": "query-vm-generation-id"}, {"name": "query-uuid"}, {"name": "query-target"}, {"name": "query-current-machine"}, {"name": "query-machines"}, {"name": "query-cpus-fast"}, {"name": "device-sync-config"}, {"name": "device_del"}, {"name": "device-list-properties"}, {"name": "object-del"}, {"name": "object-add"}, {"name": "qom-list-properties"}, {"name": "qom-list-types"}, {"name": "qom-set"}, {"name": "qom-get"}, {"name": "qom-list"}, {"name": "query-qmp-schema"}, {"name": "quit"}, {"name": "query-commands"}, {"name": "query-version"}, {"name": "qmp_capabilities"}, {"name": "trace-event-set-state"}, {"name": "trace-event-get-state"}, {"name": "transaction"}, {"name": "snapshot-delete"}, {"name": "snapshot-load"}, {"name": "snapshot-save"}, {"name": "query-migrationthreads"}, {"name": "query-vcpu-dirty-limit"}, {"name": "cancel-vcpu-dirty-limit"}, {"name": "set-vcpu-dirty-limit"}, {"name": "query-dirty-rate"}, {"name": "calc-dirty-rate"}, {"name": "migrate-pause"}, {"name": "migrate-recover"}, {"name": "query-colo-status"}, {"name": "xen-colo-do-checkpoint"}, {"name": "query-xen-replication-status"}, {"name": "xen-set-replication"}, {"name": "xen-load-devices-state"}, {"name": "xen-set-global-dirty-log"}, {"name": "xen-save-devices-state"}, {"name": "migrate-incoming"}, {"name": "migrate"}, {"name": "migrate-continue"}, {"name": "migrate_cancel"}, {"name": "x-colo-lost-heartbeat"}, {"name": "migrate-start-postcopy"}, {"name": "query-migrate-parameters"}, {"name": "migrate-set-parameters"}, {"name": "query-migrate-capabilities"}, {"name": "migrate-set-capabilities"}, {"name": "query-migrate"}, {"name": "client_migrate_info"}, {"name": "display-update"}, {"name": 
"display-reload"}, {"name": "query-display-options"}, {"name": "input-send-event"}, {"name": "send-key"}, {"name": "query-mice"}, {"name": "change-vnc-password"}, {"name": "query-vnc-servers"}, {"name": "query-vnc"}, {"name": "query-spice"}, {"name": "screendump"}, {"name": "expire_password"}, {"name": "set_password"}, {"name": "query-tpm"}, {"name": "query-tpm-types"}, {"name": "query-tpm-models"}, {"name": "query-rocker-of-dpa-groups"}, {"name": "query-rocker-of-dpa-flows"}, {"name": "query-rocker-ports"}, {"name": "query-rocker"}, {"name": "request-ebpf"}, {"name": "announce-self"}, {"name": "query-rx-filter"}, {"name": "netdev_del"}, {"name": "netdev_add"}, {"name": "set_link"}, {"name": "query-dump-guest-memory-capability"}, {"name": "query-dump"}, {"name": "dump-guest-memory"}, {"name": "chardev-send-break"}, {"name": "chardev-remove"}, {"name": "chardev-change"}, {"name": "chardev-add"}, {"name": "ringbuf-read"}, {"name": "ringbuf-write"}, {"name": "query-chardev-backends"}, {"name": "query-chardev"}, {"name": "query-block-exports"}, {"name": "block-export-del"}, {"name": "block-export-add"}, {"name": "nbd-server-stop"}, {"name": "nbd-server-remove"}, {"name": "nbd-server-add"}, {"name": "nbd-server-start"}, {"name": "blockdev-snapshot-delete-internal-sync"}, {"name": "blockdev-snapshot-internal-sync"}, {"name": "x-blockdev-set-iothread"}, {"name": "x-blockdev-change"}, {"name": "block-set-write-threshold"}, {"name": "x-blockdev-amend"}, {"name": "blockdev-create"}, {"name": "blockdev-del"}, {"name": "blockdev-reopen"}, {"name": "blockdev-add"}, {"name": "block-job-change"}, {"name": "block-job-finalize"}, {"name": "block-job-dismiss"}, {"name": "block-job-complete"}, {"name": "block-job-resume"}, {"name": "block-job-pause"}, {"name": "block-job-cancel"}, {"name": "block-job-set-speed"}, {"name": "block-stream"}, {"name": "blockdev-mirror"}, {"name": "x-debug-block-dirty-bitmap-sha256"}, {"name": "block-dirty-bitmap-merge"}, {"name": 
"block-dirty-bitmap-disable"}, {"name": "block-dirty-bitmap-enable"}, {"name": "block-dirty-bitmap-clear"}, {"name": "block-dirty-bitmap-remove"}, {"name": "block-dirty-bitmap-add"}, {"name": "drive-mirror"}, {"name": "x-debug-query-block-graph"}, {"name": "query-named-block-nodes"}, {"name": "blockdev-backup"}, {"name": "drive-backup"}, {"name": "block-commit"}, {"name": "change-backing-file"}, {"name": "blockdev-snapshot"}, {"name": "blockdev-snapshot-sync"}, {"name": "block_resize"}, {"name": "query-block-jobs"}, {"name": "query-blockstats"}, {"name": "query-block"}, {"name": "block-latency-histogram-set"}, {"name": "block_set_io_throttle"}, {"name": "blockdev-change-medium"}, {"name": "blockdev-insert-medium"}, {"name": "blockdev-remove-medium"}, {"name": "blockdev-close-tray"}, {"name": "blockdev-open-tray"}, {"name": "eject"}, {"name": "query-pr-managers"}, {"name": "query-jobs"}, {"name": "job-finalize"}, {"name": "job-dismiss"}, {"name": "job-complete"}, {"name": "job-cancel"}, {"name": "job-resume"}, {"name": "job-pause"}, {"name": "set-action"}, {"name": "watchdog-set-action"}, {"name": "query-status"}]}
If I grep the list I see only "system_powerdown", so I ran that:
{ "execute": "system_powerdown" }
{"timestamp": {"seconds": 1738601890, "microseconds": 906745}, "event": "POWERDOWN"}
{"return": {}}
Nothing seems to be happening...
[0] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-1673
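The manual nc session above could be scripted roughly as follows. This is only a sketch under assumptions: it expects QMP on tcp localhost:4444 as in the qemu command line above and assumes one JSON message per line; the helper names (`qmp_encode`, `qmp_execute`, `qmp_session`) are invented here and are not os-autoinst or QEMU APIs.

```python
import json
import socket


def qmp_encode(command, arguments=None):
    """Serialize one QMP command as a single JSON line."""
    message = {"execute": command}
    if arguments is not None:
        message["arguments"] = arguments
    return (json.dumps(message) + "\n").encode()


def qmp_execute(reader, writer, command, arguments=None):
    """Send one command; return (reply, events received before the reply)."""
    writer.write(qmp_encode(command, arguments))
    writer.flush()
    events = []
    for line in reader:
        message = json.loads(line)
        if "event" in message:
            # Asynchronous events (e.g. POWERDOWN) may arrive before the reply.
            events.append(message)
            continue
        return message, events
    raise ConnectionError("QMP connection closed before a reply arrived")


def qmp_session(host="localhost", port=4444):
    """Connect, read the greeting and leave capabilities negotiation mode."""
    sock = socket.create_connection((host, port))
    reader = sock.makefile("r", encoding="utf-8")
    writer = sock.makefile("wb")
    greeting = json.loads(reader.readline())
    qmp_execute(reader, writer, "qmp_capabilities")
    return sock, reader, writer, greeting


if __name__ == "__main__":
    sock, reader, writer, greeting = qmp_session()
    print(qmp_execute(reader, writer, "query-status"))
    print(qmp_execute(reader, writer, "system_powerdown"))
    sock.close()
```

Collecting events separately from the reply matches the transcript above, where the POWERDOWN event is printed before the `{"return": {}}` of system_powerdown.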
Updated by ybonatakis about 1 month ago
No progress so far. I didn't spend much time on it today. I tried to run the same flow as yesterday with a Tumbleweed ISO. I didn't see power do what I was expecting. Not sure if I did something wrong. I wonder if I should copy the qemu command from the CI.
During those experiments I tried to send input-send-event in various ways. I still haven't made it work. Maybe I sent it with missing parameters or something.
So the question I have now is whether "consider to remove the implementation" means deleting the power sub!
Updated by okurz about 1 month ago
ybonatakis wrote in #note-11:
[…]
So the question I have now is whether "consider to remove the implementation" means delete the power sub!!
no, just the method. But I am not yet convinced that we can't get it to work
Updated by openqa_review about 1 month ago
- Due date set to 2025-02-19
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz about 1 month ago
As discussed in the unblock, use power('off') in t/data/tests/tests/shutdown.pm instead of sudo poweroff and run t/99-full-stack.t to reproduce the log content shown in the ticket description. Then follow the suggestion to handle the non-existent pipe gracefully.
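To illustrate what "handle the pipe gracefully" could mean in general, here is a small self-contained sketch. It is not the os-autoinst implementation (which is Perl, in backend/qemu.pm); `send_command` and `demo` are hypothetical names, and a socket pair stands in for the QMP socket.

```python
import socket


def send_command(sock, payload: bytes) -> bool:
    """Return True if the payload was sent, False if the peer is already gone."""
    try:
        sock.sendall(payload)
        return True
    except (BrokenPipeError, ConnectionResetError):
        # The other side has exited, as qemu does after "quit";
        # report that to the caller instead of dying.
        return False


def demo() -> bool:
    """Simulate qemu exiting after "quit"; return True if that was detected."""
    ours, peer = socket.socketpair()
    assert send_command(ours, b'{"execute": "quit"}\n')
    peer.recv(1024)  # the "qemu" side reads the command ...
    peer.close()     # ... and goes away
    detected = not send_command(ours, b'{"execute": "query-status"}\n')
    ours.close()
    return detected


if __name__ == "__main__":
    print(demo())
```

The caller can then treat a failed write after an intentional power off as "machine is shut down" rather than as a backend error.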
Updated by livdywan about 1 month ago · Edited
Updated by ybonatakis about 1 month ago · Edited
okurz wrote in #note-14:
As discussed in the unblock, use power('off') in t/data/tests/tests/shutdown.pm instead of sudo poweroff and run t/99-full-stack.t to reproduce the log content shown in the ticket description. Then follow the suggestion to handle the non-existent pipe gracefully.
I need to "study" the pipe's handling.
In the meantime I created a PR to reproduce the error, as it was not visible locally. It seems that it showed up there -> https://github.com/osh-autoinst/os-autoinst/pull/2646
However, back on my WS I can trigger a restart with the following QMP:
{ "execute": "input-send-event",
"arguments": { "events": [
{ "type": "key", "data" : { "down": true,
"key": {"type": "qcode", "data": "ctrl" } } },
{ "type": "key", "data" : { "down": true,
"key": {"type": "qcode", "data": "alt" } } },
{ "type": "key", "data" : { "down": true,
"key": {"type": "qcode", "data": "delete" } } } ] } }
{"return": {}}
This is given exactly as found in the docs, but I think you guys tried this before. Would this work in place of off?
After the system shuts down, the boot screen reappears and it boots again, though.
Updated by ybonatakis about 1 month ago
I also tried this, which doesn't work:
{
"execute": "input-send-event",
"arguments": {
"events": [
{
"type": "key",
"data": {
"key": {"type": "qcode", "data": "power" },
"down": true
}
},
{
"type": "key",
"data": {
"key": {"type": "qcode", "data": "power" },
"down": false } } ]}}
{"return": {}}
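The input-send-event payloads from the last notes can be generated instead of typed by hand. The sketch below uses invented helper names (`key_event`, `key_combo`) and adds release events after the presses; the Ctrl-Alt-Delete payload above only contains "down": true entries, and whether the missing releases matter for the guest is an assumption not verified here.

```python
import json


def key_event(qcode: str, down: bool) -> dict:
    """One key press (down=True) or release (down=False) for input-send-event."""
    return {"type": "key",
            "data": {"down": down, "key": {"type": "qcode", "data": qcode}}}


def key_combo(*qcodes: str) -> dict:
    """Press all keys in order, then release them in reverse order."""
    presses = [key_event(q, True) for q in qcodes]
    releases = [key_event(q, False) for q in reversed(qcodes)]
    return {"execute": "input-send-event",
            "arguments": {"events": presses + releases}}


if __name__ == "__main__":
    print(json.dumps(key_combo("ctrl", "alt", "delete"), indent=2))
    print(json.dumps(key_combo("power")))
```

`key_combo("power")` produces the same press and release pair as the payload tried above.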
Updated by ybonatakis about 1 month ago
ybonatakis wrote in #note-16:
okurz wrote in #note-14:
As discussed in the unblock, use power('off') in t/data/tests/tests/shutdown.pm instead of sudo poweroff and run t/99-full-stack.t to reproduce the log content shown in the ticket description. Then follow the suggestion to handle the non-existent pipe gracefully.
I need to "study" the pipe's handling.
In the meantime I created a PR to reproduce the error, as it was not visible locally. It seems that it showed up there -> https://github.com/osh-autoinst/os-autoinst/pull/2646
However, back on my WS I can trigger a restart with the following QMP:
{ "execute": "input-send-event", "arguments": { "events": [ { "type": "key", "data" : { "down": true, "key": {"type": "qcode", "data": "ctrl" } } }, { "type": "key", "data" : { "down": true, "key": {"type": "qcode", "data": "alt" } } }, { "type": "key", "data" : { "down": true, "key": {"type": "qcode", "data": "delete" } } } ] } } {"return": {}}
I applied this in power('off') and it ends with https://github.com/os-autoinst/os-autoinst/actions/runs/13164189124/job/36740193329?pr=2646
This is given exactly as found in the docs, but I think you guys tried this before. Would this work in place of off?
After the system shuts down, the boot screen reappears and it boots again, though.
Updated by okurz about 1 month ago
My approach: https://github.com/os-autoinst/os-autoinst/pull/2649
Updated by ybonatakis about 1 month ago
- Assignee changed from ybonatakis to okurz
okurz wrote in #note-19:
My approach: https://github.com/os-autoinst/os-autoinst/pull/2649
Thanks for taking care of it. I can't really review, as I don't fully understand the code. I think it was something I couldn't figure out by myself. I assigned the ticket to you.
Updated by okurz about 1 month ago
- Blocked by action #176475: Use Feature::Compat::Try in our code - os-autoinst size:S added
Updated by okurz about 1 month ago · Edited
- Status changed from In Progress to Blocked
Based on https://github.com/os-autoinst/os-autoinst/pull/2649#pullrequestreview-2601866232 I am preparing some changes to bring in Feature::Compat::Try and get rid of some older suboptimal code. I plan to continue here afterwards so that we have a good style for catching exceptions going forward before I introduce more as necessary for this PR.
Next PRs:
- https://github.com/os-autoinst/os-autoinst/pull/2654 (merged)
- https://github.com/os-autoinst/os-autoinst/pull/2653 (merged)
- https://github.com/os-autoinst/os-autoinst/pull/2652 (merged)
- https://github.com/os-autoinst/os-autoinst/pull/2651 (merged)
- https://github.com/os-autoinst/os-autoinst/pull/2650 (merged)
most are handled in #176475
Updated by livdywan 16 days ago
- Priority changed from Low to Normal
We can't lower priority so long as this blocks another ticket with higher priority. See #175060#note-38
Updated by okurz 12 days ago
- Assignee deleted (okurz)
All prerequisites about better exception handling as noted in #176319-22 are done.
We could progress with https://github.com/os-autoinst/os-autoinst/pull/2649 but the main point that we would like to solve to prevent some caveats is https://github.com/os-autoinst/os-autoinst/pull/2649#issuecomment-2671814488
The case when autotest takes long enough to terminate to still receive SIGTERM from isotovideo is not handled correctly. So far this change would lead to an incomplete test in this case (unless @okurz has already fixed this since our last session). Maybe isotovideo should simply avoid sending SIGTERM to autotest if it passed a power off command. Otherwise autotest could consider the SIGTERM "expected" in case it previously sent a power off command.
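The second idea from that comment, treating SIGTERM as expected once a power off command was sent, could look roughly like this. It is a language-neutral illustration in Python, not the autotest code; `ShutdownTracker` and its methods are hypothetical names.

```python
import signal


class ShutdownTracker:
    """Remember whether a power off was requested, to judge a later SIGTERM."""

    def __init__(self):
        self.power_off_sent = False

    def note_power_off(self):
        # To be called right after the power off command was sent.
        self.power_off_sent = True

    def classify_sigterm(self) -> str:
        return "expected" if self.power_off_sent else "unexpected"

    def install(self, on_expected, on_unexpected):
        """Register a SIGTERM handler that dispatches on the recorded state."""
        def handler(signum, frame):
            if self.power_off_sent:
                on_expected()
            else:
                on_unexpected()
        signal.signal(signal.SIGTERM, handler)


if __name__ == "__main__":
    tracker = ShutdownTracker()
    print(tracker.classify_sigterm())
    tracker.note_power_off()
    print(tracker.classify_sigterm())
```

In such a design the "expected" branch would finish the test module normally, while the "unexpected" branch keeps the current behavior of reporting an incomplete test.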
Updated by openqa_review 5 days ago
- Due date set to 2025-03-25
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 2 days ago
- Due date deleted (2025-03-25)
- Status changed from In Progress to Resolved
All included with https://github.com/os-autoinst/os-autoinst/pull/2676 which was merged, so we are good. Good job! This will be used automatically within openQA soon, I guess. Then we can verify that the openQA test works stably.