action #176319
coordination #176337: [saga][epic] Stable os-autoinst backends with stable command execution (no mistyping)
coordination #176340: [epic] Stable qemu backend with no unexpected mistyping
testapi power function call "off" needs to be handled gracefully by os-autoinst size:S
Description
Observation
See #175060. TinyCore, which we use for testing within os-autoinst, does not handle the ACPI poweroff command, so we tried power('off'), which sends the QMP command "quit". That, however, tears down the complete qemu stack, after which the os-autoinst log looks like this:
[2025-01-29T12:25:20.107196+01:00] [debug] [pid:32470] <<< testapi::power(action="off")
[2025-01-29T12:25:20.108408+01:00] [debug] [pid:32485] EVENT {"data":{"guest":false,"reason":"host-qmp-quit"},"event":"SHUTDOWN","timestamp":{"microseconds":108236,"seconds":1738149920}}
[2025-01-29T12:25:20.142808+01:00] [debug] [pid:32470] tests/shutdown.pm:10 called testapi::assert_shutdown
[2025-01-29T12:25:20.143067+01:00] [debug] [pid:32470] <<< testapi::check_shutdown(timeout=90)
[2025-01-29T12:25:20.146514+01:00] [info] [pid:32485] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
Can't syswrite(IO::Socket::UNIX=GLOB(0x55b38b999888), <BUFFER>): Broken pipe at backend/qemu.pm line 1130
…
[2025-01-29T12:25:22.249780+01:00] [warn] [pid:32460] !!! OpenQA::Isotovideo::Runner::_read_response: THERE IS NOTHING TO READ 18 4 3
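The log above shows the backend dying on a write to a socket whose peer has already gone away. As a language-neutral illustration (os-autoinst itself is Perl, and this is not its code), here is a minimal Python sketch of treating a broken pipe as expected once a shutdown was requested. The function name `send_command` and the `shutdown_requested` flag are hypothetical.

```python
import socket


def send_command(sock: socket.socket, payload: bytes, shutdown_requested: bool) -> bool:
    """Send payload; return False if the peer is gone and that was expected."""
    try:
        sock.sendall(payload)
        return True
    except (BrokenPipeError, ConnectionResetError):
        if shutdown_requested:
            # The peer went away because we asked it to quit: not an error.
            return False
        # No shutdown was requested, so a vanished peer is a real failure.
        raise


if __name__ == "__main__":
    ours, theirs = socket.socketpair()
    theirs.close()  # stand-in for qemu exiting after the QMP "quit"
    print(send_command(ours, b'{"execute": "query-status"}\n', shutdown_requested=True))
    ours.close()
```

The point of the flag is that the same error stays fatal in every situation other than a requested power off.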
Acceptance criteria
- AC1: All supported testapi::power methods should be usable within os-autoinst test modules without causing isotovideo to crash when just checking for shutdown
Suggestions
- Try to fix the usage of power('off'). If that is not possible, because 'off' actually sends "quit" over QMP and then tears down the complete qemu, consider removing the implementation
Out of scope
Any other testapi method.
Updated by okurz 2 months ago
- Copied from action #175060: [sporadic] [Workflow] Failed: os-autoinst/openQA on master / test (7dc9d82) size:M added
Updated by ybonatakis about 2 months ago
I took a look today. I got lost, but here is where I ended up:
Run an instance:
qemu-system-x86_64 \
-enable-kvm \
-cdrom t/data/Core-7.2.iso \
-m 1024 \
-cpu host \
-smp 2 \
-serial stdio \
-qmp tcp:0:4444,server,nowait
In another console I connect to QMP and run the following (output included):
nc localhost 4444
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 9}, "package": "openSUSE Tumbleweed"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute": "query-status"}
{"return": {"status": "running", "running": true}}
{ "execute": "send-key",
"arguments": { "keys": [ { "type": "qcode", "data": "ctrl" },
{ "type": "qcode", "data": "alt" },
{ "type": "qcode", "data": "delete" } ] } }
{"return": {}}
{"timestamp": {"seconds": 1738601139, "microseconds": 880024}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1738601139, "microseconds": 885425}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
#VM gets boot console
{"execute": "query-status"}
{"return": {"status": "running", "running": true}}
# and after a while boots again
Then I found I can get the available commands[0].
{ "execute": "query-commands" }
{"return": [{"name": "device_add"}, {"name": "cxl-release-dynamic-capacity"}, {"name": "cxl-add-dynamic-capacity"}, {"name": "cxl-inject-correctable-error"}, {"name": "cxl-inject-uncorrectable-errors"}, {"name": "cxl-inject-poison"}, {"name": "cxl-inject-memory-module-event"}, {"name": "cxl-inject-dram-event"}, {"name": "cxl-inject-general-media-event"}, {"name": "query-cryptodev"}, {"name": "x-query-virtio-queue-element"}, {"name": "x-query-virtio-vhost-queue-status"}, {"name": "x-query-virtio-queue-status"}, {"name": "x-query-virtio-status"}, {"name": "x-query-virtio"}, {"name": "query-stats-schemas"}, {"name": "query-stats"}, {"name": "query-pci"}, {"name": "query-acpi-ospm-status"}, {"name": "query-audiodevs"}, {"name": "xen-event-inject"}, {"name": "xen-event-list"}, {"name": "query-sgx-capabilities"}, {"name": "query-sgx"}, {"name": "query-sev-attestation-report"}, {"name": "sev-inject-launch-secret"}, {"name": "query-sev-capabilities"}, {"name": "query-sev-launch-measure"}, {"name": "query-sev"}, {"name": "rtc-reset-reinjection"}, {"name": "query-command-line-options"}, {"name": "query-fdsets"}, {"name": "remove-fd"}, {"name": "add-fd"}, {"name": "closefd"}, {"name": "getfd"}, {"name": "human-monitor-command"}, {"name": "x-exit-preconfig"}, {"name": "cont"}, {"name": "stop"}, {"name": "query-iothreads"}, {"name": "query-name"}, {"name": "add_client"}, {"name": "query-yank"}, {"name": "yank"}, {"name": "replay-seek"}, {"name": "replay-delete-break"}, {"name": "replay-break"}, {"name": "query-replay"}, {"name": "query-cpu-definitions"}, {"name": "query-cpu-model-expansion"}, {"name": "x-query-interrupt-controllers"}, {"name": "dumpdtb"}, {"name": "x-query-usb"}, {"name": "x-query-roms"}, {"name": "x-query-ramblock"}, {"name": "x-query-opcount"}, {"name": "x-query-numa"}, {"name": "x-query-jit"}, {"name": "x-query-irq"}, {"name": "query-memory-devices"}, {"name": "query-memory-size-summary"}, {"name": "query-hv-balloon-status-report"}, {"name": 
"query-balloon"}, {"name": "balloon"}, {"name": "set-numa-node"}, {"name": "query-hotpluggable-cpus"}, {"name": "query-memdev"}, {"name": "pmemsave"}, {"name": "memsave"}, {"name": "query-kvm"}, {"name": "inject-nmi"}, {"name": "system_wakeup"}, {"name": "system_powerdown"}, {"name": "system_reset"}, {"name": "query-vm-generation-id"}, {"name": "query-uuid"}, {"name": "query-target"}, {"name": "query-current-machine"}, {"name": "query-machines"}, {"name": "query-cpus-fast"}, {"name": "device-sync-config"}, {"name": "device_del"}, {"name": "device-list-properties"}, {"name": "object-del"}, {"name": "object-add"}, {"name": "qom-list-properties"}, {"name": "qom-list-types"}, {"name": "qom-set"}, {"name": "qom-get"}, {"name": "qom-list"}, {"name": "query-qmp-schema"}, {"name": "quit"}, {"name": "query-commands"}, {"name": "query-version"}, {"name": "qmp_capabilities"}, {"name": "trace-event-set-state"}, {"name": "trace-event-get-state"}, {"name": "transaction"}, {"name": "snapshot-delete"}, {"name": "snapshot-load"}, {"name": "snapshot-save"}, {"name": "query-migrationthreads"}, {"name": "query-vcpu-dirty-limit"}, {"name": "cancel-vcpu-dirty-limit"}, {"name": "set-vcpu-dirty-limit"}, {"name": "query-dirty-rate"}, {"name": "calc-dirty-rate"}, {"name": "migrate-pause"}, {"name": "migrate-recover"}, {"name": "query-colo-status"}, {"name": "xen-colo-do-checkpoint"}, {"name": "query-xen-replication-status"}, {"name": "xen-set-replication"}, {"name": "xen-load-devices-state"}, {"name": "xen-set-global-dirty-log"}, {"name": "xen-save-devices-state"}, {"name": "migrate-incoming"}, {"name": "migrate"}, {"name": "migrate-continue"}, {"name": "migrate_cancel"}, {"name": "x-colo-lost-heartbeat"}, {"name": "migrate-start-postcopy"}, {"name": "query-migrate-parameters"}, {"name": "migrate-set-parameters"}, {"name": "query-migrate-capabilities"}, {"name": "migrate-set-capabilities"}, {"name": "query-migrate"}, {"name": "client_migrate_info"}, {"name": "display-update"}, {"name": 
"display-reload"}, {"name": "query-display-options"}, {"name": "input-send-event"}, {"name": "send-key"}, {"name": "query-mice"}, {"name": "change-vnc-password"}, {"name": "query-vnc-servers"}, {"name": "query-vnc"}, {"name": "query-spice"}, {"name": "screendump"}, {"name": "expire_password"}, {"name": "set_password"}, {"name": "query-tpm"}, {"name": "query-tpm-types"}, {"name": "query-tpm-models"}, {"name": "query-rocker-of-dpa-groups"}, {"name": "query-rocker-of-dpa-flows"}, {"name": "query-rocker-ports"}, {"name": "query-rocker"}, {"name": "request-ebpf"}, {"name": "announce-self"}, {"name": "query-rx-filter"}, {"name": "netdev_del"}, {"name": "netdev_add"}, {"name": "set_link"}, {"name": "query-dump-guest-memory-capability"}, {"name": "query-dump"}, {"name": "dump-guest-memory"}, {"name": "chardev-send-break"}, {"name": "chardev-remove"}, {"name": "chardev-change"}, {"name": "chardev-add"}, {"name": "ringbuf-read"}, {"name": "ringbuf-write"}, {"name": "query-chardev-backends"}, {"name": "query-chardev"}, {"name": "query-block-exports"}, {"name": "block-export-del"}, {"name": "block-export-add"}, {"name": "nbd-server-stop"}, {"name": "nbd-server-remove"}, {"name": "nbd-server-add"}, {"name": "nbd-server-start"}, {"name": "blockdev-snapshot-delete-internal-sync"}, {"name": "blockdev-snapshot-internal-sync"}, {"name": "x-blockdev-set-iothread"}, {"name": "x-blockdev-change"}, {"name": "block-set-write-threshold"}, {"name": "x-blockdev-amend"}, {"name": "blockdev-create"}, {"name": "blockdev-del"}, {"name": "blockdev-reopen"}, {"name": "blockdev-add"}, {"name": "block-job-change"}, {"name": "block-job-finalize"}, {"name": "block-job-dismiss"}, {"name": "block-job-complete"}, {"name": "block-job-resume"}, {"name": "block-job-pause"}, {"name": "block-job-cancel"}, {"name": "block-job-set-speed"}, {"name": "block-stream"}, {"name": "blockdev-mirror"}, {"name": "x-debug-block-dirty-bitmap-sha256"}, {"name": "block-dirty-bitmap-merge"}, {"name": 
"block-dirty-bitmap-disable"}, {"name": "block-dirty-bitmap-enable"}, {"name": "block-dirty-bitmap-clear"}, {"name": "block-dirty-bitmap-remove"}, {"name": "block-dirty-bitmap-add"}, {"name": "drive-mirror"}, {"name": "x-debug-query-block-graph"}, {"name": "query-named-block-nodes"}, {"name": "blockdev-backup"}, {"name": "drive-backup"}, {"name": "block-commit"}, {"name": "change-backing-file"}, {"name": "blockdev-snapshot"}, {"name": "blockdev-snapshot-sync"}, {"name": "block_resize"}, {"name": "query-block-jobs"}, {"name": "query-blockstats"}, {"name": "query-block"}, {"name": "block-latency-histogram-set"}, {"name": "block_set_io_throttle"}, {"name": "blockdev-change-medium"}, {"name": "blockdev-insert-medium"}, {"name": "blockdev-remove-medium"}, {"name": "blockdev-close-tray"}, {"name": "blockdev-open-tray"}, {"name": "eject"}, {"name": "query-pr-managers"}, {"name": "query-jobs"}, {"name": "job-finalize"}, {"name": "job-dismiss"}, {"name": "job-complete"}, {"name": "job-cancel"}, {"name": "job-resume"}, {"name": "job-pause"}, {"name": "set-action"}, {"name": "watchdog-set-action"}, {"name": "query-status"}]}
Grepping the list, the only power-related command I see is "system_powerdown", so I ran that:
{ "execute": "system_powerdown" }
{"timestamp": {"seconds": 1738601890, "microseconds": 906745}, "event": "POWERDOWN"}
{"return": {}}
Nothing seems to be happening...
[0] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-1673
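The "grep the list" step above can be sketched in Python as follows. This is only an illustration: `SAMPLE_REPLY` is a shortened excerpt of the query-commands reply shown in this comment, and the helper names are hypothetical.

```python
import json

# Shortened excerpt of the "query-commands" reply quoted in the comment above;
# the real reply lists far more entries.
SAMPLE_REPLY = (
    '{"return": [{"name": "system_wakeup"}, {"name": "system_powerdown"}, '
    '{"name": "system_reset"}, {"name": "quit"}, {"name": "query-status"}]}'
)


def command_names(reply_text: str) -> list[str]:
    """Extract the command names from a query-commands reply."""
    return [entry["name"] for entry in json.loads(reply_text)["return"]]


def matching_commands(reply_text: str, needle: str) -> list[str]:
    """Return the command names that contain the given substring."""
    return [name for name in command_names(reply_text) if needle in name]


if __name__ == "__main__":
    print(matching_commands(SAMPLE_REPLY, "power"))
```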
Updated by okurz about 2 months ago
- Status changed from Workable to In Progress
Updated by ybonatakis about 2 months ago
No progress so far. I didn't spend much time on it today. I tried to run the same flow as yesterday with a Tumbleweed ISO. I didn't see power do what I was expecting. Not sure if I did something wrong. I wonder if I should copy the qemu command from the CI.
During those experiments I tried to send input-send-event in various ways. I still haven't made it work. Maybe I sent it with missing parameters or something.
So the question I have now is whether "remove the implementation" means deleting the power sub!
Updated by okurz about 2 months ago
ybonatakis wrote in #note-11:
[…]
So the question I have now is whether "remove the implementation" means deleting the power sub!
No, just the method. But I am not yet convinced that we can't get it to work.
Updated by openqa_review about 2 months ago
- Due date set to 2025-02-19
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz about 2 months ago
As discussed in the unblock, use power('off') in t/data/tests/tests/shutdown.pm instead of sudo poweroff and run t/99-full-stack.t to reproduce the log content shown in the ticket description. Then follow the suggestion to handle the non-existent pipe gracefully.
Updated by livdywan about 2 months ago · Edited
Updated by ybonatakis about 2 months ago · Edited
okurz wrote in #note-14:
As discussed in the unblock, use power('off') in t/data/tests/tests/shutdown.pm instead of sudo poweroff and run t/99-full-stack.t to reproduce the log content shown in the ticket description. Then follow the suggestion to handle the non-existent pipe gracefully.
I need to "study" the pipe's handling.
In the meantime I created a PR to reproduce the error, as it was not visible locally, and it seems that it came up -> https://github.com/os-autoinst/os-autoinst/pull/2646
However, back on my workstation I can trigger a restart with the following QMP command:
{ "execute": "input-send-event",
"arguments": { "events": [
{ "type": "key", "data" : { "down": true,
"key": {"type": "qcode", "data": "ctrl" } } },
{ "type": "key", "data" : { "down": true,
"key": {"type": "qcode", "data": "alt" } } },
{ "type": "key", "data" : { "down": true,
"key": {"type": "qcode", "data": "delete" } } } ] } }
{"return": {}}
This is given exactly as found in the docs, but I think you guys tried this before. Would this work in place of off?
After the system shuts down, the boot console reappears and the system boots again, though.
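The input-send-event message above only sends key-down events. A small Python sketch of building the same message with matching key-up events appended is shown below. The helper names are hypothetical, and whether releasing the keys changes what the guest does was not verified in this ticket. This only builds the JSON, it does not talk to qemu.

```python
import json


def key_event(qcode: str, down: bool) -> dict:
    """Build one QMP key event for the given qcode."""
    return {"type": "key", "data": {"down": down, "key": {"type": "qcode", "data": qcode}}}


def key_combo_command(qcodes: list[str]) -> dict:
    """Press all keys in order, then release them in reverse order."""
    presses = [key_event(q, True) for q in qcodes]
    releases = [key_event(q, False) for q in reversed(qcodes)]
    return {"execute": "input-send-event", "arguments": {"events": presses + releases}}


if __name__ == "__main__":
    # Prints a message that could be pasted into the nc session used above.
    print(json.dumps(key_combo_command(["ctrl", "alt", "delete"])))
```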
Updated by ybonatakis about 2 months ago
I also tried this, which doesn't work:
{
"execute": "input-send-event",
"arguments": {
"events": [
{
"type": "key",
"data": {
"key": {"type": "qcode", "data": "power" },
"down": true
}
},
{
"type": "key",
"data": {
"key": {"type": "qcode", "data": "power" },
"down": false } } ]}}
{"return": {}}
Updated by ybonatakis about 2 months ago
ybonatakis wrote in #note-16:
okurz wrote in #note-14:
As discussed in the unblock, use power('off') in t/data/tests/tests/shutdown.pm instead of sudo poweroff and run t/99-full-stack.t to reproduce the log content shown in the ticket description. Then follow the suggestion to handle the non-existent pipe gracefully.
I need to "study" the pipe's handling.
In the meantime I created a PR to reproduce the error, as it was not visible locally, and it seems that it came up -> https://github.com/os-autoinst/os-autoinst/pull/2646
However, back on my workstation I can trigger a restart with the following QMP command:
{ "execute": "input-send-event", "arguments": { "events": [ { "type": "key", "data" : { "down": true, "key": {"type": "qcode", "data": "ctrl" } } }, { "type": "key", "data" : { "down": true, "key": {"type": "qcode", "data": "alt" } } }, { "type": "key", "data" : { "down": true, "key": {"type": "qcode", "data": "delete" } } } ] } } {"return": {}}
I applied this in power('off') and it ends with https://github.com/os-autoinst/os-autoinst/actions/runs/13164189124/job/36740193329?pr=2646
This is given exactly as found in the docs, but I think you guys tried this before. Would this work in place of off?
After the system shuts down, the boot console reappears and the system boots again, though.
Updated by okurz about 2 months ago
My approach: https://github.com/os-autoinst/os-autoinst/pull/2649
Updated by ybonatakis about 2 months ago
- Assignee changed from ybonatakis to okurz
okurz wrote in #note-19:
My approach: https://github.com/os-autoinst/os-autoinst/pull/2649
Thanks for taking care of it. I can't really review as I don't fully understand the code. I think it was something I couldn't figure out by myself. I assigned the ticket to you.
Updated by okurz about 2 months ago
- Blocked by action #176475: Use Feature::Compat::Try in our code - os-autoinst size:S added
Updated by okurz about 2 months ago · Edited
- Status changed from In Progress to Blocked
Based on https://github.com/os-autoinst/os-autoinst/pull/2649#pullrequestreview-2601866232 I am preparing some changes to bring in Feature::Compat::Try and get rid of some older suboptimal code. I plan to continue here afterwards, so that we have a good style for catching exceptions going forward before I introduce more of them as necessary for this PR.
Next PRs:
- https://github.com/os-autoinst/os-autoinst/pull/2654 (merged)
- https://github.com/os-autoinst/os-autoinst/pull/2653 (merged)
- https://github.com/os-autoinst/os-autoinst/pull/2652 (merged)
- https://github.com/os-autoinst/os-autoinst/pull/2651 (merged)
- https://github.com/os-autoinst/os-autoinst/pull/2650 (merged)
most are handled in #176475
Updated by livdywan about 1 month ago
- Priority changed from Low to Normal
We can't lower priority so long as this blocks another ticket with higher priority. See #175060#note-38
Updated by okurz 30 days ago
- Assignee deleted (okurz)
All prerequisites about better exception handling as noted in #176319-22 are done.
We could progress with https://github.com/os-autoinst/os-autoinst/pull/2649 but the main point that we would like to solve to prevent some caveats is https://github.com/os-autoinst/os-autoinst/pull/2649#issuecomment-2671814488
The case when autotest takes long enough to terminate to still receive SIGTERM from isotovideo is not handled correctly. So far this change would lead to an incomplete test in this case (unless @okurz has already fixed this since our last session). Maybe autotest should simply avoid sending SIGTERM if it passed a power off command. Otherwise autotest could consider the SIGTERM "expected" in case it previously sent a power off command.
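The second idea in the quoted comment, treating SIGTERM as "expected" after a power off command was sent, could look roughly like this. It is a Python sketch for illustration only. os-autoinst is Perl, and the class and method names here are hypothetical, not taken from the PR.

```python
import signal


class TerminationTracker:
    """Remember whether a power off was sent and classify a later SIGTERM."""

    def __init__(self) -> None:
        self.power_off_sent = False
        self.outcome = None

    def note_power_off(self) -> None:
        # Call this right after the power off command has been passed on.
        self.power_off_sent = True

    def on_sigterm(self, signum, frame) -> None:
        # A SIGTERM after a power off is part of the normal teardown.
        self.outcome = "expected" if self.power_off_sent else "unexpected"

    def install(self) -> None:
        # Register the handler for the current process.
        signal.signal(signal.SIGTERM, self.on_sigterm)
```

A caller would then report an incomplete test only when the outcome is "unexpected".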
Updated by openqa_review 23 days ago
- Due date set to 2025-03-25
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 20 days ago
- Due date deleted (2025-03-25)
- Status changed from In Progress to Resolved
All included with https://github.com/os-autoinst/os-autoinst/pull/2676, which was merged, so we are good. Good job! This will be used automatically within openQA soon, I guess. Then we can verify that the openQA test works stably.
Updated by gpuliti 15 days ago
- Status changed from Resolved to Workable
While running a stability test in https://github.com/os-autoinst/openQA/pull/6293 before applying the rollback, I discovered that ~45% of the fullstack tests result in failures (with 54 samples). You can see the result in the circleci pipeline here.
Updated by openqa_review 9 days ago
- Due date set to 2025-04-08
Setting due date based on mean cycle time of SUSE QE Tools
Updated by dheidler 8 days ago · Edited
- Status changed from In Progress to Feedback
Let's get better debug output on failure, as I can't reproduce the issue locally: https://github.com/os-autoinst/os-autoinst/pull/2686
Updated by okurz 8 days ago
- Blocks action #166445: [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry added
Updated by okurz 1 day ago
The CI job already failed today in the morning in https://app.circleci.com/pipelines/github/os-autoinst/openQA/16470/workflows/e3abc18d-d02a-4123-b96b-14ff526d86e8/jobs/159319 showing that the latest version of os-autoinst wasn't deployed. I wonder why you haven't seen that.
Updated by livdywan about 18 hours ago
- Status changed from Feedback to In Progress
Apparently something about syswrite, and this should be In Progress ;-)
Updated by livdywan about 18 hours ago
Apparently testing this is not easy. In the unblock we came up with a suggestion:
https://github.com/os-autoinst/os-autoinst/blob/master/.github/workflows/openqa_fullstack.yml#L33
Updated by livdywan about 17 hours ago
Updated by dheidler about 14 hours ago
- Status changed from In Progress to Feedback