action #176319

open

coordination #176337: [saga][epic] Stable os-autoinst backends with stable command execution (no mistyping)

coordination #176340: [epic] Stable qemu backend with no unexpected mistyping

testapi power function call "off" needs to be handled gracefully by os-autoinst size:S

Added by okurz 2 months ago. Updated about 14 hours ago.

Status: Feedback
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date:
Due date: 2025-04-08 (Due in 5 days)
% Done: 0%
Estimated time:
Tags:

Description

Observation

See #175060. TinyCore, which we use for testing within os-autoinst, does not support handling the ACPI poweroff command, so we tried power('off'), which sends the QMP command "quit". That, however, tears down the complete qemu stack, and os-autoinst then fails like this:

[2025-01-29T12:25:20.107196+01:00] [debug] [pid:32470] <<< testapi::power(action="off")
[2025-01-29T12:25:20.108408+01:00] [debug] [pid:32485] EVENT {"data":{"guest":false,"reason":"host-qmp-quit"},"event":"SHUTDOWN","timestamp":{"microseconds":108236,"seconds":1738149920}}
[2025-01-29T12:25:20.142808+01:00] [debug] [pid:32470] tests/shutdown.pm:10 called testapi::assert_shutdown
[2025-01-29T12:25:20.143067+01:00] [debug] [pid:32470] <<< testapi::check_shutdown(timeout=90)
[2025-01-29T12:25:20.146514+01:00] [info] [pid:32485] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
  Can't syswrite(IO::Socket::UNIX=GLOB(0x55b38b999888), <BUFFER>): Broken pipe at backend/qemu.pm line 1130
…
[2025-01-29T12:25:22.249780+01:00] [warn] [pid:32460] !!! OpenQA::Isotovideo::Runner::_read_response: THERE IS NOTHING TO READ 18 4 3

Acceptance criteria

  • AC1: All supported testapi::power methods should be usable within os-autoinst test modules without causing isotovideo to crash when just checking for shutdown

Suggestions

  • Try to fix the usage of power('off'). If that is not possible because 'off' actually sends "quit" over QMP and then tears down the complete qemu stack, consider removing the implementation

Out of scope

Any other testapi method.


Related issues 3 (2 open, 1 closed)

Blocked by openQA Project (public) - action #176475: Use Feature::Compat::Try in our code - os-autoinst size:S (Resolved, okurz, 2025-02-03)

Blocks openQA Tests (public) - action #166445: [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry (Blocked, okurz, 2024-09-06)

Copied from openQA Project (public) - action #175060: [sporadic] [Workflow] Failed: os-autoinst/openQA on master / test (7dc9d82) size:M (Blocked, gpuliti)

Actions #1

Updated by okurz 2 months ago

  • Copied from action #175060: [sporadic] [Workflow] Failed: os-autoinst/openQA on master / test (7dc9d82) size:M added
Actions #2

Updated by okurz 2 months ago

  • Subject changed from testapi power function seems to not work as expected to testapi power function call "off" needs to be handled gracefully by os-autoinst
  • Description updated (diff)
  • Target version changed from Ready to Tools - Next
Actions #3

Updated by okurz 2 months ago

  • Target version changed from Tools - Next to future
Actions #4

Updated by okurz 2 months ago

  • Target version changed from future to Ready
  • Parent task set to #176340
Actions #5

Updated by tinita 2 months ago

  • Subject changed from testapi power function call "off" needs to be handled gracefully by os-autoinst to testapi power function call "off" needs to be handled gracefully by os-autoinst size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #6

Updated by okurz 2 months ago

  • Category changed from Regressions/Crashes to Feature requests
Actions #7

Updated by ybonatakis 2 months ago

  • Assignee set to ybonatakis
Actions #8

Updated by ybonatakis about 2 months ago

  • Description updated (diff)
Actions #9

Updated by ybonatakis about 2 months ago

I took a look today. I got lost, but here is where I ended up.
Run an instance:

qemu-system-x86_64 \                                                                                                                                                                                                                               
  -enable-kvm \
  -cdrom t/data/Core-7.2.iso \       
  -m 1024 \
  -cpu host \
  -smp 2 \
  -serial stdio \
  -qmp tcp:0:4444,server,nowait

In another console I connect to QMP and run the following (output included):

nc localhost 4444
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 9}, "package": "openSUSE Tumbleweed"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute": "query-status"}
{"return": {"status": "running", "running": true}}
{ "execute": "send-key",
     "arguments": { "keys": [ { "type": "qcode", "data": "ctrl" },
                              { "type": "qcode", "data": "alt" },
                              { "type": "qcode", "data": "delete" } ] } }
{"return": {}}
{"timestamp": {"seconds": 1738601139, "microseconds": 880024}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1738601139, "microseconds": 885425}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
#VM gets boot console
{"execute": "query-status"}
{"return": {"status": "running", "running": true}}
# and after a while boots again

Then I found I can get the available commands[0].

{ "execute": "query-commands" }
{"return": [{"name": "device_add"}, {"name": "cxl-release-dynamic-capacity"}, {"name": "cxl-add-dynamic-capacity"}, {"name": "cxl-inject-correctable-error"}, {"name": "cxl-inject-uncorrectable-errors"}, {"name": "cxl-inject-poison"}, {"name": "cxl-inject-memory-module-event"}, {"name": "cxl-inject-dram-event"}, {"name": "cxl-inject-general-media-event"}, {"name": "query-cryptodev"}, {"name": "x-query-virtio-queue-element"}, {"name": "x-query-virtio-vhost-queue-status"}, {"name": "x-query-virtio-queue-status"}, {"name": "x-query-virtio-status"}, {"name": "x-query-virtio"}, {"name": "query-stats-schemas"}, {"name": "query-stats"}, {"name": "query-pci"}, {"name": "query-acpi-ospm-status"}, {"name": "query-audiodevs"}, {"name": "xen-event-inject"}, {"name": "xen-event-list"}, {"name": "query-sgx-capabilities"}, {"name": "query-sgx"}, {"name": "query-sev-attestation-report"}, {"name": "sev-inject-launch-secret"}, {"name": "query-sev-capabilities"}, {"name": "query-sev-launch-measure"}, {"name": "query-sev"}, {"name": "rtc-reset-reinjection"}, {"name": "query-command-line-options"}, {"name": "query-fdsets"}, {"name": "remove-fd"}, {"name": "add-fd"}, {"name": "closefd"}, {"name": "getfd"}, {"name": "human-monitor-command"}, {"name": "x-exit-preconfig"}, {"name": "cont"}, {"name": "stop"}, {"name": "query-iothreads"}, {"name": "query-name"}, {"name": "add_client"}, {"name": "query-yank"}, {"name": "yank"}, {"name": "replay-seek"}, {"name": "replay-delete-break"}, {"name": "replay-break"}, {"name": "query-replay"}, {"name": "query-cpu-definitions"}, {"name": "query-cpu-model-expansion"}, {"name": "x-query-interrupt-controllers"}, {"name": "dumpdtb"}, {"name": "x-query-usb"}, {"name": "x-query-roms"}, {"name": "x-query-ramblock"}, {"name": "x-query-opcount"}, {"name": "x-query-numa"}, {"name": "x-query-jit"}, {"name": "x-query-irq"}, {"name": "query-memory-devices"}, {"name": "query-memory-size-summary"}, {"name": "query-hv-balloon-status-report"}, {"name": 
"query-balloon"}, {"name": "balloon"}, {"name": "set-numa-node"}, {"name": "query-hotpluggable-cpus"}, {"name": "query-memdev"}, {"name": "pmemsave"}, {"name": "memsave"}, {"name": "query-kvm"}, {"name": "inject-nmi"}, {"name": "system_wakeup"}, {"name": "system_powerdown"}, {"name": "system_reset"}, {"name": "query-vm-generation-id"}, {"name": "query-uuid"}, {"name": "query-target"}, {"name": "query-current-machine"}, {"name": "query-machines"}, {"name": "query-cpus-fast"}, {"name": "device-sync-config"}, {"name": "device_del"}, {"name": "device-list-properties"}, {"name": "object-del"}, {"name": "object-add"}, {"name": "qom-list-properties"}, {"name": "qom-list-types"}, {"name": "qom-set"}, {"name": "qom-get"}, {"name": "qom-list"}, {"name": "query-qmp-schema"}, {"name": "quit"}, {"name": "query-commands"}, {"name": "query-version"}, {"name": "qmp_capabilities"}, {"name": "trace-event-set-state"}, {"name": "trace-event-get-state"}, {"name": "transaction"}, {"name": "snapshot-delete"}, {"name": "snapshot-load"}, {"name": "snapshot-save"}, {"name": "query-migrationthreads"}, {"name": "query-vcpu-dirty-limit"}, {"name": "cancel-vcpu-dirty-limit"}, {"name": "set-vcpu-dirty-limit"}, {"name": "query-dirty-rate"}, {"name": "calc-dirty-rate"}, {"name": "migrate-pause"}, {"name": "migrate-recover"}, {"name": "query-colo-status"}, {"name": "xen-colo-do-checkpoint"}, {"name": "query-xen-replication-status"}, {"name": "xen-set-replication"}, {"name": "xen-load-devices-state"}, {"name": "xen-set-global-dirty-log"}, {"name": "xen-save-devices-state"}, {"name": "migrate-incoming"}, {"name": "migrate"}, {"name": "migrate-continue"}, {"name": "migrate_cancel"}, {"name": "x-colo-lost-heartbeat"}, {"name": "migrate-start-postcopy"}, {"name": "query-migrate-parameters"}, {"name": "migrate-set-parameters"}, {"name": "query-migrate-capabilities"}, {"name": "migrate-set-capabilities"}, {"name": "query-migrate"}, {"name": "client_migrate_info"}, {"name": "display-update"}, {"name": 
"display-reload"}, {"name": "query-display-options"}, {"name": "input-send-event"}, {"name": "send-key"}, {"name": "query-mice"}, {"name": "change-vnc-password"}, {"name": "query-vnc-servers"}, {"name": "query-vnc"}, {"name": "query-spice"}, {"name": "screendump"}, {"name": "expire_password"}, {"name": "set_password"}, {"name": "query-tpm"}, {"name": "query-tpm-types"}, {"name": "query-tpm-models"}, {"name": "query-rocker-of-dpa-groups"}, {"name": "query-rocker-of-dpa-flows"}, {"name": "query-rocker-ports"}, {"name": "query-rocker"}, {"name": "request-ebpf"}, {"name": "announce-self"}, {"name": "query-rx-filter"}, {"name": "netdev_del"}, {"name": "netdev_add"}, {"name": "set_link"}, {"name": "query-dump-guest-memory-capability"}, {"name": "query-dump"}, {"name": "dump-guest-memory"}, {"name": "chardev-send-break"}, {"name": "chardev-remove"}, {"name": "chardev-change"}, {"name": "chardev-add"}, {"name": "ringbuf-read"}, {"name": "ringbuf-write"}, {"name": "query-chardev-backends"}, {"name": "query-chardev"}, {"name": "query-block-exports"}, {"name": "block-export-del"}, {"name": "block-export-add"}, {"name": "nbd-server-stop"}, {"name": "nbd-server-remove"}, {"name": "nbd-server-add"}, {"name": "nbd-server-start"}, {"name": "blockdev-snapshot-delete-internal-sync"}, {"name": "blockdev-snapshot-internal-sync"}, {"name": "x-blockdev-set-iothread"}, {"name": "x-blockdev-change"}, {"name": "block-set-write-threshold"}, {"name": "x-blockdev-amend"}, {"name": "blockdev-create"}, {"name": "blockdev-del"}, {"name": "blockdev-reopen"}, {"name": "blockdev-add"}, {"name": "block-job-change"}, {"name": "block-job-finalize"}, {"name": "block-job-dismiss"}, {"name": "block-job-complete"}, {"name": "block-job-resume"}, {"name": "block-job-pause"}, {"name": "block-job-cancel"}, {"name": "block-job-set-speed"}, {"name": "block-stream"}, {"name": "blockdev-mirror"}, {"name": "x-debug-block-dirty-bitmap-sha256"}, {"name": "block-dirty-bitmap-merge"}, {"name": 
"block-dirty-bitmap-disable"}, {"name": "block-dirty-bitmap-enable"}, {"name": "block-dirty-bitmap-clear"}, {"name": "block-dirty-bitmap-remove"}, {"name": "block-dirty-bitmap-add"}, {"name": "drive-mirror"}, {"name": "x-debug-query-block-graph"}, {"name": "query-named-block-nodes"}, {"name": "blockdev-backup"}, {"name": "drive-backup"}, {"name": "block-commit"}, {"name": "change-backing-file"}, {"name": "blockdev-snapshot"}, {"name": "blockdev-snapshot-sync"}, {"name": "block_resize"}, {"name": "query-block-jobs"}, {"name": "query-blockstats"}, {"name": "query-block"}, {"name": "block-latency-histogram-set"}, {"name": "block_set_io_throttle"}, {"name": "blockdev-change-medium"}, {"name": "blockdev-insert-medium"}, {"name": "blockdev-remove-medium"}, {"name": "blockdev-close-tray"}, {"name": "blockdev-open-tray"}, {"name": "eject"}, {"name": "query-pr-managers"}, {"name": "query-jobs"}, {"name": "job-finalize"}, {"name": "job-dismiss"}, {"name": "job-complete"}, {"name": "job-cancel"}, {"name": "job-resume"}, {"name": "job-pause"}, {"name": "set-action"}, {"name": "watchdog-set-action"}, {"name": "query-status"}]}

When I grep the list I see only "system_powerdown", so I ran that:

{ "execute": "system_powerdown" }
{"timestamp": {"seconds": 1738601890, "microseconds": 906745}, "event": "POWERDOWN"}
{"return": {}}

Nothing seems to be happening...

[0] https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-1673
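
The transcript above shows a POWERDOWN event but no SHUTDOWN event, which matches the impression that the guest did not react. A minimal Python sketch of that check, assuming the QMP output is available as one JSON message per line; the helper names `qmp_events` and `guest_powered_off` are made up for illustration and are not part of os-autoinst or QEMU:

```python
import json


def qmp_events(lines):
    """Return the names of asynchronous QMP events found in `lines`.

    Blank lines and '#' annotations (as used in the transcripts in this
    ticket) are skipped; command replies such as {"return": {}} are ignored.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        message = json.loads(line)
        if "event" in message:
            events.append(message["event"])
    return events


def guest_powered_off(lines):
    """True only if a SHUTDOWN event was seen, not merely a POWERDOWN request."""
    return "SHUTDOWN" in qmp_events(lines)


# Output of system_powerdown as pasted in this comment.
transcript = [
    '{"timestamp": {"seconds": 1738601890, "microseconds": 906745}, "event": "POWERDOWN"}',
    '{"return": {}}',
]
seen = qmp_events(transcript)
powered_off = guest_powered_off(transcript)
print(seen, powered_off)
```

POWERDOWN only reports that the request was delivered; the SHUTDOWN event, as seen in the log in the ticket description, is what indicates that the machine actually went down.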

Actions #10

Updated by okurz about 2 months ago

  • Status changed from Workable to In Progress
Actions #11

Updated by ybonatakis about 2 months ago

No progress so far. I didn't spend much time on it today. I tried to run the same flow as yesterday with a Tumbleweed ISO. I didn't see power do what I expected. Not sure if I did something wrong. I wonder if I should copy the qemu command from the CI.

During those experiments I tried to send input-send-event in various ways. I still haven't made it work. I am probably sending it with missing parameters or something.

So the question I have now is whether "consider to remove the implementation" means deleting the power sub!!

Actions #12

Updated by okurz about 2 months ago

ybonatakis wrote in #note-11:

[…]
So the question I have now is whether "consider to remove the implementation" means deleting the power sub!!

no, just the method. But I am not yet convinced that we can't get it to work

Actions #13

Updated by openqa_review about 2 months ago

  • Due date set to 2025-02-19

Setting due date based on mean cycle time of SUSE QE Tools

Actions #14

Updated by okurz about 2 months ago

As discussed in the unblock, use power('off') in t/data/tests/tests/shutdown.pm instead of sudo poweroff and run t/99-full-stack.t to reproduce the log content shown in the ticket description. Then follow the suggestion to handle the non-existent pipe gracefully.
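
To illustrate what "gracefully" could mean here, the following is a small Python sketch, not the Perl code in backend/qemu.pm. It assumes the caller knows whether a quit was requested; `send_command` and `quit_requested` are hypothetical names. A write to a socket whose peer is gone is treated as expected after a requested quit and is still raised in every other case:

```python
import socket


def send_command(sock, payload, quit_requested):
    """Send `payload`; tolerate a vanished peer only after a requested quit.

    Returns True if the data was written and False if the peer was already
    gone and that was expected. Any other failure is re-raised.
    """
    try:
        sock.sendall(payload)
        return True
    except (BrokenPipeError, ConnectionResetError):
        if quit_requested:
            return False
        raise


# Simulate qemu having gone away: the other end of a UNIX socket pair is closed.
ours, peer = socket.socketpair()
peer.close()
result = send_command(ours, b'{"execute": "query-status"}\n', quit_requested=True)
print(result)
ours.close()
```

Keeping the exception for the case without a preceding quit means a real backend crash is still reported.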

Actions #16

Updated by ybonatakis about 2 months ago · Edited

okurz wrote in #note-14:

As discussed in the unblock, use power('off') in t/data/tests/tests/shutdown.pm instead of sudo poweroff and run t/99-full-stack.t to reproduce the log content shown in the ticket description. Then follow the suggestion to handle the non-existent pipe gracefully.

I need to "study" the pipe's handling.

In the meantime I created a PR to reproduce the error, as it was not visible locally, but it seems that it came up -> https://github.com/osh-autoinst/os-autoinst/pull/2646

However, back on my workstation I can trigger a restart with the following QMP command:

{ "execute": "input-send-event",
      "arguments": { "events": [
         { "type": "key", "data" : { "down": true,
           "key": {"type": "qcode", "data": "ctrl" } } },
         { "type": "key", "data" : { "down": true,
           "key": {"type": "qcode", "data": "alt" } } },
         { "type": "key", "data" : { "down": true,
           "key": {"type": "qcode", "data": "delete" } } } ] } }
{"return": {}}

This is given exactly as found in the docs, but I think you guys tried this before. Would this work in place of off?
After the system shuts down, the boot screen reappears and it boots again, though.

Actions #17

Updated by ybonatakis about 2 months ago

I also tried this, which doesn't work:

{
  "execute": "input-send-event",
  "arguments": {
    "events": [
      {
        "type": "key",
        "data": {
          "key": {"type": "qcode", "data": "power" },
          "down": true
        }
      },
      {
        "type": "key",
        "data": {
          "key": {"type": "qcode", "data": "power" },
          "down": false
        }
      }
    ]
  }
}
{"return": {}}
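
Writing these messages by hand is error-prone, so here is a minimal Python sketch that generates an input-send-event message with a press and a release for each key. The function name `build_key_combo` is made up for illustration. The sketch only builds the JSON; whether the guest reacts to the key is a separate question.

```python
import json


def build_key_combo(keys):
    """Build an input-send-event message for a key combination.

    The keys are pressed in the given order and released in reverse order,
    so every "down": true event has a matching "down": false event.
    """
    def key_event(qcode, down):
        return {
            "type": "key",
            "data": {"down": down, "key": {"type": "qcode", "data": qcode}},
        }

    presses = [key_event(key, True) for key in keys]
    releases = [key_event(key, False) for key in reversed(keys)]
    return {"execute": "input-send-event", "arguments": {"events": presses + releases}}


message = build_key_combo(["ctrl", "alt", "delete"])
print(json.dumps(message))
```

Calling `build_key_combo(["power"])` produces the same press and release pair as the message above.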
Actions #18

Updated by ybonatakis about 2 months ago

ybonatakis wrote in #note-16:

okurz wrote in #note-14:

As discussed in the unblock, use power('off') in t/data/tests/tests/shutdown.pm instead of sudo poweroff and run t/99-full-stack.t to reproduce the log content shown in the ticket description. Then follow the suggestion to handle the non-existent pipe gracefully.

I need to "study" the pipe's handling.

In the meantime I created a PR to reproduce the error, as it was not visible locally, but it seems that it came up -> https://github.com/osh-autoinst/os-autoinst/pull/2646

However, back on my workstation I can trigger a restart with the following QMP command:

{ "execute": "input-send-event",
      "arguments": { "events": [
         { "type": "key", "data" : { "down": true,
           "key": {"type": "qcode", "data": "ctrl" } } },
         { "type": "key", "data" : { "down": true,
           "key": {"type": "qcode", "data": "alt" } } },
         { "type": "key", "data" : { "down": true,
           "key": {"type": "qcode", "data": "delete" } } } ] } }
{"return": {}}

I applied this in power('off') and it ends with https://github.com/os-autoinst/os-autoinst/actions/runs/13164189124/job/36740193329?pr=2646

This is given exactly as found in the docs, but I think you guys tried this before. Would this work in place of off?
After the system shuts down, the boot screen reappears and it boots again, though.

Actions #20

Updated by ybonatakis about 2 months ago

  • Assignee changed from ybonatakis to okurz

okurz wrote in #note-19:

My approach: https://github.com/os-autoinst/os-autoinst/pull/2649

Thanks for taking care of it. I can't really review it as I don't fully understand the code. I think it was something I couldn't figure out by myself. I assigned the ticket to you.

Actions #21

Updated by okurz about 2 months ago

  • Blocked by action #176475: Use Feature::Compat::Try in our code - os-autoinst size:S added
Actions #22

Updated by okurz about 2 months ago · Edited

  • Status changed from In Progress to Blocked

Based on https://github.com/os-autoinst/os-autoinst/pull/2649#pullrequestreview-2601866232 I am preparing some changes to bring in Feature::Compat::Try and get rid of some older suboptimal code. I plan to continue here afterwards, so that we have a good style for catching exceptions going forward before I introduce more of them as necessary for this PR.

Next PRs:

most are handled in #176475

Actions #23

Updated by okurz about 2 months ago

  • Due date deleted (2025-02-19)
Actions #24

Updated by okurz about 2 months ago

  • Status changed from Blocked to Workable
Actions #25

Updated by okurz about 1 month ago

  • Priority changed from Normal to Low
Actions #26

Updated by livdywan about 1 month ago

  • Priority changed from Low to Normal

We can't lower priority so long as this blocks another ticket with higher priority. See #175060#note-38

Actions #27

Updated by okurz 30 days ago

  • Assignee deleted (okurz)

All prerequisites for better exception handling as noted in #176319#note-22 are done

We could progress with https://github.com/os-autoinst/os-autoinst/pull/2649 but the main point that we would like to solve to prevent some caveats is https://github.com/os-autoinst/os-autoinst/pull/2649#issuecomment-2671814488

The case when autotest takes long enough to terminate to still receive SIGTERM from isotovideo is not handled correctly. So far this change would lead to an incomplete test in this case (unless @okurz has already fixed this since our last session). Maybe autotest should simply avoid sending SIGTERM if it passed a power off command. Otherwise autotest could consider the SIGTERM "expected" in case it previously sent a power off command.
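
The second option in that comment, treating SIGTERM as expected after a power off command, can be sketched roughly as below. This is a Python illustration of the idea only. The class and method names are hypothetical and do not reflect how autotest or isotovideo are implemented.

```python
import signal


class TerminationPolicy:
    """Remember whether a power off was requested and classify SIGTERM accordingly."""

    def __init__(self):
        self.power_off_sent = False

    def note_power_action(self, action):
        # Only 'off' tears down qemu; other power actions leave the flag untouched.
        if action == "off":
            self.power_off_sent = True

    def is_expected(self, signum):
        """A SIGTERM counts as expected only after a power off was sent."""
        return signum == signal.SIGTERM and self.power_off_sent


policy = TerminationPolicy()
before = policy.is_expected(signal.SIGTERM)
policy.note_power_action("off")
after = policy.is_expected(signal.SIGTERM)
print(before, after)
```

A real signal handler could consult such a flag to decide whether to exit cleanly or to report the termination as an error, which would avoid marking the test as incomplete in the expected case.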

Actions #28

Updated by dheidler 29 days ago

  • Assignee set to dheidler
Actions #29

Updated by dheidler 24 days ago

  • Status changed from Workable to In Progress
Actions #30

Updated by openqa_review 23 days ago

  • Due date set to 2025-03-25

Setting due date based on mean cycle time of SUSE QE Tools

Actions #31

Updated by okurz 20 days ago

  • Due date deleted (2025-03-25)
  • Status changed from In Progress to Resolved

All included with https://github.com/os-autoinst/os-autoinst/pull/2676 which was merged, so we are good. Good job! This will be used automatically within openQA soon, I guess. Then we can verify that the openQA test is stable

Actions #32

Updated by gpuliti 15 days ago

  • Status changed from Resolved to Workable

While running the stability test in https://github.com/os-autoinst/openQA/pull/6293 before applying the rollback, I discovered that ~45% of the fullstack test runs result in failures (with 54 samples). You can see the result in the circleci pipeline here.

Actions #33

Updated by dheidler 10 days ago

  • Status changed from Workable to In Progress
Actions #34

Updated by openqa_review 9 days ago

  • Due date set to 2025-04-08

Setting due date based on mean cycle time of SUSE QE Tools

Actions #35

Updated by dheidler 8 days ago · Edited

  • Status changed from In Progress to Feedback

Let's get better debug on fail as I can't reproduce the issue locally: https://github.com/os-autoinst/os-autoinst/pull/2686

Actions #36

Updated by okurz 8 days ago

  • Blocks action #166445: [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry added
Actions #37

Updated by livdywan 3 days ago

  • Status changed from Feedback to Workable

dheidler wrote in #note-35:

Let's get better debug on fail as I can't reproduce the issue locally: https://github.com/os-autoinst/os-autoinst/pull/2686

Merged.

Actions #38

Updated by dheidler 2 days ago

Now let's see if @gpuliti can provide a new CI failure, which should have better debug output now.

Actions #40

Updated by gpuliti 1 day ago

  • Status changed from Workable to In Progress
Actions #41

Updated by gpuliti 1 day ago

  • Status changed from In Progress to Feedback
Actions #42

Updated by okurz 1 day ago

The CI job already failed today in the morning in https://app.circleci.com/pipelines/github/os-autoinst/openQA/16470/workflows/e3abc18d-d02a-4123-b96b-14ff526d86e8/jobs/159319 showing that the latest version of os-autoinst wasn't deployed. I wonder why you haven't seen that.

Actions #43

Updated by livdywan about 18 hours ago

  • Status changed from Feedback to In Progress

Apparently something about syswrite, and this should be In Progress ;-)

Actions #44

Updated by livdywan about 18 hours ago

Apparently testing this is not easy. In the unblock we came up with a suggestion:

https://github.com/os-autoinst/os-autoinst/blob/master/.github/workflows/openqa_fullstack.yml#L33

Actions #46

Updated by dheidler about 14 hours ago

  • Status changed from In Progress to Feedback