Project

General

Profile

Actions

action #122773

closed

[qe-core][leap 15.4 update]test fails in cifs,failed to rollback to the snapshot

Added by rfan1 over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Target version:
Start date:
2023-01-06
Due date:
% Done:

100%

Estimated time:
Difficulty:

Description

Observation

openQA test in scenario opensuse-15.4-DVD-Updates-x86_64-extra_tests_filesystem@64bit fails in
cifs

Test suite description

Maintainer: QE Core

Filesystem related tests, for example snapper and btrfs features.

Reproducible

Fails since (at least) Build 20230106-1 (current job)

Expected result

Last good: 20230105-3 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 1 (1 open0 closed)

Related to openQA Project - action #109929: Snapshot rollback after SUT reboot breaks console switchingNew2022-04-13

Actions
Actions #1

Updated by rfan1 over 1 year ago

the milestone case is https://openqa.opensuse.org/tests/3016756/modules/consoletest_setup/steps/1/src

Seems it has already login to the system and leads the error?

Actions #2

Updated by ph03nix over 1 year ago

  • Status changed from New to In Progress
  • Assignee set to ph03nix
Actions #4

Updated by ph03nix over 1 year ago

  • % Done changed from 0 to 80

PR merged, let's keep this open for 1-2 days to check if this has resolved the issues on O3.

Actions #5

Updated by ph03nix over 1 year ago

  • % Done changed from 80 to 100
Actions #6

Updated by rfan1 over 1 year ago

Actions #7

Updated by ph03nix over 1 year ago

Thanks for the report, will look into this again.

Actions #8

Updated by ph03nix over 1 year ago

Hmmm, what could trigger this is that the consoles between the last snapshot and the current test run differs. The test fails at select_serial_terminal(): https://openqa.opensuse.org/tests/3030786#step/cifs/3

# Test died: Failed to wait for login prompt at opensuse/lib/serial_terminal.pm line 137.
    serial_terminal::login("root", "# ") called at opensuse/lib/susedistribution.pm line 838
    susedistribution::activate_console(Distribution::Opensuse::Leap::15=HASH(0x5620ad009690), "root-virtio-terminal") called at /usr/lib/os-autoinst/testapi.pm line 1703
    testapi::select_console("root-virtio-terminal") called at opensuse/lib/serial_terminal.pm line 357
    serial_terminal::select_serial_terminal() called at opensuse/tests/network/cifs.pm line 47

The snapshot was taken while the test was in the root-console: https://openqa.opensuse.org/tests/3030786#step/consoletest_setup/67

Looking into this.

Actions #9

Updated by ph03nix over 1 year ago

Gosh, it's really sporadic: https://openqa.opensuse.org/tests/3032189

Actions #10

Updated by ph03nix over 1 year ago

Resore procedure from the failing test run https://openqa.opensuse.org/tests/3030786/logfile?filename=autoinst-log.txt:

[2023-01-11T03:09:07.641606+01:00] [debug] [pid:18349] ||| finished zfs console (runtime: 177 s)
[2023-01-11T03:09:07.644625+01:00] [debug] [pid:18349] Loading a VM snapshot lastgood
[2023-01-11T03:09:07.645405+01:00] [debug] [pid:18411] EVENT {"data":{"offset":0},"event":"RTC_CHANGE","timestamp":{"microseconds":500148,"seconds":1673402415}}
[2023-01-11T03:09:07.645583+01:00] [debug] [pid:18411] EVENT {"data":{"guest":true,"reason":"guest-reset"},"event":"RESET","timestamp":{"microseconds":984310,"seconds":1673402821}}
[2023-01-11T03:09:07.645732+01:00] [debug] [pid:18411] EVENT {"data":{"guest":true,"reason":"guest-reset"},"event":"RESET","timestamp":{"microseconds":1679,"seconds":1673402822}}
[2023-01-11T03:09:07.645829+01:00] [debug] [pid:18411] EVENT {"data":{"offset":0},"event":"RTC_CHANGE","timestamp":{"microseconds":660517,"seconds":1673402859}}
[2023-01-11T03:09:07.645935+01:00] [debug] [pid:18411] Loading snapshot (Current VM state is running)
[2023-01-11T03:09:07.665855+01:00] [debug] [pid:18411] EVENT {"event":"STOP","timestamp":{"microseconds":665540,"seconds":1673402947}}
[2023-01-11T03:09:07.668188+01:00] [info] [pid:18411] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2023-01-11T03:09:08.762849+01:00] [debug] [pid:18411] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 18411 (/usr/bin/isotovideo: backen)
[2023-01-11T03:09:09.403896+01:00] [debug] [pid:18411] running `/usr/bin/qemu-img create -f qcow2 -F qcow2 -b /var/lib/openqa/pool/9/raid/hd0-overlay0 /var/lib/openqa/pool/9/raid/hd0-overlay1 32212254720`
[2023-01-11T03:09:09.489096+01:00] [debug] [pid:18411] Formatting '/var/lib/openqa/pool/9/raid/hd0-overlay1', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=32212254720 backing_file=/var/lib/openqa/pool/9/raid/hd0-overlay0 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
[2023-01-11T03:09:09.489330+01:00] [debug] [pid:18411] running `/usr/bin/qemu-img create -f qcow2 -F qcow2 -b /var/lib/openqa/pool/9/raid/hd1 /var/lib/openqa/pool/9/raid/hd1-overlay1 20G`
[2023-01-11T03:09:09.556016+01:00] [debug] [pid:18411] Formatting '/var/lib/openqa/pool/9/raid/hd1-overlay1', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=21474836480 backing_file=/var/lib/openqa/pool/9/raid/hd1 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
[2023-01-11T03:09:09.556181+01:00] [debug] [pid:18411] running `/usr/bin/qemu-img create -f qcow2 -F qcow2 -b /var/lib/openqa/pool/9/raid/cd0-overlay0 /var/lib/openqa/pool/9/raid/cd0-overlay1 4083154944`
[2023-01-11T03:09:09.620670+01:00] [debug] [pid:18411] Formatting '/var/lib/openqa/pool/9/raid/cd0-overlay1', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=4083154944 backing_file=/var/lib/openqa/pool/9/raid/cd0-overlay0 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
[2023-01-11T03:09:09.620790+01:00] [debug] [pid:18411] init_blockdev_images: Finished creating block devices
[2023-01-11T03:09:09.621939+01:00] [debug] [pid:18411] starting: /usr/bin/qemu-system-x86_64 -device VGA,edid=on,xres=1024,yres=768 -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -global isa-fdc.fdtypeA=none -m 1536 -cpu host -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -boot order=c -device qemu-xhci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :99,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 -chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay1-file,filename=/var/lib/openqa/pool/9/raid/hd0-overlay1,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay1,file=hd0-overlay1-file,cache.no-flush=on,discard=unmap -device virtio-blk,id=hd0-device,drive=hd0-overlay1,bootindex=0,serial=hd0 -blockdev driver=file,node-name=hd1-overlay1-file,filename=/var/lib/openqa/pool/9/raid/hd1-overlay1,cache.no-flush=on -blockdev driver=qcow2,node-name=hd1-overlay1,file=hd1-overlay1-file,cache.no-flush=on,discard=unmap -device virtio-blk,id=hd1-device,drive=hd1-overlay1,serial=hd1 -blockdev driver=file,node-name=cd0-overlay1-file,filename=/var/lib/openqa/pool/9/raid/cd0-overlay1,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay1,file=cd0-overlay1-file,cache.no-flush=on,discard=unmap -device scsi-cd,id=cd0-device,drive=cd0-overlay1,serial=cd0 -incoming defer
[2023-01-11T03:09:09.638081+01:00] [debug] [pid:18411] Waiting for 0 attempts
[2023-01-11T03:09:10.638747+01:00] [debug] [pid:18411] Waiting for 1 attempts
[2023-01-11T03:09:11.639230+01:00] [debug] [pid:18411] Finished after 2 attempts
[2023-01-11T03:09:11.641772+01:00] [debug] [pid:18411] EVENT {"data":{"status":"setup"},"event":"MIGRATION","timestamp":{"microseconds":641592,"seconds":1673402951}}
[2023-01-11T03:09:12.644735+01:00] [debug] [pid:18411] EVENT {"data":{"status":"active"},"event":"MIGRATION","timestamp":{"microseconds":647493,"seconds":1673402951}}
[2023-01-11T03:09:14.674697+01:00] [debug] [pid:18411] EVENT {"data":{"status":"completed"},"event":"MIGRATION","timestamp":{"microseconds":126028,"seconds":1673402954}}
[2023-01-11T03:09:14.675172+01:00] [debug] [pid:18411] <<< consoles::virtio_terminal::open_pipe(pipe_prefix="/var/lib/openqa/pool/9/virtio_console")
[2023-01-11T03:09:14.675943+01:00] [info] [pid:18411] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2023-01-11T03:09:14.676155+01:00] [info] [pid:18411] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2023-01-11T03:09:14.676522+01:00] [debug] [pid:18411] Establishing VNC connection to localhost:5999
[2023-01-11T03:09:14.684079+01:00] [debug] [pid:18411] pointer type 1 0 1024 768 -257
[2023-01-11T03:09:14.684207+01:00] [debug] [pid:18411] led state 0 1 1 -261
[2023-01-11T03:09:14.743431+01:00] [debug] [pid:18411] Restored snapshot
[2023-01-11T03:09:14.743780+01:00] [debug] [pid:18411] EVENT {"data":{"client":{"family":"ipv4","host":"127.0.0.1","service":"34386","websocket":false},"server":{"auth":"none","family":"ipv4","host":"0.0.0.0","service":"5999","websocket":false}},"event":"VNC_CONNECTED","timestamp":{"microseconds":677352,"seconds":1673402954}}
[2023-01-11T03:09:14.743903+01:00] [debug] [pid:18411] EVENT {"data":{"client":{"family":"ipv4","host":"127.0.0.1","service":"34386","websocket":false},"server":{"auth":"none","family":"ipv4","host":"0.0.0.0","service":"5999","websocket":false}},"event":"VNC_INITIALIZED","timestamp":{"microseconds":681922,"seconds":1673402954}}
[2023-01-11T03:09:14.744078+01:00] [debug] [pid:18411] EVENT {"event":"RESUME","timestamp":{"microseconds":743960,"seconds":1673402954}}
[2023-01-11T03:09:14.802705+01:00] [debug] [pid:18411] QEMU: QEMU emulator version 6.2.0 (SUSE Linux Enterprise 15)
[2023-01-11T03:09:14.802774+01:00] [debug] [pid:18411] QEMU: Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
[2023-01-11T03:09:15.221859+01:00] [debug] [pid:18349] ||| starting cifs tests/network/cifs.pm
Actions #12

Updated by ph03nix over 1 year ago

Yes! Could reproduce the issue one time: https://duck-norris.qe.suse.de/tests/11823#step/cifs/3

Actions #13

Updated by ph03nix over 1 year ago

I see almost all test runs failing. This is a good starting point to test my hypothesis, that the issue arises from using a different console in the snapshot point and in the restore point.

Actions #14

Updated by ph03nix over 1 year ago

The console was not the issue, the test run is still failing: https://duck-norris.qe.suse.de/tests/11878#

Actions #15

Updated by ph03nix over 1 year ago

However it's not failing, if the root-console was selected instead of the serial terminal: https://duck-norris.qe.suse.de/tests/11875#

Actions #16

Updated by ph03nix over 1 year ago

Nope, also switching to root-console before the snapshot rollback is not really helping: https://duck-norris.qe.suse.de/tests/11879#step/cifs/2

Actions #17

Updated by ph03nix over 1 year ago

Ok, I'm out of luck with the easy stuff. Now going to advance to do some more dirty work here. First attempt: Increase all relevant timeouts and check if this is just a timeout issue.

This might get ugly though.

Actions #18

Updated by ph03nix over 1 year ago

Observation: After restoring to the last snapshot, the screen in the openQA web UI remains completely black. The live log states

[2023-01-12T16:35:36.730226+01:00] [debug] [pid:21011] <<< serial_terminal::login()
[2023-01-12T16:35:36.730336+01:00] [debug] [pid:21011] tests/network/cifs.pm:48 called serial_terminal::select_serial_terminal -> lib/serial_terminal.pm:357 called testapi::select_console -> lib/susedistribution.pm:838 called serial_terminal::login -> lib/serial_terminal.pm:133 called testapi::wait_serial
[2023-01-12T16:35:36.730497+01:00] [debug] [pid:21011] <<< testapi::wait_serial(expect_not_found=0, buffer_size=undef, regexp=qr/login:\s*$/ui, quiet=1, no_regex=0, timeout=3, record_output=undef)
[2023-01-12T16:35:36.731168+01:00] [debug] [pid:21018] <<< consoles::serial_screen::read_until(expect_not_found=0, regexp="(?^ui:login:\\s*\$)", buffer_size=undef, quiet=1, record_output=undef, pattern="(?^ui:login:\\s*\$)", cmd="backend_wait_serial", no_regex=0, json_cmd_token="wrgzCHmF", timeout=3)
[2023-01-12T16:35:39.734807+01:00] [debug] [pid:21011] >>> testapi::wait_serial: (?^ui:login:\s*$): fail
[2023-01-12T16:35:39.735014+01:00] [debug] [pid:21011] tests/network/cifs.pm:48 called serial_terminal::select_serial_terminal -> lib/serial_terminal.pm:357 called testapi::select_console -> lib/susedistribution.pm:838 called serial_terminal::login -> lib/serial_terminal.pm:136 called testapi::send_key
[2023-01-12T16:35:39.735157+01:00] [debug] [pid:21011] <<< testapi::send_key(key="ret", wait_screen_change=0, do_wait=0)
[2023-01-12T16:35:39.736120+01:00] [debug] [pid:21018] <<< consoles::serial_screen::type_string(key="ret", json_cmd_token="VshvoDID", text="\n", cmd="backend_send_key")
[2023-01-12T16:35:39.736704+01:00] [debug] [pid:21011] tests/network/cifs.pm:48 called serial_terminal::select_serial_terminal -> lib/serial_terminal.pm:357 called testapi::select_console -> lib/susedistribution.pm:838 called serial_terminal::login -> lib/serial_terminal.pm:137 called testapi::wait_serial
[2023-01-12T16:35:39.736908+01:00] [debug] [pid:21011] <<< testapi::wait_serial(timeout=300, record_output=undef, no_regex=0, quiet=undef, expect_not_found=0, buffer_size=undef, regexp=qr/login:\s*$/ui)
[2023-01-12T16:35:39.737822+01:00] [debug] [pid:21018] <<< consoles::serial_screen::read_until(expect_not_found=0, regexp="(?^ui:login:\\s*\$)", buffer_size=undef, record_output=undef, quiet=undef, cmd="backend_wait_serial", pattern="(?^ui:login:\\s*\$)", timeout=300, json_cmd_token="qLuwvvpn", no_regex=0)

Looks like it's somehow jammed.

Actions #19

Updated by ph03nix over 1 year ago

So it's likely also not a timeout issue: https://duck-norris.qe.suse.de/tests/11881#step/cifs/3 (failure after increasing the timeout from 90 to 300 seconds).

Actions #20

Updated by ph03nix over 1 year ago

  • Related to action #109929: Snapshot rollback after SUT reboot breaks console switching added
Actions #21

Updated by ph03nix over 1 year ago

Actions #22

Updated by ph03nix over 1 year ago

This means we can move the zfs test to the back of the test run, where the fallout should not hurt anyone else.

Actions #25

Updated by ph03nix over 1 year ago

I keep the ticket open for a bit to monitor the situation and see, if the last changes in the schedule helped.

Actions #26

Updated by ph03nix over 1 year ago

  • Status changed from In Progress to Resolved
  • % Done changed from 70 to 100
Actions

Also available in: Atom PDF