action #122773
closed[qe-core][leap 15.4 update]test fails in cifs,failed to rollback to the snapshot
100%
Description
Observation¶
openQA test in scenario opensuse-15.4-DVD-Updates-x86_64-extra_tests_filesystem@64bit fails in
cifs
Test suite description¶
Maintainer: QE Core
Filesystem related tests, for example snapper and btrfs features.
Reproducible¶
Fails since (at least) Build 20230106-1 (current job)
Expected result¶
Last good: 20230105-3 (or more recent)
Further details¶
Always latest result in this scenario: latest
Updated by rfan1 almost 2 years ago
the milestone case is https://openqa.opensuse.org/tests/3016756/modules/consoletest_setup/steps/1/src
Seems it has already login to the system and leads the error?
Updated by ph03nix almost 2 years ago
- Status changed from New to In Progress
- Assignee set to ph03nix
Updated by ph03nix almost 2 years ago
Updated by ph03nix almost 2 years ago
- % Done changed from 0 to 80
PR merged, let's keep this open for 1-2 days to check if this has resolved the issues on O3.
Updated by rfan1 almost 2 years ago
The test failed again:
https://openqa.opensuse.org/tests/3030786
Updated by ph03nix almost 2 years ago
Thanks for the report, will look into this again.
Updated by ph03nix almost 2 years ago
Hmmm, what could trigger this is that the consoles between the last snapshot and the current test run differs. The test fails at select_serial_terminal()
: https://openqa.opensuse.org/tests/3030786#step/cifs/3
# Test died: Failed to wait for login prompt at opensuse/lib/serial_terminal.pm line 137.
serial_terminal::login("root", "# ") called at opensuse/lib/susedistribution.pm line 838
susedistribution::activate_console(Distribution::Opensuse::Leap::15=HASH(0x5620ad009690), "root-virtio-terminal") called at /usr/lib/os-autoinst/testapi.pm line 1703
testapi::select_console("root-virtio-terminal") called at opensuse/lib/serial_terminal.pm line 357
serial_terminal::select_serial_terminal() called at opensuse/tests/network/cifs.pm line 47
The snapshot was taken while the test was in the root-console
: https://openqa.opensuse.org/tests/3030786#step/consoletest_setup/67
Looking into this.
Updated by ph03nix almost 2 years ago
Gosh, it's really sporadic: https://openqa.opensuse.org/tests/3032189
Updated by ph03nix almost 2 years ago
Resore procedure from the failing test run https://openqa.opensuse.org/tests/3030786/logfile?filename=autoinst-log.txt:
[2023-01-11T03:09:07.641606+01:00] [debug] [pid:18349] ||| finished zfs console (runtime: 177 s)
[2023-01-11T03:09:07.644625+01:00] [debug] [pid:18349] Loading a VM snapshot lastgood
[2023-01-11T03:09:07.645405+01:00] [debug] [pid:18411] EVENT {"data":{"offset":0},"event":"RTC_CHANGE","timestamp":{"microseconds":500148,"seconds":1673402415}}
[2023-01-11T03:09:07.645583+01:00] [debug] [pid:18411] EVENT {"data":{"guest":true,"reason":"guest-reset"},"event":"RESET","timestamp":{"microseconds":984310,"seconds":1673402821}}
[2023-01-11T03:09:07.645732+01:00] [debug] [pid:18411] EVENT {"data":{"guest":true,"reason":"guest-reset"},"event":"RESET","timestamp":{"microseconds":1679,"seconds":1673402822}}
[2023-01-11T03:09:07.645829+01:00] [debug] [pid:18411] EVENT {"data":{"offset":0},"event":"RTC_CHANGE","timestamp":{"microseconds":660517,"seconds":1673402859}}
[2023-01-11T03:09:07.645935+01:00] [debug] [pid:18411] Loading snapshot (Current VM state is running)
[2023-01-11T03:09:07.665855+01:00] [debug] [pid:18411] EVENT {"event":"STOP","timestamp":{"microseconds":665540,"seconds":1673402947}}
[2023-01-11T03:09:07.668188+01:00] [info] [pid:18411] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2023-01-11T03:09:08.762849+01:00] [debug] [pid:18411] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 18411 (/usr/bin/isotovideo: backen)
[2023-01-11T03:09:09.403896+01:00] [debug] [pid:18411] running `/usr/bin/qemu-img create -f qcow2 -F qcow2 -b /var/lib/openqa/pool/9/raid/hd0-overlay0 /var/lib/openqa/pool/9/raid/hd0-overlay1 32212254720`
[2023-01-11T03:09:09.489096+01:00] [debug] [pid:18411] Formatting '/var/lib/openqa/pool/9/raid/hd0-overlay1', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=32212254720 backing_file=/var/lib/openqa/pool/9/raid/hd0-overlay0 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
[2023-01-11T03:09:09.489330+01:00] [debug] [pid:18411] running `/usr/bin/qemu-img create -f qcow2 -F qcow2 -b /var/lib/openqa/pool/9/raid/hd1 /var/lib/openqa/pool/9/raid/hd1-overlay1 20G`
[2023-01-11T03:09:09.556016+01:00] [debug] [pid:18411] Formatting '/var/lib/openqa/pool/9/raid/hd1-overlay1', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=21474836480 backing_file=/var/lib/openqa/pool/9/raid/hd1 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
[2023-01-11T03:09:09.556181+01:00] [debug] [pid:18411] running `/usr/bin/qemu-img create -f qcow2 -F qcow2 -b /var/lib/openqa/pool/9/raid/cd0-overlay0 /var/lib/openqa/pool/9/raid/cd0-overlay1 4083154944`
[2023-01-11T03:09:09.620670+01:00] [debug] [pid:18411] Formatting '/var/lib/openqa/pool/9/raid/cd0-overlay1', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=4083154944 backing_file=/var/lib/openqa/pool/9/raid/cd0-overlay0 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
[2023-01-11T03:09:09.620790+01:00] [debug] [pid:18411] init_blockdev_images: Finished creating block devices
[2023-01-11T03:09:09.621939+01:00] [debug] [pid:18411] starting: /usr/bin/qemu-system-x86_64 -device VGA,edid=on,xres=1024,yres=768 -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -global isa-fdc.fdtypeA=none -m 1536 -cpu host -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -boot order=c -device qemu-xhci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :99,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 -chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay1-file,filename=/var/lib/openqa/pool/9/raid/hd0-overlay1,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay1,file=hd0-overlay1-file,cache.no-flush=on,discard=unmap -device virtio-blk,id=hd0-device,drive=hd0-overlay1,bootindex=0,serial=hd0 -blockdev driver=file,node-name=hd1-overlay1-file,filename=/var/lib/openqa/pool/9/raid/hd1-overlay1,cache.no-flush=on -blockdev driver=qcow2,node-name=hd1-overlay1,file=hd1-overlay1-file,cache.no-flush=on,discard=unmap -device virtio-blk,id=hd1-device,drive=hd1-overlay1,serial=hd1 -blockdev driver=file,node-name=cd0-overlay1-file,filename=/var/lib/openqa/pool/9/raid/cd0-overlay1,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay1,file=cd0-overlay1-file,cache.no-flush=on,discard=unmap -device scsi-cd,id=cd0-device,drive=cd0-overlay1,serial=cd0 -incoming defer
[2023-01-11T03:09:09.638081+01:00] [debug] [pid:18411] Waiting for 0 attempts
[2023-01-11T03:09:10.638747+01:00] [debug] [pid:18411] Waiting for 1 attempts
[2023-01-11T03:09:11.639230+01:00] [debug] [pid:18411] Finished after 2 attempts
[2023-01-11T03:09:11.641772+01:00] [debug] [pid:18411] EVENT {"data":{"status":"setup"},"event":"MIGRATION","timestamp":{"microseconds":641592,"seconds":1673402951}}
[2023-01-11T03:09:12.644735+01:00] [debug] [pid:18411] EVENT {"data":{"status":"active"},"event":"MIGRATION","timestamp":{"microseconds":647493,"seconds":1673402951}}
[2023-01-11T03:09:14.674697+01:00] [debug] [pid:18411] EVENT {"data":{"status":"completed"},"event":"MIGRATION","timestamp":{"microseconds":126028,"seconds":1673402954}}
[2023-01-11T03:09:14.675172+01:00] [debug] [pid:18411] <<< consoles::virtio_terminal::open_pipe(pipe_prefix="/var/lib/openqa/pool/9/virtio_console")
[2023-01-11T03:09:14.675943+01:00] [info] [pid:18411] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2023-01-11T03:09:14.676155+01:00] [info] [pid:18411] ::: consoles::virtio_terminal::open_pipe: Set PIPE_SZ from 65536 to 1048576
[2023-01-11T03:09:14.676522+01:00] [debug] [pid:18411] Establishing VNC connection to localhost:5999
[2023-01-11T03:09:14.684079+01:00] [debug] [pid:18411] pointer type 1 0 1024 768 -257
[2023-01-11T03:09:14.684207+01:00] [debug] [pid:18411] led state 0 1 1 -261
[2023-01-11T03:09:14.743431+01:00] [debug] [pid:18411] Restored snapshot
[2023-01-11T03:09:14.743780+01:00] [debug] [pid:18411] EVENT {"data":{"client":{"family":"ipv4","host":"127.0.0.1","service":"34386","websocket":false},"server":{"auth":"none","family":"ipv4","host":"0.0.0.0","service":"5999","websocket":false}},"event":"VNC_CONNECTED","timestamp":{"microseconds":677352,"seconds":1673402954}}
[2023-01-11T03:09:14.743903+01:00] [debug] [pid:18411] EVENT {"data":{"client":{"family":"ipv4","host":"127.0.0.1","service":"34386","websocket":false},"server":{"auth":"none","family":"ipv4","host":"0.0.0.0","service":"5999","websocket":false}},"event":"VNC_INITIALIZED","timestamp":{"microseconds":681922,"seconds":1673402954}}
[2023-01-11T03:09:14.744078+01:00] [debug] [pid:18411] EVENT {"event":"RESUME","timestamp":{"microseconds":743960,"seconds":1673402954}}
[2023-01-11T03:09:14.802705+01:00] [debug] [pid:18411] QEMU: QEMU emulator version 6.2.0 (SUSE Linux Enterprise 15)
[2023-01-11T03:09:14.802774+01:00] [debug] [pid:18411] QEMU: Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
[2023-01-11T03:09:15.221859+01:00] [debug] [pid:18349] ||| starting cifs tests/network/cifs.pm
Updated by ph03nix almost 2 years ago
Running a series of test runs by cloning https://openqa.opensuse.org/tests/3034138 20 times:
- https://duck-norris.qe.suse.de/tests/11822
- https://duck-norris.qe.suse.de/tests/11823
- https://duck-norris.qe.suse.de/tests/11824
- https://duck-norris.qe.suse.de/tests/11825
- https://duck-norris.qe.suse.de/tests/11826
- https://duck-norris.qe.suse.de/tests/11827
- https://duck-norris.qe.suse.de/tests/11828
- https://duck-norris.qe.suse.de/tests/11829
- https://duck-norris.qe.suse.de/tests/11830
- https://duck-norris.qe.suse.de/tests/11831
- https://duck-norris.qe.suse.de/tests/11832
- https://duck-norris.qe.suse.de/tests/11833
- https://duck-norris.qe.suse.de/tests/11834
- https://duck-norris.qe.suse.de/tests/11835
- https://duck-norris.qe.suse.de/tests/11836
- https://duck-norris.qe.suse.de/tests/11837
- https://duck-norris.qe.suse.de/tests/11838
- https://duck-norris.qe.suse.de/tests/11839
- https://duck-norris.qe.suse.de/tests/11840
- https://duck-norris.qe.suse.de/tests/11841
Updated by ph03nix almost 2 years ago
Yes! Could reproduce the issue one time: https://duck-norris.qe.suse.de/tests/11823#step/cifs/3
Updated by ph03nix almost 2 years ago
I see almost all test runs failing. This is a good starting point to test my hypothesis, that the issue arises from using a different console in the snapshot point and in the restore point.
Updated by ph03nix almost 2 years ago
The console was not the issue, the test run is still failing: https://duck-norris.qe.suse.de/tests/11878#
Updated by ph03nix almost 2 years ago
However it's not failing, if the root-console
was selected instead of the serial terminal: https://duck-norris.qe.suse.de/tests/11875#
Updated by ph03nix almost 2 years ago
Nope, also switching to root-console
before the snapshot rollback is not really helping: https://duck-norris.qe.suse.de/tests/11879#step/cifs/2
Updated by ph03nix almost 2 years ago
Ok, I'm out of luck with the easy stuff. Now going to advance to do some more dirty work here. First attempt: Increase all relevant timeouts and check if this is just a timeout issue.
This might get ugly though.
Updated by ph03nix almost 2 years ago
Observation: After restoring to the last snapshot, the screen in the openQA web UI remains completely black. The live log states
[2023-01-12T16:35:36.730226+01:00] [debug] [pid:21011] <<< serial_terminal::login()
[2023-01-12T16:35:36.730336+01:00] [debug] [pid:21011] tests/network/cifs.pm:48 called serial_terminal::select_serial_terminal -> lib/serial_terminal.pm:357 called testapi::select_console -> lib/susedistribution.pm:838 called serial_terminal::login -> lib/serial_terminal.pm:133 called testapi::wait_serial
[2023-01-12T16:35:36.730497+01:00] [debug] [pid:21011] <<< testapi::wait_serial(expect_not_found=0, buffer_size=undef, regexp=qr/login:\s*$/ui, quiet=1, no_regex=0, timeout=3, record_output=undef)
[2023-01-12T16:35:36.731168+01:00] [debug] [pid:21018] <<< consoles::serial_screen::read_until(expect_not_found=0, regexp="(?^ui:login:\\s*\$)", buffer_size=undef, quiet=1, record_output=undef, pattern="(?^ui:login:\\s*\$)", cmd="backend_wait_serial", no_regex=0, json_cmd_token="wrgzCHmF", timeout=3)
[2023-01-12T16:35:39.734807+01:00] [debug] [pid:21011] >>> testapi::wait_serial: (?^ui:login:\s*$): fail
[2023-01-12T16:35:39.735014+01:00] [debug] [pid:21011] tests/network/cifs.pm:48 called serial_terminal::select_serial_terminal -> lib/serial_terminal.pm:357 called testapi::select_console -> lib/susedistribution.pm:838 called serial_terminal::login -> lib/serial_terminal.pm:136 called testapi::send_key
[2023-01-12T16:35:39.735157+01:00] [debug] [pid:21011] <<< testapi::send_key(key="ret", wait_screen_change=0, do_wait=0)
[2023-01-12T16:35:39.736120+01:00] [debug] [pid:21018] <<< consoles::serial_screen::type_string(key="ret", json_cmd_token="VshvoDID", text="\n", cmd="backend_send_key")
[2023-01-12T16:35:39.736704+01:00] [debug] [pid:21011] tests/network/cifs.pm:48 called serial_terminal::select_serial_terminal -> lib/serial_terminal.pm:357 called testapi::select_console -> lib/susedistribution.pm:838 called serial_terminal::login -> lib/serial_terminal.pm:137 called testapi::wait_serial
[2023-01-12T16:35:39.736908+01:00] [debug] [pid:21011] <<< testapi::wait_serial(timeout=300, record_output=undef, no_regex=0, quiet=undef, expect_not_found=0, buffer_size=undef, regexp=qr/login:\s*$/ui)
[2023-01-12T16:35:39.737822+01:00] [debug] [pid:21018] <<< consoles::serial_screen::read_until(expect_not_found=0, regexp="(?^ui:login:\\s*\$)", buffer_size=undef, record_output=undef, quiet=undef, cmd="backend_wait_serial", pattern="(?^ui:login:\\s*\$)", timeout=300, json_cmd_token="qLuwvvpn", no_regex=0)
Looks like it's somehow jammed.
Updated by ph03nix almost 2 years ago
So it's likely also not a timeout issue: https://duck-norris.qe.suse.de/tests/11881#step/cifs/3 (failure after increasing the timeout from 90 to 300 seconds).
Updated by ph03nix almost 2 years ago
- Related to action #109929: Snapshot rollback after SUT reboot breaks console switching added
Updated by ph03nix almost 2 years ago
The issue is caused by https://progress.opensuse.org/issues/109929
Updated by ph03nix almost 2 years ago
This means we can move the zfs test to the back of the test run, where the fallout should not hurt anyone else.
Updated by ph03nix almost 2 years ago
Updated by ph03nix almost 2 years ago
- % Done changed from 100 to 70
Updated by ph03nix almost 2 years ago
I keep the ticket open for a bit to monitor the situation and see, if the last changes in the schedule helped.
Updated by ph03nix almost 2 years ago
- Status changed from In Progress to Resolved
- % Done changed from 70 to 100