action #106957
closed[tools][tw][aarch64][QEMUTPM][tpm2.0] on some O3 arm workers, swtpm 2.0 device can't be created [or created failed]
0%
Description
Hello tool team experts, can I ask for your kindly help to check the issue below?
BTW, the failure is not seen on all arm workers
The below one can pass:
https://openqa.opensuse.org/tests/2191037
Observation¶
With job setting below:
QEMUTPM: 1
QEMUTPM_VER: 2.0
The tests failed on some arm workers, with different error messages:
https://openqa.opensuse.org/tests/2191101
Reason: backend died: runcmd 'swtpm socket --tpmstate dir=/tmp/mytpm1 --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock --log level=20 -d --tpm2' failed with exit code 1 at /usr/lib/os-autoinst/osutils.pm line 109.
https://openqa.opensuse.org/tests/2193108 [seems the backend does not create the swtpm socket at all]
[2022-02-17T02:09:04.271 UTC] [warn] !!! : qemu-system-aarch64: -chardev socket,id=chrtpm,path=/tmp/mytpm1/swtpm-sock: Failed to connect socket /tmp/mytpm1/swtpm-sock: No such file or directory
Steps to reproduce¶
Job settings:
QEMUTPM: 1
QEMUTPM_VER: 2.0
Workaround¶
Re-trigger the job and some of the workers can pass the tests
@cdywan, just assigned the ticket to you since you helped much on this before.
Updated by okurz over 2 years ago
- Assignee deleted (
livdywan) - Target version set to Ready
Updated by okurz over 2 years ago
- Project changed from openQA Infrastructure to openQA Project
- Category set to Regressions/Crashes
- Priority changed from Normal to Low
On the machine "aarch64" if I execute manually
sudo -u _openqa-worker swtpm socket --tpmstate dir=/tmp/mytpm1 --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock --log level=20 -d --tpm2; echo $?
I did not encounter a failure for 10 calls. I wonder though if the daemon processes are ever properly cleaned up. And I am not sure if errors are properly propagated into the logs.
@rfan1 we will look into this but don't expect a quick resolution from us.
Updated by rfan1 over 2 years ago
okurz wrote:
On the machine "aarch64" if I execute manually
sudo -u _openqa-worker swtpm socket --tpmstate dir=/tmp/mytpm1 --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock --log level=20 -d --tpm2; echo $?
I did not encounter a failure for 10 calls. I wonder though if the daemon processes are ever properly cleaned up. And I am not sure if errors are properly propagated into the logs.
Thanks Oliver
@rfan1 we will look into this but don't expect a quick resolution from us.
Updated by okurz over 2 years ago
@rfan1 seems you mangled quotes and your anwser but I got it.
Seems I can reproduce some problems locally:
$ env runs=200 count_fail_ratio prove -I. t/18-qemu-options.t
## Run 1
t/18-qemu-options.t .. ok
All tests successful.
Files=1, Tests=4, 36 wallclock secs ( 0.16 usr 0.01 sys + 25.36 cusr 9.81 csys = 35.34 CPU)
Result: PASS
## count_fail_ratio: Run: 1. Fails: 0. Fail ratio 0%
## Run 2
t/18-qemu-options.t .. ok
All tests successful.
Files=1, Tests=4, 36 wallclock secs ( 0.15 usr 0.00 sys + 25.10 cusr 10.65 csys = 35.90 CPU)
Result: PASS
## count_fail_ratio: Run: 2. Fails: 0. Fail ratio 0%
## Run 3
t/18-qemu-options.t .. ok
All tests successful.
Files=1, Tests=4, 35 wallclock secs ( 0.14 usr 0.01 sys + 25.11 cusr 10.75 csys = 36.01 CPU)
Result: PASS
## count_fail_ratio: Run: 3. Fails: 0. Fail ratio 0%
## Run 4
t/18-qemu-options.t .. ok
All tests successful.
Files=1, Tests=4, 37 wallclock secs ( 0.14 usr 0.02 sys + 24.83 cusr 10.80 csys = 35.79 CPU)
Result: PASS
## count_fail_ratio: Run: 4. Fails: 0. Fail ratio 0%
## Run 5
t/18-qemu-options.t .. ok
All tests successful.
Files=1, Tests=4, 36 wallclock secs ( 0.15 usr 0.01 sys + 25.67 cusr 10.71 csys = 36.54 CPU)
Result: PASS
## count_fail_ratio: Run: 5. Fails: 0. Fail ratio 0%
## Run 6
t/18-qemu-options.t .. ok
All tests successful.
Files=1, Tests=4, 35 wallclock secs ( 0.15 usr 0.01 sys + 25.73 cusr 10.77 csys = 36.66 CPU)
Result: PASS
## count_fail_ratio: Run: 6. Fails: 0. Fail ratio 0%
## Run 7
t/18-qemu-options.t .. 2/?
# Failed test 'swtpm default device created'
# at t/18-qemu-options.t line 142.
# '[2022-02-17T10:08:05.100327+01:00] [debug] Current version is a2c1cf8777de240c303955cd3ab439cd03ef988c [interface v25]
# [2022-02-17T10:08:05.106589+01:00] [debug] Setting forced test parameter QEMU_DISABLE_SNAPSHOTS -> 1
# [2022-02-17T10:08:05.116785+01:00] [debug] git hash in /home/okurz/local/os-autoinst/os-autoinst/t/data/tests: a2c1cf8777de240c303955cd3ab439cd03ef988c
# [2022-02-17T10:08:05.117225+01:00] [info] ::: OpenQA::Isotovideo::Utils::load_test_schedule: Enforced test schedule by 'SCHEDULE' variable in action
# [2022-02-17T10:08:05.133298+01:00] [debug] scheduling noop tests/noop.pm
# [2022-02-17T10:08:05.279646+01:00] [info] cmdsrv: daemon reachable under http://*:15223/FSEejrXTtL/
# [2022-02-17T10:08:05.282171+01:00] [info] Listening at "http://[::]:15223"
# Web application available at http://[::]:15223
# [2022-02-17T10:08:05.285728+01:00] [debug] git hash in /home/okurz/local/os-autoinst/os-autoinst/t/data/tests/needles: a2c1cf8777de240c303955cd3ab439cd03ef988c
# [2022-02-17T10:08:05.285981+01:00] [debug] init needles from /home/okurz/local/os-autoinst/os-autoinst/t/data/tests/needles
# [2022-02-17T10:08:05.287678+01:00] [debug] loaded 4 needles
# [2022-02-17T10:08:05.291984+01:00] [debug] Blocking SIGCHLD and SIGTERM
# [2022-02-17T10:08:05.337774+01:00] [debug] Unblocking SIGCHLD and SIGTERM
# [2022-02-17T10:08:05.556071+01:00] [debug] Blocking SIGCHLD and SIGTERM
# [2022-02-17T10:08:05.556156+01:00] [debug] 11927: channel_out 17, channel_in 16
# [2022-02-17T10:08:05.631526+01:00] [debug] Unblocking SIGCHLD and SIGTERM
# [2022-02-17T10:08:05.636543+01:00] [debug] 11949: cmdpipe 15, rsppipe 18
# [2022-02-17T10:08:05.636865+01:00] [debug] started mgmt loop with pid 11949
# [2022-02-17T10:08:05.739838+01:00] [debug] qemu version detected: 5.2.0
# [2022-02-17T10:08:05.741432+01:00] [debug] running `/usr/bin/chattr +C /tmp/18-qemu-options.t-wIeC/pool/raid`
# [2022-02-17T10:08:05.767899+01:00] [debug] Command `/usr/bin/chattr +C /tmp/18-qemu-options.t-wIeC/pool/raid` terminated with 0
# [2022-02-17T10:08:05.768070+01:00] [debug] Configuring storage controllers and block devices
# [2022-02-17T10:08:05.768856+01:00] [debug] running `/usr/bin/qemu-img info --output=json /home/okurz/local/os-autoinst/os-autoinst/t/data/Core-7.2.iso`
# [2022-02-17T10:08:05.869573+01:00] [debug] Initializing block device images
# [2022-02-17T10:08:05.869990+01:00] [debug] running `/usr/bin/qemu-img create -f qcow2 /tmp/18-qemu-options.t-wIeC/pool/raid/hd0 10G`
# [2022-02-17T10:08:05.977597+01:00] [debug] Formatting '/tmp/18-qemu-options.t-wIeC/pool/raid/hd0', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
# [2022-02-17T10:08:05.977864+01:00] [debug] running `/usr/bin/qemu-img create -f qcow2 -F raw -b /home/okurz/local/os-autoinst/os-autoinst/t/data/Core-7.2.iso /tmp/18-qemu-options.t-wIeC/pool/raid/cd0-overlay0 11116544`
# [2022-02-17T10:08:06.066835+01:00] [debug] Formatting '/tmp/18-qemu-options.t-wIeC/pool/raid/cd0-overlay0', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=11116544 backing_file=/home/okurz/local/os-autoinst/os-autoinst/t/data/Core-7.2.iso backing_fmt=raw lazy_refcounts=off refcount_bits=16
# [2022-02-17T10:08:06.067018+01:00] [debug] init_blockdev_images: Finished creating block devices
# [2022-02-17T10:08:06.068822+01:00] [debug] starting: /usr/bin/qemu-system-i386 -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -m 1024 -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -chardev socket,id=chrtpm,path=/tmp/mytpm4/swtpm-sock -tpmdev emulator,id=tpm0,chardev=chrtpm -device tpm-tis,tpmdev=tpm0 -boot once=d -smp 1 -enable-kvm -no-shutdown -vnc :90,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 -chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -blockdev driver=file,node-name=hd0-file,filename=/tmp/18-qemu-options.t-wIeC/pool/raid/hd0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0,file=hd0-file,cache.no-flush=on -device ide-hd,id=hd0-device,drive=hd0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/tmp/18-qemu-options.t-wIeC/pool/raid/cd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on -device ide-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0
# [2022-02-17T10:08:06.074094+01:00] [info] ::: backend::qemu::start_qemu: Not connecting to QEMU as requested by QEMU_ONLY_EXEC
# GOT GO
#
# [2022-02-17T10:08:06.077427+01:00] [debug] Snapshots are not supported
# [2022-02-17T10:08:06.078625+01:00] [debug] ||| starting noop tests/noop.pm
# [2022-02-17T10:08:06.085634+01:00] [debug] ||| finished noop tests (runtime: 0 s)
# [2022-02-17T10:08:06.088854+01:00] [debug] stopping autotest process 11935
# [2022-02-17T10:08:06.097066+01:00] [debug] [autotest] process exited: 0
# [2022-02-17T10:08:06.197595+01:00] [debug] done with autotest process
# [2022-02-17T10:08:06.198339+01:00] [debug] stopping command server 11932 because test execution ended
# [2022-02-17T10:08:06.198513+01:00] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:15223/FSEejrXTtL/broadcast
# [2022-02-17T10:08:06.217279+01:00] [debug] commands process exited: 0
# [2022-02-17T10:08:06.317732+01:00] [debug] done with command server
# [2022-02-17T10:08:06.317869+01:00] [debug] isotovideo done
# [2022-02-17T10:08:06.318446+01:00] [info] ::: backend::qemu::handle_qmp_command: Skipping the following qmp_command because QEMU_ONLY_EXEC is enabled:
# {"execute":"query-status"}
#
# [2022-02-17T10:08:06.318639+01:00] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
# Can't use an undefined value as a HASH reference at /home/okurz/local/os-autoinst/os-autoinst/t/../backend/qemu.pm line 1049.
# [2022-02-17T10:08:06.318922+01:00] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
# [2022-02-17T10:08:07.340937+01:00] [debug] Passing remaining frames to the video encoder
# [2022-02-17T10:08:07.444166+01:00] [debug] Waiting for video encoder to finalize the video
# [2022-02-17T10:08:07.444319+01:00] [debug] The built-in video encoder (pid 11963) terminated
# [2022-02-17T10:08:07.446076+01:00] [debug] QEMU: QEMU emulator version 5.2.0 (SUSE Linux Enterprise 15)
# [2022-02-17T10:08:07.446209+01:00] [debug] QEMU: Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers
# [2022-02-17T10:08:07.446314+01:00] [debug] QEMU: qemu-system-i386: terminating on signal 15 from pid 11949 (/home/okurz/local/os-autoinst/os-autoinst/t/../isotovideo: backend)
# [2022-02-17T10:08:07.448282+01:00] [debug] sending magic and exit
# [2022-02-17T10:08:07.448666+01:00] [debug] received magic close
# [2022-02-17T10:08:07.455597+01:00] [debug] backend process exited: 0
# [2022-02-17T10:08:07.556146+01:00] [debug] backend shutdown state: ?
# [2022-02-17T10:08:07.557008+01:00] [debug] stopping backend process 11949
# [2022-02-17T10:08:07.557114+01:00] [debug] done with backend process
# 11927: EXIT 0
# '
# doesn't match '(?^u:swtpm socket --tpmstate dir=/tmp/mytpm4 --ctrl type=unixio,path=/tmp/mytpm4/swtpm-sock --log level=20 -d --tpm2)'
t/18-qemu-options.t .. 3/? # Looks like you failed 1 test of 14.
# Failed test 'qemu_tpm_option'
# at t/18-qemu-options.t line 151.
# Looks like you failed 1 test of 4.
t/18-qemu-options.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/4 subtests
Test Summary Report
-------------------
t/18-qemu-options.t (Wstat: 256 Tests: 4 Failed: 1)
Failed test: 3
Non-zero exit status: 1
Files=1, Tests=4, 36 wallclock secs ( 0.16 usr 0.01 sys + 25.32 cusr 10.68 csys = 36.17 CPU)
Result: FAIL
## count_fail_ratio: Run: 7. Fails: 1. Fail ratio 14%
## Run 8
t/18-qemu-options.t .. ok
All tests successful.
EDIT: 2022-02-17 Full statistic results: count_fail_ratio: Run: 200. Fails: 16. Fail ratio 8%
Updated by okurz over 2 years ago
- Related to action #107155: [tools][tw][sle][QEMUTPM][tpm2.0] swtpm devices need to be clear once test done added
Updated by ggardet_arm over 2 years ago
openqa-aarch64
worker seems fine now: https://openqa.opensuse.org/tests/2223880
But other aarch64 workers may need some initial setup? Do we have some docs for this?
Updated by rfan1 over 2 years ago
https://github.com/os-autoinst/os-autoinst/blob/master/doc/backend_vars.asciidoc
I may need change my job parameter:
QEMUTPM = 'instance'
Updated by ggardet_arm over 2 years ago
swtpm
is not available on one of the worker, because it is running SLE15-SP2. I will migrate it to Leap 15.3.
Updated by okurz over 2 years ago
- Status changed from New to In Progress
- Assignee set to ggardet_arm
- Target version changed from Ready to future
@ggardet_arm thank you for picking this up so I am assigning the ticket to you.
Updated by ggardet_arm over 2 years ago
- Status changed from In Progress to Resolved
sudo -u _openqa-worker swtpm socket --tpmstate dir=/tmp/mytpm1 --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock --log level=20 -d --tpm2; echo $?
runs properly now on all aarch64 workers from o3.