Project

General

Profile

Actions

action #106957

closed

[tools][tw][aarch64][QEMUTPM][tpm2.0] on some O3 arm workers, swtpm 2.0 device can't be created [or created failed]

Added by rfan1 about 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2022-02-17
Due date:
% Done:

0%

Estimated time:

Description

Hello tool team experts, can I ask for your kindly help to check the issue below?
BTW, the failure is not seen on all arm workers
The below one can pass:
https://openqa.opensuse.org/tests/2191037

Observation

With job setting below:
QEMUTPM: 1
QEMUTPM_VER: 2.0

The tests failed on some arm workers, with different error messages:

https://openqa.opensuse.org/tests/2191101
Reason: backend died: runcmd 'swtpm socket --tpmstate dir=/tmp/mytpm1 --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock --log level=20 -d --tpm2' failed with exit code 1 at /usr/lib/os-autoinst/osutils.pm line 109.

https://openqa.opensuse.org/tests/2193108 [seems the backend does not create the swtpm socket at all]
[2022-02-17T02:09:04.271 UTC] [warn] !!! : qemu-system-aarch64: -chardev socket,id=chrtpm,path=/tmp/mytpm1/swtpm-sock: Failed to connect socket /tmp/mytpm1/swtpm-sock: No such file or directory

Steps to reproduce

Job settings:
QEMUTPM: 1
QEMUTPM_VER: 2.0

Workaround

Re-trigger the job and some of the workers can pass the tests

@cdywan, just assigned the ticket to you since you helped much on this before.


Related issues 1 (0 open1 closed)

Related to openQA Project - action #107155: [tools][tw][sle][QEMUTPM][tpm2.0] swtpm devices need to be clear once test doneResolvedXiaojing_liu2022-02-21

Actions
Actions #1

Updated by okurz about 2 years ago

  • Assignee deleted (livdywan)
  • Target version set to Ready
Actions #2

Updated by okurz about 2 years ago

  • Project changed from openQA Infrastructure to openQA Project
  • Category set to Regressions/Crashes
  • Priority changed from Normal to Low

On the machine "aarch64" if I execute manually

sudo -u _openqa-worker swtpm socket --tpmstate dir=/tmp/mytpm1 --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock --log level=20 -d --tpm2; echo $?

I did not encounter a failure for 10 calls. I wonder though if the daemon processes are ever properly cleaned up. And I am not sure if errors are properly propagated into the logs.

@rfan1 we will look into this but don't expect a quick resolution from us.

Actions #3

Updated by rfan1 about 2 years ago

okurz wrote:

On the machine "aarch64" if I execute manually

sudo -u _openqa-worker swtpm socket --tpmstate dir=/tmp/mytpm1 --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock --log level=20 -d --tpm2; echo $?

I did not encounter a failure for 10 calls. I wonder though if the daemon processes are ever properly cleaned up. And I am not sure if errors are properly propagated into the logs.

Thanks Oliver
@rfan1 we will look into this but don't expect a quick resolution from us.

Actions #4

Updated by okurz about 2 years ago

@rfan1 seems you mangled quotes and your anwser but I got it.

Seems I can reproduce some problems locally:

$ env runs=200 count_fail_ratio prove -I. t/18-qemu-options.t 
## Run 1
t/18-qemu-options.t .. ok   
All tests successful.
Files=1, Tests=4, 36 wallclock secs ( 0.16 usr  0.01 sys + 25.36 cusr  9.81 csys = 35.34 CPU)
Result: PASS
## count_fail_ratio: Run: 1. Fails: 0. Fail ratio 0%
## Run 2
t/18-qemu-options.t .. ok   
All tests successful.
Files=1, Tests=4, 36 wallclock secs ( 0.15 usr  0.00 sys + 25.10 cusr 10.65 csys = 35.90 CPU)
Result: PASS
## count_fail_ratio: Run: 2. Fails: 0. Fail ratio 0%
## Run 3
t/18-qemu-options.t .. ok   
All tests successful.
Files=1, Tests=4, 35 wallclock secs ( 0.14 usr  0.01 sys + 25.11 cusr 10.75 csys = 36.01 CPU)
Result: PASS
## count_fail_ratio: Run: 3. Fails: 0. Fail ratio 0%
## Run 4
t/18-qemu-options.t .. ok   
All tests successful.
Files=1, Tests=4, 37 wallclock secs ( 0.14 usr  0.02 sys + 24.83 cusr 10.80 csys = 35.79 CPU)
Result: PASS
## count_fail_ratio: Run: 4. Fails: 0. Fail ratio 0%
## Run 5
t/18-qemu-options.t .. ok   
All tests successful.
Files=1, Tests=4, 36 wallclock secs ( 0.15 usr  0.01 sys + 25.67 cusr 10.71 csys = 36.54 CPU)
Result: PASS
## count_fail_ratio: Run: 5. Fails: 0. Fail ratio 0%
## Run 6
t/18-qemu-options.t .. ok   
All tests successful.
Files=1, Tests=4, 35 wallclock secs ( 0.15 usr  0.01 sys + 25.73 cusr 10.77 csys = 36.66 CPU)
Result: PASS
## count_fail_ratio: Run: 6. Fails: 0. Fail ratio 0%
## Run 7
t/18-qemu-options.t .. 2/? 
    #   Failed test 'swtpm default device created'
    #   at t/18-qemu-options.t line 142.
    #                   '[2022-02-17T10:08:05.100327+01:00] [debug] Current version is a2c1cf8777de240c303955cd3ab439cd03ef988c [interface v25]
    # [2022-02-17T10:08:05.106589+01:00] [debug] Setting forced test parameter QEMU_DISABLE_SNAPSHOTS -> 1
    # [2022-02-17T10:08:05.116785+01:00] [debug] git hash in /home/okurz/local/os-autoinst/os-autoinst/t/data/tests: a2c1cf8777de240c303955cd3ab439cd03ef988c
    # [2022-02-17T10:08:05.117225+01:00] [info] ::: OpenQA::Isotovideo::Utils::load_test_schedule: Enforced test schedule by 'SCHEDULE' variable in action
    # [2022-02-17T10:08:05.133298+01:00] [debug] scheduling noop tests/noop.pm
    # [2022-02-17T10:08:05.279646+01:00] [info] cmdsrv: daemon reachable under http://*:15223/FSEejrXTtL/
    # [2022-02-17T10:08:05.282171+01:00] [info] Listening at "http://[::]:15223"
    # Web application available at http://[::]:15223
    # [2022-02-17T10:08:05.285728+01:00] [debug] git hash in /home/okurz/local/os-autoinst/os-autoinst/t/data/tests/needles: a2c1cf8777de240c303955cd3ab439cd03ef988c
    # [2022-02-17T10:08:05.285981+01:00] [debug] init needles from /home/okurz/local/os-autoinst/os-autoinst/t/data/tests/needles
    # [2022-02-17T10:08:05.287678+01:00] [debug] loaded 4 needles
    # [2022-02-17T10:08:05.291984+01:00] [debug] Blocking SIGCHLD and SIGTERM
    # [2022-02-17T10:08:05.337774+01:00] [debug] Unblocking SIGCHLD and SIGTERM
    # [2022-02-17T10:08:05.556071+01:00] [debug] Blocking SIGCHLD and SIGTERM
    # [2022-02-17T10:08:05.556156+01:00] [debug] 11927: channel_out 17, channel_in 16
    # [2022-02-17T10:08:05.631526+01:00] [debug] Unblocking SIGCHLD and SIGTERM
    # [2022-02-17T10:08:05.636543+01:00] [debug] 11949: cmdpipe 15, rsppipe 18
    # [2022-02-17T10:08:05.636865+01:00] [debug] started mgmt loop with pid 11949
    # [2022-02-17T10:08:05.739838+01:00] [debug] qemu version detected: 5.2.0
    # [2022-02-17T10:08:05.741432+01:00] [debug] running `/usr/bin/chattr +C /tmp/18-qemu-options.t-wIeC/pool/raid`
    # [2022-02-17T10:08:05.767899+01:00] [debug] Command `/usr/bin/chattr +C /tmp/18-qemu-options.t-wIeC/pool/raid` terminated with 0
    # [2022-02-17T10:08:05.768070+01:00] [debug] Configuring storage controllers and block devices
    # [2022-02-17T10:08:05.768856+01:00] [debug] running `/usr/bin/qemu-img info --output=json /home/okurz/local/os-autoinst/os-autoinst/t/data/Core-7.2.iso`
    # [2022-02-17T10:08:05.869573+01:00] [debug] Initializing block device images
    # [2022-02-17T10:08:05.869990+01:00] [debug] running `/usr/bin/qemu-img create -f qcow2 /tmp/18-qemu-options.t-wIeC/pool/raid/hd0 10G`
    # [2022-02-17T10:08:05.977597+01:00] [debug] Formatting '/tmp/18-qemu-options.t-wIeC/pool/raid/hd0', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
    # [2022-02-17T10:08:05.977864+01:00] [debug] running `/usr/bin/qemu-img create -f qcow2 -F raw -b /home/okurz/local/os-autoinst/os-autoinst/t/data/Core-7.2.iso /tmp/18-qemu-options.t-wIeC/pool/raid/cd0-overlay0 11116544`
    # [2022-02-17T10:08:06.066835+01:00] [debug] Formatting '/tmp/18-qemu-options.t-wIeC/pool/raid/cd0-overlay0', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=11116544 backing_file=/home/okurz/local/os-autoinst/os-autoinst/t/data/Core-7.2.iso backing_fmt=raw lazy_refcounts=off refcount_bits=16
    # [2022-02-17T10:08:06.067018+01:00] [debug] init_blockdev_images: Finished creating block devices
    # [2022-02-17T10:08:06.068822+01:00] [debug] starting: /usr/bin/qemu-system-i386 -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -m 1024 -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -chardev socket,id=chrtpm,path=/tmp/mytpm4/swtpm-sock -tpmdev emulator,id=tpm0,chardev=chrtpm -device tpm-tis,tpmdev=tpm0 -boot once=d -smp 1 -enable-kvm -no-shutdown -vnc :90,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 -chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -blockdev driver=file,node-name=hd0-file,filename=/tmp/18-qemu-options.t-wIeC/pool/raid/hd0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0,file=hd0-file,cache.no-flush=on -device ide-hd,id=hd0-device,drive=hd0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/tmp/18-qemu-options.t-wIeC/pool/raid/cd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on -device ide-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0
    # [2022-02-17T10:08:06.074094+01:00] [info] ::: backend::qemu::start_qemu: Not connecting to QEMU as requested by QEMU_ONLY_EXEC
    # GOT GO
    # 
    # [2022-02-17T10:08:06.077427+01:00] [debug] Snapshots are not supported
    # [2022-02-17T10:08:06.078625+01:00] [debug] ||| starting noop tests/noop.pm
    # [2022-02-17T10:08:06.085634+01:00] [debug] ||| finished noop tests (runtime: 0 s)
    # [2022-02-17T10:08:06.088854+01:00] [debug] stopping autotest process 11935
    # [2022-02-17T10:08:06.097066+01:00] [debug] [autotest] process exited: 0
    # [2022-02-17T10:08:06.197595+01:00] [debug] done with autotest process
    # [2022-02-17T10:08:06.198339+01:00] [debug] stopping command server 11932 because test execution ended
    # [2022-02-17T10:08:06.198513+01:00] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:15223/FSEejrXTtL/broadcast
    # [2022-02-17T10:08:06.217279+01:00] [debug] commands process exited: 0
    # [2022-02-17T10:08:06.317732+01:00] [debug] done with command server
    # [2022-02-17T10:08:06.317869+01:00] [debug] isotovideo done
    # [2022-02-17T10:08:06.318446+01:00] [info] ::: backend::qemu::handle_qmp_command: Skipping the following qmp_command because QEMU_ONLY_EXEC is enabled:
    #   {"execute":"query-status"}
    #   
    # [2022-02-17T10:08:06.318639+01:00] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
    #   Can't use an undefined value as a HASH reference at /home/okurz/local/os-autoinst/os-autoinst/t/../backend/qemu.pm line 1049.
    # [2022-02-17T10:08:06.318922+01:00] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
    # [2022-02-17T10:08:07.340937+01:00] [debug] Passing remaining frames to the video encoder
    # [2022-02-17T10:08:07.444166+01:00] [debug] Waiting for video encoder to finalize the video
    # [2022-02-17T10:08:07.444319+01:00] [debug] The built-in video encoder (pid 11963) terminated
    # [2022-02-17T10:08:07.446076+01:00] [debug] QEMU: QEMU emulator version 5.2.0 (SUSE Linux Enterprise 15)
    # [2022-02-17T10:08:07.446209+01:00] [debug] QEMU: Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers
    # [2022-02-17T10:08:07.446314+01:00] [debug] QEMU: qemu-system-i386: terminating on signal 15 from pid 11949 (/home/okurz/local/os-autoinst/os-autoinst/t/../isotovideo: backend)
    # [2022-02-17T10:08:07.448282+01:00] [debug] sending magic and exit
    # [2022-02-17T10:08:07.448666+01:00] [debug] received magic close
    # [2022-02-17T10:08:07.455597+01:00] [debug] backend process exited: 0
    # [2022-02-17T10:08:07.556146+01:00] [debug] backend shutdown state: ?
    # [2022-02-17T10:08:07.557008+01:00] [debug] stopping backend process 11949
    # [2022-02-17T10:08:07.557114+01:00] [debug] done with backend process
    # 11927: EXIT 0
    # '
    #     doesn't match '(?^u:swtpm socket --tpmstate dir=/tmp/mytpm4 --ctrl type=unixio,path=/tmp/mytpm4/swtpm-sock --log level=20 -d --tpm2)'
t/18-qemu-options.t .. 3/?     # Looks like you failed 1 test of 14.

#   Failed test 'qemu_tpm_option'
#   at t/18-qemu-options.t line 151.
# Looks like you failed 1 test of 4.
t/18-qemu-options.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/4 subtests 

Test Summary Report
-------------------
t/18-qemu-options.t (Wstat: 256 Tests: 4 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
Files=1, Tests=4, 36 wallclock secs ( 0.16 usr  0.01 sys + 25.32 cusr 10.68 csys = 36.17 CPU)
Result: FAIL
## count_fail_ratio: Run: 7. Fails: 1. Fail ratio 14%
## Run 8
t/18-qemu-options.t .. ok   
All tests successful.

EDIT: 2022-02-17 Full statistic results: count_fail_ratio: Run: 200. Fails: 16. Fail ratio 8%

Actions #5

Updated by okurz about 2 years ago

  • Related to action #107155: [tools][tw][sle][QEMUTPM][tpm2.0] swtpm devices need to be clear once test done added
Actions #6

Updated by ggardet_arm about 2 years ago

openqa-aarch64 worker seems fine now: https://openqa.opensuse.org/tests/2223880

But other aarch64 workers may need some initial setup? Do we have some docs for this?

Actions #7

Updated by rfan1 about 2 years ago

https://github.com/os-autoinst/os-autoinst/blob/master/doc/backend_vars.asciidoc

I may need change my job parameter:

QEMUTPM = 'instance'

Actions #8

Updated by ggardet_arm about 2 years ago

swtpm is not available on one of the worker, because it is running SLE15-SP2. I will migrate it to Leap 15.3.

Actions #9

Updated by okurz about 2 years ago

  • Status changed from New to In Progress
  • Assignee set to ggardet_arm
  • Target version changed from Ready to future

@ggardet_arm thank you for picking this up so I am assigning the ticket to you.

Actions #10

Updated by ggardet_arm about 2 years ago

  • Status changed from In Progress to Resolved

sudo -u _openqa-worker swtpm socket --tpmstate dir=/tmp/mytpm1 --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock --log level=20 -d --tpm2; echo $? runs properly now on all aarch64 workers from o3.

Actions #11

Updated by rfan1 about 2 years ago

Thanks much for your kindly help!

Actions

Also available in: Atom PDF