Project

General

Profile

Actions

action #176262

closed

SLE Micro images tests incomplete on qemu start with auto_review:"swtpm-sock': Connection refused":retry size:S

Added by jlausuch 3 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Low
Category:
Regressions/Crashes
Target version:
Start date:
2025-01-28
Due date:
% Done:

0%

Estimated time:

Description

Observation

There are lately many failures booting SLE Micro images.

https://openqa.suse.de/tests/16474452
https://openqa.suse.de/tests/16594262
https://openqa.suse.de/tests/16569139
https://openqa.suse.de/tests/16591684
https://openqa.suse.de/tests/16594267
https://openqa.suse.de/tests/16569111
https://openqa.suse.de/tests/16594790
https://openqa.suse.de/tests/16568974
https://openqa.suse.de/tests/16594800

The issue is always the same:

[2025-01-28T14:08:07.635365Z] [debug] [pid:31496] starting: /usr/bin/qemu-system-x86_64 -device virtio-vga,edid=on,xres=1024,yres=768 -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -global isa-fdc.fdtypeA=none -m 2048 -cpu host -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -chardev socket,id=chrtpm,path=/tmp/mytpm10/swtpm-sock -tpmdev emulator,id=tpm0,chardev=chrtpm -device tpm-tis,tpmdev=tpm0 -device qemu-xhci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :100,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console_user,path=virtio_console_user,logfile=virtio_console_user.log,logappend=on -device virtconsole,chardev=virtio_console_user,name=org.openqa.console.virtio_console_user -chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay0-file,filename=/var/lib/openqa/pool/10/raid/hd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on,discard=unmap -device virtio-blk,id=hd0-device,drive=hd0-overlay0,bootindex=0,serial=hd0 -blockdev driver=file,node-name=hd1-overlay0-file,filename=/var/lib/openqa/pool/10/raid/hd1-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd1-overlay0,file=hd1-overlay0-file,cache.no-flush=on,discard=unmap -device virtio-blk,id=hd1-device,drive=hd1-overlay0,serial=hd1 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/10/raid/cd0-overlay0,cache.no-flush=on -blockdev driv
er=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on,discard=unmap -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0 -drive id=pflash-code-overlay0,if=pflash,file=/var/lib/openqa/pool/10/raid/pflash-code-overlay0,unit=0,readonly=on -drive id=pflash-vars-overlay0,if=pflash,file=/var/lib/openqa/pool/10/raid/pflash-vars-overlay0,unit=1
[2025-01-28T14:08:07.641795Z] [debug] [pid:31496] Waiting for 0 attempts
dmesg: read kernel buffer failed: Operation not permitted
[2025-01-28T14:08:07.794202Z] [debug] [pid:31496] Waiting for 1 attempts
[2025-01-28T14:08:07.794363Z] [info] [pid:31496] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
  QEMU terminated before QMP connection could be established. Check for errors below
[2025-01-28T14:08:07.794616Z] [info] [pid:31496] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2025-01-28T14:08:07.795840Z] [debug] [pid:31496] Passing remaining frames to the video encoder
[image2pipe @ 0x561f981105c0] Could not find codec parameters for stream 0 (Video: ppm, none): unspecified size
Consider increasing the value for the 'analyzeduration' (0) and 'probesize' (5000000) options
Input #0, image2pipe, from 'pipe:':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: ppm, none, 24 fps, 24 tbr, 24 tbn, 24 tbc
Output #0, webm, to 'video.webm':
Output file #0 does not contain any stream
[2025-01-28T14:08:07.801632Z] [debug] [pid:31496] Waiting for video encoder to finalize the video
[2025-01-28T14:08:07.801689Z] [debug] [pid:31496] The external video encoder (pid 31623) terminated
[2025-01-28T14:08:07.801741Z] [debug] [pid:31496] The built-in video encoder (pid 31624) terminated
[2025-01-28T14:08:07.802612Z] [debug] [pid:31496] QEMU: QEMU emulator version 8.2.7 (SUSE Linux Enterprise 15)
[2025-01-28T14:08:07.802671Z] [debug] [pid:31496] QEMU: Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
[2025-01-28T14:08:07.802725Z] [warn] [pid:31496] !!! : qemu-system-x86_64: -chardev socket,id=chrtpm,path=/tmp/mytpm10/swtpm-sock: Failed to connect to '/tmp/mytpm10/swtpm-sock': Connection refused
[2025-01-28T14:08:07.803410Z] [debug] [pid:31496] sending magic and exit
[2025-01-28T14:08:07.803669Z] [debug] [pid:31338] received magic close
[2025-01-28T14:08:07.813407Z] [debug] [pid:31338] backend process exited: 0
[2025-01-28T14:08:07.914302Z] [warn] [pid:31338] !!! main: failed to start VM at /usr/lib/os-autoinst/backend/driver.pm line 104.
      backend::driver::start_vm(backend::driver=HASH(0x564ca399b078)) called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Backend.pm line 18
      OpenQA::Isotovideo::Backend::new("OpenQA::Isotovideo::Backend") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 109
      OpenQA::Isotovideo::Runner::create_backend(OpenQA::Isotovideo::Runner=HASH(0x564c9b382680)) called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 251
      OpenQA::Isotovideo::Runner::init(OpenQA::Isotovideo::Runner=HASH(0x564c9b382680)) called at /usr/bin/isotovideo line 183
      eval {...} called at /usr/bin/isotovideo line 178

Is the error /usr/bin/chattr: Operation not supported while setting flags on /var/lib/openqa/pool/19/raid , which is logged before, related?

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label ,
openqa-query-for-job-label poo#176262

Acceptance criteria

  • AC1: Tests are no longer ending up incomplete due to QEMU not starting failing to connect to swtpm-sock

Suggestions

  • If possible, try to reproduce the issue locally
  • Search for any existing issues/tickets in progress and check for any possible workaround/solution
  • Check what makes these tests special (they are just x86_64 QEMU tests but probably use a specific feature) -> those are tests using the test setting QEMUTPM

Related issues 2 (2 open0 closed)

Related to openQA Project (public) - action #112226: [sporadic] swtpm socket not found anymore after loading from snapshot size:MWorkable2022-06-09

Actions
Related to openQA Tests (public) - action #115919: [security] test fails in tpm2_measured_bootBlocked2022-08-29

Actions
Actions #1

Updated by jlausuch 3 months ago

  • Related to action #112226: [sporadic] swtpm socket not found anymore after loading from snapshot size:M added
Actions #2

Updated by jlausuch 3 months ago

Seems to be happening in openqaworker16 and qesapworker-prg7.

Actions #3

Updated by jlausuch 3 months ago

Actions #4

Updated by okurz 3 months ago

  • Tags set to reactive work, osd, swtpm
  • Priority changed from High to Urgent
  • Target version set to Ready
Actions #5

Updated by robert.richardson 3 months ago

  • Assignee set to robert.richardson
Actions #6

Updated by robert.richardson 3 months ago

  • Status changed from New to In Progress
Actions #7

Updated by robert.richardson 3 months ago

  • Description updated (diff)
Actions #8

Updated by robert.richardson 3 months ago ยท Edited

The earliest occurrence i've found is from 13 days ago (Jan 16, 2025 at 08:01:42) on qesapworker-prg6
For dmesg: read kernel buffer failed: Operation not permitted i found the most often suggested is sudo sysctl kernel.dmesg_restrict=0.
Still unsure what the root cause is and/or why it is only showing up sporadically though.

Actions #9

Updated by robert.richardson 3 months ago

  • Subject changed from Failed to connect to swtpm-sock: Connection refused to SLE Micro images fail on boot with auto_review:"Failed to connect to '/tmp/mytpm15/swtpm-sock': Connection refused":retry
Actions #10

Updated by robert.richardson 3 months ago

  • Status changed from In Progress to Workable
  • Priority changed from Urgent to High

I'm reducing the priority to high as it does not affect multiple products and the auto_review script should serve as a temporary mitigation for most of the failures (jobs will automatically restarted and tagged/commented accordingly).
If anyone has specific ideas on how to proceed, please let me know. Maybe we can estimate this within the team ?

Actions #11

Updated by okurz 3 months ago

  • Status changed from Workable to New
Actions #12

Updated by tinita 3 months ago

  • Subject changed from SLE Micro images fail on boot with auto_review:"Failed to connect to '/tmp/mytpm15/swtpm-sock': Connection refused":retry to SLE Micro images tests incomplete on qemu start with auto_review:"Failed to connect to '/tmp/mytpm15/swtpm-sock': Connection refused":retry size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #13

Updated by jlausuch 3 months ago

will this auto-retry rule work? mytpm15 is hardcoded, it could be any number

Actions #14

Updated by robert.richardson 3 months ago

  • Subject changed from SLE Micro images tests incomplete on qemu start with auto_review:"Failed to connect to '/tmp/mytpm15/swtpm-sock': Connection refused":retry size:S to SLE Micro images tests incomplete on qemu start with auto_review:"swtpm-sock': Connection refused":retry size:S
Actions #15

Updated by robert.richardson 3 months ago

  • Description updated (diff)
Actions #16

Updated by okurz 3 months ago

  • Status changed from Workable to In Progress
Actions #17

Updated by robert.richardson 3 months ago

  • Status changed from In Progress to Feedback
  • Priority changed from High to Low

Yes thank you, i've adjusted the subject.

I've found this ticket which although it so far only occured on aarch64, may be related. I wasnt able to reproduce locally as i'm having some issues with isotovideo atm.
As discussed with @okurz i'm reducing the priority for now.

Actions #18

Updated by robert.richardson 3 months ago

  • Related to action #115919: [security] test fails in tpm2_measured_boot added
Actions #19

Updated by robert.richardson 2 months ago

  • Status changed from Feedback to In Progress
Actions #20

Updated by robert.richardson 2 months ago

There was a leftover file within a temp directory on one of the workers causing the error. This PR makes sure it always starts with a fresh directory which fixes the error and simplifies the code.

Actions #22

Updated by robert.richardson 2 months ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF