action #107470

closed

[openqa][ipmi][worker][sut][needle matching] 'sshd-server-started' needle matching has been continuously failing on some workers/SUTs size:M

Added by waynechen55 almost 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Support
Target version:
Start date:
2022-02-24
Due date:
% Done:

0%

Estimated time:

Description

Observation

QE Virtualization has an openQA test suite, prj2_host_upgrade_sles12sp5_to_developing_xen, which automates the host upgrade from a SLES 12-SP5 Xen host to a SLES 15-SP4 Xen host. Needle matching has been failing continuously at the reboot_and_wait_up_upgrade step with the following error:

# Test died: no candidate needle with tag(s) 'sshd-server-started' matched

I have been creating a new 'sshd-server-started' needle after each failure. Unfortunately, the same failure still occurs at the same step every time the test is triggered by a newly released daily build.

openqaworker-2:18/gonzo-1:
prj2_host_upgrade_sles12sp5_to_developing_xen Build101.1
prj2_host_upgrade_sles12sp5_to_developing_xen Build99.1
prj2_host_upgrade_sles12sp5_to_developing_xen Build98.1

openqaworker-2:19/fozzie-1:
prj2_host_upgrade_sles12sp5_to_developing_xen Build99.1
prj2_host_upgrade_sles12sp5_to_developing_xen Build98.1
prj2_host_upgrade_sles12sp5_to_developing_xen Build97.1
prj2_host_upgrade_sles12sp5_to_developing_xen Build91.2

Steps to reproduce

  • Trigger an openQA test run with a new daily build and ensure the test is assigned to openqaworker-2:18 or openqaworker-2:19. For example: openqa-client --host xxxxx isos post BUILD=xxxxx DISTRI=sle VERSION=15-SP4 FLAVOR=Online ARCH=x86_64 TEST=prj2_host_upgrade_sles12sp5_to_developing_xen
  • The automated host upgrade procedure is as follows:
      • Install the host as base product sles12sp5 with MainUpdate. Do registration during installation.
      • Perform the offline upgrade automatically by adding the following entry to the grub config:

        menuentry SLE-15-SP4-Full-x86_64-Buildxxxxx-Media1-012422075306 {
            insmod gzio
            insmod part_msdos
            insmod btrfs
            search --no-floppy --fs-uuid --set=root c911bf44-435b-4b62-a856-e5a0fcc20e8e
            linux /boot/loader-qloTWw/linux autoupgrade=1 console=ttyS1,115200 console=tty vga=791 Y2DEBUG=1 xvideo=1024x768 ssh=1 sshpassword=nots3cr3t install=http://openqa.suse.de/assets/repo/SLE-15-SP4-Full-x86_64-Buildxxxxx-Media1
            initrd /boot/loader-qloTWw/initrd
        }

      • Boot into the above grub entry and wait for the ssh daemon to be up and running.
      • ssh to the host and run yast.ssh to perform the automatic offline upgrade.
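
The scheduling command from the first step can be assembled programmatically, which may help when retriggering the test for several daily builds. The following is a minimal sketch: the function name build_isos_post_command and the FIXED_SETTINGS dictionary are hypothetical helpers, the settings are copied from the example command in this ticket, and the host and build values stay as the "xxxxx" placeholders used above. It only builds the command line; it does not run it and does not control which worker picks up the job.

```python
import shlex

# Settings copied from the example command in this ticket.
FIXED_SETTINGS = {
    "DISTRI": "sle",
    "VERSION": "15-SP4",
    "FLAVOR": "Online",
    "ARCH": "x86_64",
    "TEST": "prj2_host_upgrade_sles12sp5_to_developing_xen",
}


def build_isos_post_command(host, build):
    """Return the openqa-client argument list for scheduling the test.

    Hypothetical helper: host and build are supplied by the caller.
    """
    args = ["openqa-client", "--host", host, "isos", "post", f"BUILD={build}"]
    args += [f"{key}={value}" for key, value in FIXED_SETTINGS.items()]
    return args


if __name__ == "__main__":
    # "xxxxx" are placeholders, exactly as in the ticket.
    print(shlex.join(build_isos_post_command("xxxxx", "xxxxx")))
```

The returned list can be passed to subprocess.run once real host and build values are filled in.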

Problem

  • Initially I thought this might be caused by the loading of the usb-storage driver, which can be seen here. I ran some experiments with the usb-storage driver disabled (passing brokenmodules=usb-storage to the kernel), which gave me the impression that the 'sshd-server-started' needle hit rate increased. But this is hard to explain and does not make sense to others. I also do not think the usb-storage driver changes with every new daily build, so once a new 'sshd-server-started' needle has been captured, it should match afterwards.
  • It is more realistic to approach the issue from the openQA engine perspective.
  • There also seems to be another progress ticket, poo#106056, that relates to an ipmi backend issue. I do not think the two correlate directly, except that the issue in this ticket also depends on the ipmi backend.

Suggestion

  • Check needle matching criteria and mechanism
  • Fix the issue from the openQA engine perspective
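
To make "matching criteria" concrete when discussing this ticket, here is a minimal sketch of a threshold-based area comparison. It is an illustration only, not openQA's actual implementation: the pixel-equality similarity measure, the function names similarity_percent and area_matches, and the 96 percent threshold are all assumptions chosen for the example. It shows how a small difference between the captured needle area and the current screen can push the similarity just below the threshold.

```python
def similarity_percent(reference, candidate):
    """Percentage of positions where two equally sized pixel grids agree.

    Illustrative measure only; the real matcher is assumed to differ.
    """
    if len(reference) != len(candidate) or any(
        len(ref_row) != len(cand_row)
        for ref_row, cand_row in zip(reference, candidate)
    ):
        raise ValueError("grids must have the same dimensions")
    total = sum(len(row) for row in reference)
    if total == 0:
        raise ValueError("grids must not be empty")
    same = sum(
        ref_px == cand_px
        for ref_row, cand_row in zip(reference, candidate)
        for ref_px, cand_px in zip(ref_row, cand_row)
    )
    return 100.0 * same / total


def area_matches(reference, candidate, threshold=96.0):
    """Accept the area when similarity reaches the (assumed) threshold."""
    return similarity_percent(reference, candidate) >= threshold


if __name__ == "__main__":
    needle_area = [[0] * 10 for _ in range(10)]  # 100 reference pixels
    screen_area = [row[:] for row in needle_area]
    for x in range(5):
        screen_area[0][x] = 1  # five pixels differ from the needle
    print(similarity_percent(needle_area, screen_area))
    print(area_matches(needle_area, screen_area))
```

With five of one hundred pixels changed, the similarity is 95 percent and the area is rejected under the assumed threshold, while four changed pixels would still be accepted.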

Workaround

Capture a new needle and retrigger the job
