action #107470: [openqa][ipmi][worker][sut][needle matching] 'sshd-server-started' needle matching has been continuously failing on some workers/SUTs size:M

#1

Updated by waynechen55 almost 3 years ago

Steps to reproduce:

The automated host upgrade procedure is as follows:

Install the host with the base product sles12sp5 with MainUpdate. Register the system during installation.

Perform the offline upgrade automatically by adding the following entry to the grub config:

menuentry SLE-15-SP4-Full-x86_64-Buildxxxxx-Media1-012422075306 { 
    insmod gzio 
    insmod part_msdos 
    insmod btrfs 
    search --no-floppy --fs-uuid --set=root c911bf44-435b-4b62-a856-e5a0fcc20e8e 
    linux /boot/loader-qloTWw/linux autoupgrade=1 console=ttyS1,115200 console=tty vga=791 Y2DEBUG=1 xvideo=1024x768 ssh=1 sshpassword=nots3cr3t install=http://openqa.suse.de/assets/repo/SLE-15-SP4-Full-x86_64-Buildxxxxx-Media1 
    initrd /boot/loader-qloTWw/initrd 
}

Boot into the grub entry above and wait for the ssh daemon to be up and running.

ssh to the host and run yast.ssh to perform the automatic offline upgrade.

And here is what a successful 'sshd-server-started' needle matching looks like.

#2

Updated by okurz almost 3 years ago

Category set to Support
Priority changed from Normal to High
Target version set to Ready

#3

Updated by livdywan almost 3 years ago

Assignee set to livdywan

My intuition is, maybe the needle matching could be replaced with another approach since the output on the console shifts around a lot. I'll talk to Wayne.

#4

Updated by livdywan almost 3 years ago

Subject changed from [openqa][ipmi][worker][sut][needle matching] 'sshd-server-started' needle matching has been continously failing on some workers/SUTs to [openqa][ipmi][worker][sut][needle matching] 'sshd-server-started' needle matching has been continously failing on some workers/SUTs size:M
Status changed from New to Workable

#5

Updated by waynechen55 almost 3 years ago

Category deleted (~~Support~~)
Assignee deleted (~~livdywan~~)
Target version deleted (~~Ready~~)

cdywan wrote:

My intuition is, maybe the needle matching could be replaced with another approach since the output on the console shifts around a lot. I'll talk to Wayne.

So what is the alternative to needle matching? I would be interested to know.

#6

Updated by waynechen55 almost 3 years ago

Subject changed from [openqa][ipmi][worker][sut][needle matching] 'sshd-server-started' needle matching has been continously failing on some workers/SUTs size:M to [openqa][ipmi][worker][sut][needle matching] 'sshd-server-started' needle matching has been continuously failing on some workers/SUTs size:M
Category set to Support
Assignee set to livdywan
Target version set to Ready

#7

Updated by waynechen55 almost 3 years ago

@cdywan Is there any update on this issue? Are you going to fix this?

#8

Updated by livdywan almost 3 years ago

Status changed from Workable to Feedback

Progress is barely usable, but I'll try to reflect what's being discussed in Slack.

waynechen55 wrote:

@cdywan Is there any update on this issue? Are you going to fix this?

Note that this is a "support" ticket, I'm not planning to take over the test and there's no bug here afair.

I noticed the test is waiting for sshd-server-started and also logging things like SSH connection to .* established. Since the needles involve wrapping and repeating output, maybe it's better to check this on the console rather than graphically?

@waynechen55 pointed out that installation and upgrade rely on sshd-server-started and that the failure on boot_from_pxe is rare. Also, an SSH connection should only be attempted once we know the host is up.
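
For illustration, a non-graphical readiness check along the lines discussed above could poll the SUT's SSH port instead of matching a screenshot. This is a minimal Python sketch, not code from the openQA test suite; the function name `wait_for_tcp_port` and its parameters are hypothetical, and a real test would rely on whatever helpers the test distribution already provides.

```python
import socket
import time


def wait_for_tcp_port(host, port=22, timeout=300.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Returns True once the port accepts a connection, False on timeout.
    (Hypothetical helper, for illustration only.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening
            # on the port; it does not prove that a login will succeed.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, unreachable host, or connect timeout.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A check like this only tells whether the port is reachable, so it could complement the needle rather than replace every use of it.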

#9

Updated by waynechen55 almost 3 years ago

@cdywan I have some new findings regarding this issue and the failed test runs. The failed test runs were always trying to match an inferior needle, 'boot_from_pxe-sshd-server-started-20171030', instead of the newly created ones that have the same tag 'sshd-server-started'. In contrast, the successful test run matched the newly created needle 'reboot_and_wait_up_upgrade-grub2-openqawoker2-20-20211231'. So my questions are:

Why does the test still try to match the inferior needle when a newly captured needle already exists?
How can I address this and make the test match the newly created needle instead of the old one?

#10

Updated by livdywan almost 3 years ago

waynechen55 wrote:

@cdywan I have some new findings regarding this issue and the failed test runs. The failed test runs were always trying to match an inferior needle, 'boot_from_pxe-sshd-server-started-20171030', instead of the newly created ones that have the same tag 'sshd-server-started'. In contrast, the successful test run matched the newly created needle 'reboot_and_wait_up_upgrade-grub2-openqawoker2-20-20211231'. So my questions are:

Why does the test still try to match the inferior needle when a newly captured needle already exists?

How can I address this and make the test match the newly created needle instead of the old one?

I don't think the timestamp in the filename or when the file was created would be taken into account here. All of them have the same tag and similar match regions.

How about using more distinct tags? Like sshd-via-yast.

#11

Updated by waynechen55 almost 3 years ago

So if multiple needles have the same tag and similar match regions, which one will be used for matching? How does the engine choose it? Thanks. @cdywan

In other words, I want to know how I can make the test choose recent needles instead of much older ones when they have the same tag.

#12

Updated by waynechen55 almost 3 years ago

About your suggestion "How about using more distinct tags? Like sshd-via-yast.":
This looks feasible at first glance. But on second thought, I think the same issue may also happen with a new distinct tag. For example, if a test run fails to match a needle with the new distinct tag, a new needle is captured and created. How can you guarantee that the test will start looking for the new needle? It may still look for the old ones, so the same problem comes back, right? I think it may be worth a deeper look into the openQA engine IMHO. @cdywan

#13

Updated by okurz almost 3 years ago

Assignee changed from livdywan to okurz
Priority changed from High to Low

waynechen55 wrote:

@cdywan I have some new findings regarding this issue and the failed test runs. The failed test runs were always trying to match an inferior needle, 'boot_from_pxe-sshd-server-started-20171030', instead of the newly created ones that have the same tag 'sshd-server-started'. In contrast, the successful test run matched the newly created needle 'reboot_and_wait_up_upgrade-grub2-openqawoker2-20-20211231'. So my questions are:

Why does the test still try to match the inferior needle when a newly captured needle already exists?

How can I address this and make the test match the newly created needle instead of the old one?

Please remove older needles which are wrongly matching or are not as specific as newer needles that you created. That's the right approach.
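
To find candidates for removal, it can help to list every needle that carries a given tag. The following is a rough Python sketch, assuming needles are stored as JSON files with a "tags" list; the helper name `needles_with_tag` and the directory argument are hypothetical and not part of openQA. It only lists needles; which of them are outdated still has to be judged by looking at them.

```python
import json
from pathlib import Path


def needles_with_tag(needles_dir, tag):
    """Return the names (file stems) of needle JSON files whose "tags" contain tag.

    Hypothetical helper: assumes each needle is a *.json file holding an
    object with a "tags" list. Results are ordered by file path.
    """
    matches = []
    for path in sorted(Path(needles_dir).rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed files
        if isinstance(data, dict) and tag in (data.get("tags") or []):
            matches.append(path.stem)
    return matches
```

Calling `needles_with_tag(needles_dir, "sshd-server-started")` on a needles checkout would return all needles sharing that tag, old and new alike, so that the older ones can be reviewed.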

#14

Updated by okurz almost 3 years ago

After the workshop my primary suggestion is to replace the multiple check_screen calls in https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/tests/virt_autotest/login_console.pm#L91 and below with assert_screen. Also see https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/CONTRIBUTING.md?plain=1#L114 . In particular using a check_screen without checking the return code should be considered an error.

#15

Updated by waynechen55 almost 3 years ago

okurz wrote:

After the workshop my primary suggestion is to replace the multiple check_screen calls in https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/tests/virt_autotest/login_console.pm#L91 and below with assert_screen. Also see https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/CONTRIBUTING.md?plain=1#L114 . In particular using a check_screen without checking the return code should be considered an error.

I think people choose check_screen with match_has_tag because it lets them handle a failed match more flexibly instead of just failing the whole test. Sometimes failing the test is not necessary because the checkpoint is not fatal, and at the same time different operations need to be performed depending on which needle matched.

#17

Updated by okurz over 2 years ago

Status changed from Feedback to Resolved

@waynechen55 I assume you managed to follow the suggestions and remove old, invalid needles. I assume this support task can be resolved.

#18

Updated by waynechen55 over 2 years ago

okurz wrote:

@waynechen55 I assume you managed to follow the suggestions and remove old, invalid needles. I assume this support task can be resolved.

Agreed. It is done.
