action #68510

closed

test fails in logpackages on some aarch64 workers

Added by ggardet_arm almost 4 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Bugs in existing tests
Target version:
-
Start date:
2020-06-29
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

openQA test in scenario opensuse-15.2-DVD-aarch64-create_hdd_gnome-x11@aarch64 fails in logpackages

For the past few days, installation tests from ISO have been failing in logpackages on a few aarch64 workers (AWS, based on SLE15SP1, and HoneyComb LX2K, based on Tumbleweed) for both Leap 15.2 and Tumbleweed.

Test suite description

Reproducible

Fails since (at least) Build 208.3 (current job)

Expected result

Last good: 208.3 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #68474: similarity calculation can return NaN (Resolved, nadvornik, 2020-06-26)

Actions #1

Updated by ggardet_arm almost 4 years ago

  • Subject changed from test fails in logpackages on AWS worker only to test fails in logpackages on some aarch64 workers
  • Description updated (diff)

It now occurs on my honeycomb lx2k machine, based on Tumbleweed: https://openqa.opensuse.org/tests/1320555#step/logpackages/4

Actions #2

Updated by okurz almost 4 years ago

Regarding https://progress.opensuse.org/issues/68510: the test module should send "ctrl-alt-f" and the screen should then show a console. https://openqa.opensuse.org/tests/1320555/file/autoinst-log.txt shows that the console switch is requested:

[2020-06-30T08:44:06.213 UTC] [debug] tests/installation/logpackages.pm:31 called testapi::select_console
[2020-06-30T08:44:06.214 UTC] [debug] <<< testapi::select_console(testapi_console="install-shell")

Instead, as seen in
https://openqa.opensuse.org/tests/1320555#step/logpackages/2
there is no apparent screen change. Another tty key press should be sent in the post_fail_hook, but it has no effect there either:
https://openqa.opensuse.org/tests/1320555#step/logpackages/7

https://openqa.opensuse.org/tests/1320555#investigation shows that there are no relevant test changes between "last good" and "first bad", but of course that compares worker "hctw01:10" against "openqa-aarch16:15". You could check whether there are relevant changes, e.g. installed package updates, on the worker hosts where you encounter these problems. Maybe a kernel upgrade, or qemu? You can also trigger a job explicitly on the affected worker hosts and use the developer mode to pause at the start of "logpackages" and check interactively whether the tty switch takes effect.

Actions #3

Updated by ggardet_arm almost 4 years ago

Unfortunately, developer mode does not work because these are remote workers that do not permit incoming connections.

  • aws worker (broken):

    • OS: SLE15SP1
    • os-autoinst: 4.6.1592908950.5038d8c2-458.1 (was working with 4.6.1592836702.7b31205c, only diff is https://github.com/os-autoinst/os-autoinst/pull/1449 so unlikely the cause)
    • openQA-worker: 4.6.1593437825.61ee7d716-2920.1
    • qemu: 3.1.1.1-9.21.4
    • qemu-uefi-aarch64: 202002-3.2
  • HoneyComb LX2K worker (broken):

    • OS: Tumbleweed
    • os-autoinst: 4.6.1592908950.5038d8c2-1.1
    • openQA-worker: 4.6.1592865598.88eb8f02c-1.1
    • qemu: 5.0.0-3.1
    • qemu-uefi-aarch64: 202005-1.1
  • D05 (working):

    • OS: Leap 15.1
    • os-autoinst: 4.6.1592908950.5038d8c2-1.1
    • openQA-worker: ?
    • qemu: 3.1.1.1
    • qemu-uefi-aarch64: ?
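
The comparison above can be checked mechanically. The following is a minimal sketch, assuming the versions are copied by hand from the list above and that everything after the first "-" is the RPM release suffix. The names WORKERS, upstream and shared_with_working are made up for illustration and are not part of openQA.

```python
# Package versions copied from the list above; "?" means "not reported".
WORKERS = {
    "aws (broken)": {
        "os-autoinst": "4.6.1592908950.5038d8c2-458.1",
        "openQA-worker": "4.6.1593437825.61ee7d716-2920.1",
        "qemu": "3.1.1.1-9.21.4",
        "qemu-uefi-aarch64": "202002-3.2",
    },
    "honeycomb (broken)": {
        "os-autoinst": "4.6.1592908950.5038d8c2-1.1",
        "openQA-worker": "4.6.1592865598.88eb8f02c-1.1",
        "qemu": "5.0.0-3.1",
        "qemu-uefi-aarch64": "202005-1.1",
    },
    "d05 (working)": {
        "os-autoinst": "4.6.1592908950.5038d8c2-1.1",
        "openQA-worker": "?",
        "qemu": "3.1.1.1",
        "qemu-uefi-aarch64": "?",
    },
}


def upstream(version):
    """Drop the release suffix after the first "-"; return None if unknown."""
    return None if version == "?" else version.split("-", 1)[0]


def shared_with_working(package, working="d05 (working)"):
    """List the broken workers whose upstream version equals the working one."""
    reference = upstream(WORKERS[working][package])
    if reference is None:
        return []  # nothing to compare against
    return [
        name
        for name, packages in WORKERS.items()
        if name != working and upstream(packages[package]) == reference
    ]


for package in WORKERS["d05 (working)"]:
    print(package, shared_with_working(package))
```

On this data, os-autoinst has the same upstream version on all three hosts, and qemu 3.1.1.1 appears on both a broken host and the working one, so neither version alone separates broken from working.
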
Actions #4

Updated by ggardet_arm almost 4 years ago

I managed to VNC to qemu (on the HoneyComb LX2K, as it is a local machine for me) and the tty switch actually happens, but openQA still sees the old screen!

So we have a screen refresh problem apparently!

Actions #5

Updated by mkittler almost 4 years ago

A screen refreshing problem has actually recently been reported. There's also a workaround: https://github.com/os-autoinst/os-autoinst/pull/1451

However, it seemed specific to SLE and not clearly related.

Actions #6

Updated by ggardet_arm almost 4 years ago

  • Related to action #68474: similarity calculation can return NaN added
Actions #7

Updated by ggardet_arm almost 4 years ago

  • Status changed from New to Resolved

This PR fixes the problem on both the SLE15SP1-based AWS system and the Tumbleweed-based machine. Both now display WARNING: cv::norm() returned NaN (poo#68474) in the log.
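
As an illustration of why a NaN similarity value could leave openQA looking at a stale screen: every ordered comparison with NaN is false, so a check of the form "similarity below threshold means the screen changed" can never fire. The sketch below only demonstrates that property. The function names, the threshold value and the "treat NaN as changed" policy are assumptions for illustration; this is not the os-autoinst code and not necessarily what the PR does.

```python
import math


def screen_changed(similarity, threshold):
    """Naive check: the screen counts as changed when similarity drops below threshold."""
    return similarity < threshold


def screen_changed_guarded(similarity, threshold):
    """Same check, but a NaN similarity is logged and treated as a change (assumed policy)."""
    if math.isnan(similarity):
        print("WARNING: similarity is NaN, treating screen as changed")
        return True
    return similarity < threshold


nan = float("nan")
print(screen_changed(nan, 50.0))          # False: NaN compares false against anything
print(screen_changed_guarded(nan, 50.0))  # True: the guard catches the NaN explicitly
```
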
