action #73297

auto_review:"(?s)Running on openqa-aarch64:.*considering VNC stalled.*THERE IS NOTHING TO READ"

Added by okurz 6 months ago. Updated 5 months ago.

Status: In Progress
Priority: Low
Assignee:
Target version:
Start date: 2020-10-13
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://openqa.opensuse.org/tests/1429775/file/autoinst-log.txt shows

[2020-10-13T05:54:02.797 CEST] [debug] tests/kernel/boot_ltp.pm:41 called testapi::record_info
[2020-10-13T05:54:02.798 CEST] [debug] <<< testapi::record_info(title="INFO", output="normal boot or boot with params", result="ok")
[2020-10-13T05:54:02.804 CEST] [debug] tests/kernel/boot_ltp.pm:43 called opensusebasetest::wait_boot -> lib/opensusebasetest.pm:1063 called opensusebasetest::handle_grub -> lib/opensusebasetest.pm:854 called opensusebasetest::wait_grub -> lib/opensusebasetest.pm:635 called testapi::assert_screen
[2020-10-13T05:54:02.805 CEST] [debug] <<< testapi::assert_screen(mustmatch=[
  "bootloader-shim-import-prompt",
  "grub2",
  "inst-bootmenu"
], timeout=100)
[2020-10-13T05:54:05.482 CEST] [debug] WARNING: check_asserted_screen took 1.70 seconds for 66 candidate needles - make your needles more specific
[2020-10-13T05:54:05.483 CEST] [debug] no match: 199.0s, best candidate: bootloader-tw-salt_installer-20190111 (0.29)
[2020-10-13T05:54:05.504 CEST] [debug] no change: 197.3s
[2020-10-13T05:54:06.489 CEST] [debug] no change: 196.3s
[2020-10-13T05:54:07.241 CEST] [debug] considering VNC stalled, no update for 4.22 seconds
[2020-10-13T05:54:16.375 CEST] [debug] WARNING: check_asserted_screen took 4.42 seconds for 66 candidate needles - make your needles more specific
[2020-10-13T05:54:16.376 CEST] [debug] no match: 190.8s, best candidate: bootloader-tw-salt_installer-20190111 (0.29)
[2020-10-13T05:54:16.385 CEST] [debug] pointer type 1 0 640 480 -257
[2020-10-13T05:54:16.386 CEST] [debug] led state 0 1 1 -261
[2020-10-13T05:54:16.397 CEST] [debug] no change: 186.4s
[2020-10-13T05:54:17.380 CEST] [debug] no change: 185.4s
[2020-10-13T05:54:18.381 CEST] [debug] no change: 184.4s
[2020-10-13T05:54:19.383 CEST] [debug] no change: 183.4s
[2020-10-13T05:54:20.384 CEST] [debug] no change: 182.4s
[2020-10-13T05:54:20.388 CEST] [debug] considering VNC stalled, no update for 4.00 seconds
[2020-10-13T05:54:28.777 CEST] [debug] no change: 174.0s
[2020-10-13T05:54:28.777 CEST] [debug] pointer type 1 0 800 600 -257
[2020-10-13T05:54:28.777 CEST] [debug] led state 0 1 1 -261
[2020-10-13T05:54:28.782 CEST] [debug] no change: 174.0s
[2020-10-13T05:54:29.100 CEST] [debug] backend process exited: 0
[2020-10-13T05:54:29.101 CEST] [debug] THERE IS NOTHING TO READ 15 4 3
[2020-10-13T05:54:29.101 CEST] [debug] stopping command server 46712 because test execution ended

This appears to be a significant performance regression on openqa-aarch64 since #72079.
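To put numbers on the log excerpt above, a small script can pull out the stall durations and the needle-matching times. This is only a sketch: the two message formats are copied from the lines quoted in this ticket, while `summarize_log` and `SAMPLE` are hypothetical names that are not part of os-autoinst.

```python
import re

# Message formats as they appear in the autoinst-log.txt excerpt in this ticket.
STALL_RE = re.compile(r"considering VNC stalled, no update for ([0-9.]+) seconds")
NEEDLE_RE = re.compile(
    r"check_asserted_screen took ([0-9.]+) seconds for (\d+) candidate needles"
)


def summarize_log(text):
    """Collect VNC stall durations and slow needle checks from a log text.

    Hypothetical helper for illustration only.
    """
    stalls = [float(m.group(1)) for m in STALL_RE.finditer(text)]
    needle_checks = [
        (float(m.group(1)), int(m.group(2))) for m in NEEDLE_RE.finditer(text)
    ]
    return {"stalls": stalls, "needle_checks": needle_checks}


# A few lines taken from the excerpt above.
SAMPLE = """\
[2020-10-13T05:54:05.482 CEST] [debug] WARNING: check_asserted_screen took 1.70 seconds for 66 candidate needles - make your needles more specific
[2020-10-13T05:54:07.241 CEST] [debug] considering VNC stalled, no update for 4.22 seconds
[2020-10-13T05:54:16.375 CEST] [debug] WARNING: check_asserted_screen took 4.42 seconds for 66 candidate needles - make your needles more specific
[2020-10-13T05:54:20.388 CEST] [debug] considering VNC stalled, no update for 4.00 seconds
"""

if __name__ == "__main__":
    print(summarize_log(SAMPLE))
```

Running the same helper over logs from before and after the upgrade in #72079 would be one way to check whether the stalls really became more frequent.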


Related issues

Related to openQA Infrastructure - action #72079: Upgrade o3 worker openqa-aarch64 to openSUSE Leap 15.2 (to use newer packages specifically needed for aarch64 and as precursor), also problem auto_review:"(?s)starting: /usr/bin/qemu-system-aarch64.*backend died: Migrate to file failed" (In Progress, 2020-09-29)

History

#1 Updated by okurz 6 months ago

  • Related to action #72079: Upgrade o3 worker openqa-aarch64 to openSUSE Leap 15.2 (to use newer packages specifically needed for aarch64 and as precursor), also problem auto_review:"(?s)starting: /usr/bin/qemu-system-aarch64.*backend died: Migrate to file failed" added

#2 Updated by okurz 6 months ago

  • Subject changed from auto_review:"Running on openqa-aarch64:.*considering VNC stalled.*THERE IS NOTHING TO READ" to auto_review:"(?s)Running on openqa-aarch64:.*considering VNC stalled.*THERE IS NOTHING TO READ"
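The reason for adding (?s) can be shown with a short sketch. Without that flag, "." does not match a newline, so a pattern whose parts sit on different log lines never matches. The example uses Python's re module, where the inline (?s) flag has the same meaning as in Perl-style regexes; the tool that actually evaluates auto_review expressions may use a different engine, and the content of the "Running on" line in `LOG` is made up for illustration.

```python
import re

# Subject regex from this ticket, without the leading (?s) flag.
PATTERN_BODY = (
    r"Running on openqa-aarch64:.*considering VNC stalled.*THERE IS NOTHING TO READ"
)

# Shortened multi-line log; the first line is a hypothetical stand-in
# for the real header line of the job log.
LOG = (
    "Running on openqa-aarch64: (hypothetical header line)\n"
    "[debug] considering VNC stalled, no update for 4.22 seconds\n"
    "[debug] THERE IS NOTHING TO READ 15 4 3\n"
)

# Without (?s), "." stops at each newline, so the three parts are never joined.
without_flag = re.search(PATTERN_BODY, LOG)

# With (?s), "." also matches newlines and the pattern spans all three lines.
with_flag = re.search("(?s)" + PATTERN_BODY, LOG)

if __name__ == "__main__":
    print("without (?s):", bool(without_flag))
    print("with (?s):", bool(with_flag))
```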

#3 Updated by okurz 6 months ago

ggardet_arm I assume this issue is resolved with work you have done but to learn for the future I would appreciate if you can describe what exactly was the solution. Can you update here?

#4 Updated by ggardet_arm 6 months ago

okurz wrote:

ggardet_arm I assume this issue is resolved with work you have done but to learn for the future I would appreciate if you can describe what exactly was the solution. Can you update here?

This is not fixed; it just happens at a lower rate. It looks like it is related to performance.

#5 Updated by okurz 6 months ago

  • Status changed from New to Feedback
  • Target version changed from Ready to future

ok, anything we could do? Or what do you plan as next step? As long as you are assigned I will keep the target version "future" to communicate that the SUSE QA Tools team is not actively working on this ticket.

#6 Updated by ggardet_arm 6 months ago

  • Assignee deleted (ggardet_arm)

okurz wrote:

ok, anything we could do? Or what do you plan as next step? As long as you are assigned I will keep the target version "future" to communicate that the SUSE QA Tools team is not actively working on this ticket.

tbh, I do not know what would be the next step here.

#7 Updated by okurz 6 months ago

  • Due date set to 2020-11-25
  • Assignee set to okurz
  • Priority changed from Normal to Low
  • Target version changed from future to Ready

OK, let's see: do you agree that this is likely a performance regression since we upgraded to openSUSE Leap 15.2? Do we need another bug report for qemu, or a generic bug report? Should we roll back any changes? If there is nothing else, I will at least monitor how often and when these issues happen and set up an automatic retrigger for these cases. If it does not happen that often and we retrigger automatically, there may be nothing else to be done.

#8 Updated by okurz 5 months ago

  • Due date deleted (2020-11-25)
  • Assignee changed from okurz to ggardet_arm
  • Target version changed from Ready to future

Using https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label and calling
openqa-query-for-job-label 73297
I get:

1475717|2020-11-16 16:50:23|done|incomplete|kubeadm|backend died: Virtio terminal and svirt serial terminal do not support send_key. Use|openqa-aarch64
1469631|2020-11-13 08:13:30|done|incomplete|kubeadm|backend died: Virtio terminal and svirt serial terminal do not support send_key. Use|openqa-aarch64
1468939|2020-11-12 23:39:24|done|incomplete|upgrade_Leap_15.0_kde|backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 267.|openqa-aarch64
1469001|2020-11-12 23:38:53|done|incomplete|upgrade_Leap_15.2_gnome|backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 267.|openqa-aarch64
1467645|2020-11-12 01:17:58|done|incomplete|kubeadm|backend died: Virtio terminal and svirt serial terminal do not support send_key. Use|openqa-aarch64
1465574|2020-11-09 18:06:06|done|incomplete|kubeadm|backend died: Virtio terminal and svirt serial terminal do not support send_key. Use|openqa-aarch64
1465552|2020-11-09 16:51:24|done|incomplete|kubeadm|backend died: Virtio terminal and svirt serial terminal do not support send_key. Use|openqa-aarch64
1464795|2020-11-09 14:20:27|done|incomplete|microos|backend died: Virtio terminal and svirt serial terminal do not support send_key. Use|openqa-aarch64
1464792|2020-11-09 14:05:00|done|incomplete|kubeadm|backend died: Virtio terminal and svirt serial terminal do not support send_key. Use|openqa-aarch64
1465434|2020-11-09 13:30:05|done|incomplete|kubeadm|backend died: Virtio terminal and svirt serial terminal do not support send_key. Use|openqa-aarch64

so this is very much a recurring issue.
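To see which failure reasons dominate in that output, the rows can be grouped by their reason column. This is a sketch under assumptions: the script prints no header, so the column meanings (job id, timestamp, state, result, test, reason, worker) are inferred from the rows above, and `count_reasons` is a hypothetical helper, not part of os-autoinst/scripts.

```python
from collections import Counter

# Rows as printed by openqa-query-for-job-label 73297 in this comment.
VIRTIO = "backend died: Virtio terminal and svirt serial terminal do not support send_key. Use"
MIGRATE = (
    "backend died: Migrate to file failed, it has been running for more than "
    "240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 267."
)
OUTPUT = "\n".join([
    f"1475717|2020-11-16 16:50:23|done|incomplete|kubeadm|{VIRTIO}|openqa-aarch64",
    f"1469631|2020-11-13 08:13:30|done|incomplete|kubeadm|{VIRTIO}|openqa-aarch64",
    f"1468939|2020-11-12 23:39:24|done|incomplete|upgrade_Leap_15.0_kde|{MIGRATE}|openqa-aarch64",
    f"1469001|2020-11-12 23:38:53|done|incomplete|upgrade_Leap_15.2_gnome|{MIGRATE}|openqa-aarch64",
    f"1467645|2020-11-12 01:17:58|done|incomplete|kubeadm|{VIRTIO}|openqa-aarch64",
    f"1465574|2020-11-09 18:06:06|done|incomplete|kubeadm|{VIRTIO}|openqa-aarch64",
    f"1465552|2020-11-09 16:51:24|done|incomplete|kubeadm|{VIRTIO}|openqa-aarch64",
    f"1464795|2020-11-09 14:20:27|done|incomplete|microos|{VIRTIO}|openqa-aarch64",
    f"1464792|2020-11-09 14:05:00|done|incomplete|kubeadm|{VIRTIO}|openqa-aarch64",
    f"1465434|2020-11-09 13:30:05|done|incomplete|kubeadm|{VIRTIO}|openqa-aarch64",
])


def count_reasons(output):
    """Count rows per failure reason (assumed to be the sixth column)."""
    counts = Counter()
    for line in output.splitlines():
        # Split off the first five columns; the rest is "reason|worker".
        head = line.split("|", 5)
        if len(head) != 6:
            continue  # skip blank or malformed lines
        reason, _, _worker = head[5].rpartition("|")
        counts[reason] += 1
    return counts


if __name__ == "__main__":
    for reason, n in count_reasons(OUTPUT).most_common():
        print(n, reason)
```

For the ten rows above this gives eight send_key failures and two "Migrate to file failed" failures.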

ggardet_arm, I think a "rollback" of the migration comes a bit late by now. This looks like a clear performance regression to me, but I would not know what is wrong with the code that we maintain. Unless you already have corresponding bug reports, please make sure to report them against whatever you consider the relevant component. I trust that you are much more of an expert on the aarch64 specifics here.

#9 Updated by ggardet_arm 5 months ago

All the "backend died: Virtio terminal and svirt serial terminal do not support send_key." failures are due to a missing select_console at the beginning of the journal_check test. This should be fixed by: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11439
The "backend died: Migrate to file failed, it has been running for more than 240 seconds" failure appears from time to time and is clearly related to performance (it likely occurs when multiple VMs try to migrate to file at the same time).

#10 Updated by ggardet_arm 5 months ago

  • Status changed from Feedback to In Progress
