action #45938

[functional][u] disk_boot test in Kubic textmode scenarios fail after 09th Jan

Added by RBrownSUSE about 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Infrastructure
Target version: SUSE QA tests - Milestone 22
Start date: 2019-01-10
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

Since 09th Jan the disk_boot test on Kubic is failing, but only in its textmode scenario

https://openqa.opensuse.org/tests/828121

This test should be rebooting the SUT, but that reboot does not appear to occur

Other Kubic scenarios that call the disk_boot test but do not use Textmode are ok

It has been confirmed manually that the product boots perfectly fine, even when using textmode during the installation.

This is strongly suspected to be an openQA issue because tests with earlier TW builds that passed before 0109 (eg https://openqa.opensuse.org/tests/826720 ) now fail when run after the deploy that occurred on 0109 (eg https://openqa.opensuse.org/tests/828139)

I honestly have NO idea how the deploy can cause these symptoms, but in the absence of any recent commit that affects the code path in use (https://github.com/os-autoinst/os-autoinst-distri-opensuse/commits/master) I think the changes introduced in the deploy to o3 are the most likely cause.

Problem

  • H1 REJECT The product has changed

    • H1.1 REJECT Specified assets have changed -> see #45938#note-1, failure reproduced with the ISO from "last good"
    • H1.2 REJECT Downloaded, dynamic content is differing and impacting the test
  • H2 ACCEPT Fails because of changes in test setup

    • H2.1 REJECT Our test hardware equipment behaves differently
    • H2.2 REJECT The network behaves differently
    • H2.3 ACCEPT The worker has an influence
  • H3 REJECT Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA

  • H4 REJECT Fails because of changes in test management configuration, e.g. openQA database settings

  • H5 REJECT Fails because of changes in the test software itself (the test plan in source code as well as needles) -> E2.3-1

  • H6 ACCEPT Sporadic issue, i.e. the root problem is already hidden in the system for a long time but does not show symptoms every time

    • H6.1 ACCEPT The test code is not resilient enough to slow reboots because of the 30s timeout waiting for the grub screen in kubic/disk_boot.pm:20
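
To illustrate H6.1, here is a minimal shell sketch of polling with a timeout. It is not the Perl test code from kubic/disk_boot.pm; the `wait_until` helper, the marker file and the 3 second "boot" delay are all hypothetical stand-ins. It only shows that a wait which is shorter than a slow reboot fails, while the same wait with a larger budget passes.

```shell
#!/bin/sh
# wait_until TIMEOUT INTERVAL CMD...: poll CMD every INTERVAL seconds until
# it succeeds or TIMEOUT seconds have elapsed (hypothetical helper).
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Simulate a slow reboot: the "grub screen" (a marker file) shows up after 3s.
tmp=$(mktemp -d)
marker="$tmp/grub-screen"
( sleep 3; touch "$marker" ) &

# A budget shorter than the boot time gives up too early ...
if wait_until 1 1 test -e "$marker"; then short_result=pass; else short_result=fail; fi
# ... while a more generous budget tolerates the slow worker.
if wait_until 10 1 test -e "$marker"; then long_result=pass; else long_result=fail; fi

echo "short timeout: $short_result, long timeout: $long_result"
wait
rm -rf "$tmp"
```

The trade-off is the usual one: a larger timeout costs nothing on fast workers, because the wait returns as soon as the condition holds, and only delays the failure report when the screen never appears.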

History

#1 Updated by RBrownSUSE about 2 years ago

  • Description updated (diff)

#2 Updated by okurz about 2 years ago

  • Status changed from New to In Progress
  • Assignee set to okurz

#3 Updated by okurz about 2 years ago

  • Subject changed from disk_boot test in Kubic textmode scenarios fail after 09th Jan to [functional][u] disk_boot test in Kubic textmode scenarios fail after 09th Jan
  • Description updated (diff)
  • Priority changed from Normal to High
  • Target version set to Milestone 22

Let me see.

last good on openqaworker1: https://openqa.opensuse.org/tests/820899 from 16 days ago.

  • E4-1 Compare job settings: O4-1:
$ diff <(openqa_client_o3 --json-output jobs/826720 | jq '.job | .settings' | sort) <(openqa_client_o3 --json-output jobs/828139 | jq '.job | .settings' | sort)
18,19c18,19
<   "MULTI_STEP_KUBIC_FLOW": "1",
<   "NAME": "00826720-kubic-Tumbleweed-DVD-x86_64-Build20190105-microos_textmode@64bit-4G-HD40G",
---
>   "MULTI_STEP_KUBIC_FLOW": "1"
>   "NAME": "00828139-kubic-Tumbleweed-DVD-x86_64-Build20190105-microos_textmode@64bit-4G-HD40G",
31c31
<   "TEST": "microos_textmode"
---
>   "TEST": "microos_textmode",

no significant difference -> REJECT H4
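
The diff above is noisy: apart from the per-job NAME, the reported lines differ only in where the trailing comma ends up after `sort`. A minimal sketch of a comparison that removes this noise, assuming the pretty-printed settings are already stored in text files; the two files below are shortened, made-up stand-ins and `normalize` is a hypothetical helper:

```shell
#!/bin/sh
# Shortened stand-ins for the pretty-printed settings of the two jobs
# (hypothetical sample data, not the full settings).
tmp=$(mktemp -d)
cat > "$tmp/good.txt" <<'EOF'
{
  "TEST": "microos_textmode",
  "MULTI_STEP_KUBIC_FLOW": "1",
  "NAME": "00826720-kubic-microos_textmode"
}
EOF
cat > "$tmp/bad.txt" <<'EOF'
{
  "NAME": "00828139-kubic-microos_textmode",
  "TEST": "microos_textmode",
  "MULTI_STEP_KUBIC_FLOW": "1"
}
EOF

# Drop the trailing comma, ignore the per-job NAME, then sort the lines.
normalize() {
    sed -e 's/,$//' "$1" | grep -v '"NAME":' | sort
}

normalize "$tmp/good.txt" > "$tmp/good.sorted"
normalize "$tmp/bad.txt" > "$tmp/bad.sorted"

if diff "$tmp/good.sorted" "$tmp/bad.sorted" > /dev/null; then
    settings_verdict="no difference"
else
    settings_verdict="settings differ"
fi
echo "$settings_verdict"
rm -rf "$tmp"
```

When working on the JSON directly, `jq -S` (sort keys) together with `del(.NAME)` should give a similar result without the text post-processing.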

  • E3-1 Check difference of os-autoinst: O3-1
$ git log1 --no-merges a7be7efa..cb3fa727
036ab540 Add missing network_console.pm to Makefile
631d0f7a Do not incomplete on connection error with ssh based consoles

so a regression in os-autoinst is unlikely

$ for i in power8 aarch64 imagetester openqaworker1 openqaworker4 ; do echo -n "$i: " && ssh root@$i "cat /proc/loadavg"; done
power8: 8.19 8.37 7.64 8/1424 152970
aarch64: 1.22 1.35 1.27 2/866 24624
imagetester: 6.64 4.74 3.06 5/442 24515
openqaworker1: 10.80 12.80 10.53 17/1458 5511
openqaworker4: 8.19 10.86 9.71 6/1126 24461
$ for i in power8 aarch64 imagetester openqaworker1 openqaworker4 ; do echo -n "$i: " && ssh root@$i "cat /proc/loadavg"; done
power8: 8.35 8.63 7.96 8/1405 153270
aarch64: 0.45 1.00 1.15 1/806 24713
imagetester: 1.26 2.85 2.72 1/419 24721
openqaworker1: 6.49 9.38 9.72 8/1218 6257
openqaworker4: 10.08 9.82 9.50 11/1192 25258

The 15-minute load average (third field) on imagetester is 2.72-3.06, whereas on openqaworker1 and openqaworker4 it is 9.50-10.53, so the load is significantly higher on openqaworker1 and openqaworker4 than on imagetester. REJECT H2.1 and H2.2
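
The comparison can also be scripted instead of read off by eye. A small sketch, using sample lines copied from the first measurement above; `load15` and `load_higher` are hypothetical helper names:

```shell
#!/bin/sh
# Print the 15-minute load average, i.e. the third field of a
# /proc/loadavg line passed as the first argument.
load15() {
    printf '%s\n' "$1" | awk '{ print $3 }'
}

# Succeed if load $1 is higher than load $2; awk handles the float comparison.
load_higher() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !((a + 0) > (b + 0)) }'
}

# Samples copied from the first measurement in this comment.
imagetester=$(load15 "6.64 4.74 3.06 5/442 24515")
openqaworker1=$(load15 "10.80 12.80 10.53 17/1458 5511")

if load_higher "$openqaworker1" "$imagetester"; then
    echo "openqaworker1 ($openqaworker1) is busier than imagetester ($imagetester)"
fi
```

On a live system the argument would come from `cat /proc/loadavg` on each worker, as in the ssh loop above.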

$ for i in {001..020}; do openqa-clone-job --from https://openqa.opensuse.org --host https://openqa.opensuse.org --skip-download --skip-chained-deps 828316 TEST=okurz_poo45938_$i _GROUP="Development Tumbleweed" BUILD="20190105:poo45938" EXCLUDE_MODULES=networking,repositories,create_autoyast,libzypp_config,one_line_checks,services_enabled,filesystem_ro,transactional_update,rebootmgr,journal_check,shutdown; done

-> https://openqa.opensuse.org/tests/overview?version=Tumbleweed&build=20190105%3Apoo45938&groupid=38&distri=kubic

17/20 failed. All three passing jobs were executed on imagetester, with total execution times of 8:30m, 8:32m and 10:06m (the last one got the winter grub theme, which takes longer in "bootloader"). All failed jobs took between 8:57m and 10:37m, i.e. longer. ACCEPT H2.3 and H2. As all three workers have the same package state, ensured by transactional updates, REJECT H3

Fix in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6521

#4 Updated by okurz about 2 years ago

  • Description updated (diff)

#5 Updated by okurz about 2 years ago

  • Description updated (diff)
  • Status changed from In Progress to Feedback

#6 Updated by okurz about 2 years ago

PR merged.

Crosschecking:

$ for i in {001..020}; do openqa-clone-job --from https://openqa.opensuse.org --host https://openqa.opensuse.org --skip-download --skip-chained-deps 828316 TEST=okurz_poo45938_scaled_3_$i _GROUP="Development Tumbleweed" BUILD="20190105:poo45938_with_fix" EXCLUDE_MODULES=networking,repositories,create_autoyast,libzypp_config,one_line_checks,services_enabled,filesystem_ro,transactional_update,rebootmgr,journal_check,shutdown; done

-> https://openqa.opensuse.org/tests/overview?distri=kubic&version=Tumbleweed&build=20190105%3Apoo45938_with_fix&groupid=38

#7 Updated by okurz about 2 years ago

  • Status changed from Feedback to Resolved

20/20 passed.
