
openQA Project (public) - Done

Category: Bugs in existing tests
Target version:
Start date: 2019-04-22
Due date:
% Done:
Estimated time: 5.00 h
Difficulty:

Description

Observation

openQA test in scenario opensuse-Tumbleweed-KDE-Live-x86_64-kde_live_upgrade_leap_42.3@64bit-2G fails in await_install

In general, we don't care much whether logs are collected before or after the reboot, except that the system might not boot.
So if we don't find a way to make it work, let's just boot. The consequence is that for failures which prevent the system from booting, we won't have logs. But that only applies to cases where YaST wasn't able to detect the issue.

After discussion, we decided to implement a solution that skips collecting logs from the SUT depending on a variable, so that we no longer need to catch the reboot pop-up.
This will require unscheduling logs_from_installation_system and modifying await_install not to wait for the pop-up.
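A minimal sketch of that decision, written in Python for illustration only: the real schedule lives in the Perl test distribution, SKIP_INSTALLATION_LOGS is a hypothetical variable name (not the one actually introduced), and the module order is illustrative.

```python
# Sketch only: SKIP_INSTALLATION_LOGS is a hypothetical setting name.
def skip_installation_logs(settings):
    """True when log collection from the installation system is disabled."""
    return settings.get("SKIP_INSTALLATION_LOGS", "0") == "1"


def build_schedule(modules, settings):
    """Return the modules to run, unscheduling logs_from_installation_system
    when the hypothetical variable is set."""
    if not skip_installation_logs(settings):
        return list(modules)
    return [m for m in modules if m != "logs_from_installation_system"]


def await_install_waits_for_popup(settings):
    # await_install only has to catch the reboot pop-up when the logs are
    # going to be collected from the installation system before the reboot.
    return not skip_installation_logs(settings)


# Illustrative module order at the end of an installation.
modules = ["await_install", "logs_from_installation_system",
           "reboot_after_installation", "grub_test"]
print(build_schedule(modules, {"SKIP_INSTALLATION_LOGS": "1"}))
# prints ['await_install', 'reboot_after_installation', 'grub_test']
print(await_install_waits_for_popup({"SKIP_INSTALLATION_LOGS": "1"}))
# prints False
```

Tying both decisions to the same setting keeps await_install and the schedule consistent: the pop-up is only awaited when something still needs the installation system afterwards.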

The mentioned scenario is the only one affected.

Test suite description

Uses the live installer on the KDE live medium for upgrading the system.

Acceptance criteria

Test suite doesn't fail if we miss the reboot screen

Reproducible

Fails since (at least) Build 20190421 (current job)

Expected result

Last good: 20190420 (or more recent)

Further details

Always latest result in this scenario: latest

Related issues: 6 (0 open, 6 closed)

Related to openQA Tests (public) - action #53534: [opensuse][kde] test fails in await_install - timeout not working properly (Resolved, 2019-06-26)

Related to openQA Tests (public) - action #51983: [functional][y][sporadic] test fails in "await_install" to detect the end of installation (Rejected, riafarov, 2019-05-25)

Related to openQA Infrastructure (public) - action #58727: openqa-aarch64 from o3 slower than usual aka. os-autoinst is too slow pressing F2 causing ARM tests to fail in "boot_to_desktop" (Resolved, tinita, 2019-10-28)

Related to openQA Infrastructure (public) - action #20914: [tools] configure vm settings for workers with rotating discs (Resolved, 2017-07-28, 2019-11-05)

Has duplicate openQA Tests (public) - action #58802: test fails in await_install (Rejected, 2019-10-29)

Has duplicate openQA Tests (public) - action #58832: test fails in await_install, seems to be stuck on grub menu (Rejected, 2019-10-29)

Updated by SLindoMansilla almost 6 years ago

Subject changed from test fails in await_install - does not catch rebootnow to [opensuse] test fails in await_install - does not catch rebootnow

As a result of backlog triaging (see https://progress.opensuse.org/projects/openqatests/wiki#ticket-backlog-triaging for more information).

Please feel free to adjust the category or the "[label]" if you think differently.

Updated by okurz almost 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: update_Leap_42.1_kde
https://openqa.opensuse.org/tests/935316

Updated by okurz almost 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: update_Leap_42.1_kde
https://openqa.opensuse.org/tests/945390

Updated by okurz almost 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: update_Leap_42.3_kde+system_performance
https://openqa.opensuse.org/tests/959088

Updated by okurz almost 6 years ago

Subject changed from [opensuse] test fails in await_install - does not catch rebootnow to [functional][y] test fails in await_install - does not catch rebootnow
Priority changed from Normal to High

This seems to be happening more and more now, e.g. see https://openqa.opensuse.org/tests/overview?version=Tumbleweed&failed_modules=await_install, already showing 5 jobs within a single build right now.

@riafarov I think QSF-y can handle this better. Could you please help? It seems to me as if the installer changed its performance impact somewhat, so that we have a more loaded system which is more prone to miss the screen? Or did something change in the test behaviour? Or does the installer have an option by now to always stop at the end without a timeout? :) I guess as a workaround we could still try to accept the fact when we found a successfully booted system instead that we do not even need the installation logs from the next module.

Updated by okurz almost 6 years ago

Related to action #53534: [opensuse][kde] test fails in await_install - timeout not working properly added

Updated by okurz almost 6 years ago

Related to action #51983: [functional][y][sporadic] test fails in "await_install" to detect the end of installation added

Updated by okurz almost 6 years ago

Latest occurrence: https://openqa.opensuse.org/tests/975173#step/logs_from_installation_system/2

Updated by riafarov over 5 years ago

Target version set to Milestone 27

The logs show no gap of 9 seconds, so the screen should have matched, even with 33 needles to match against it. I will attempt to reduce the number of needles, but this is an issue with the tooling. The logs contain evidence of the message being displayed.
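The argument can be made explicit: if the pop-up stays visible for D seconds, screen checks start before it appears and continue after it disappears, and no two consecutive checks are more than D apart, then at least one check must land while the pop-up is shown. A small Python sketch of that reasoning, using made-up timestamps rather than values taken from the actual job logs:

```python
def max_gap(timestamps):
    """Largest interval between two consecutive screen checks."""
    ordered = sorted(timestamps)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def checks_while_shown(timestamps, shown_at, duration):
    """Screen checks that happened while the pop-up was visible."""
    return [t for t in sorted(timestamps) if shown_at <= t <= shown_at + duration]


def must_have_seen(timestamps, shown_at, duration):
    # If checks cover the whole display window and no gap reaches the display
    # time, the first check after the pop-up appears falls inside the window.
    ordered = sorted(timestamps)
    covers_window = ordered[0] <= shown_at and ordered[-1] >= shown_at + duration
    return covers_window and max_gap(ordered) < duration


# Hypothetical check times in seconds; the pop-up is shown at t=5 for 10 s.
ts = [0.0, 2.5, 4.1, 7.9, 10.2, 13.0, 15.6]
print(checks_while_shown(ts, 5.0, 10.0))  # prints [7.9, 10.2, 13.0]
print(must_have_seen(ts, 5.0, 10.0))      # prints True
```

When the logs satisfy this condition and the match still fails, the sampling interval is not the culprit, which points at the matching step itself (e.g. the cost of comparing many needles per check).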

#10

Updated by riafarov over 5 years ago

Description updated (diff)
Due date set to 2019-09-24

#11

Updated by riafarov over 5 years ago

Priority changed from High to Normal
Target version changed from Milestone 27 to Milestone 28

#12

Updated by riafarov over 5 years ago

Description updated (diff)
Status changed from New to Workable
Estimated time set to 5.00 h

#13

Updated by riafarov over 5 years ago

Due date changed from 2019-09-24 to 2019-10-08
Assignee set to riafarov

#14

Updated by riafarov over 5 years ago

Status changed from Workable to Blocked

There is a problem with the image: it looks like we upgrade a UEFI installation using legacy boot, which breaks.

Actions

#15

Updated by riafarov over 5 years ago

Due date changed from 2019-10-08 to 2019-10-22

#16

Updated by riafarov over 5 years ago

Target version changed from Milestone 28 to Milestone 30+

#17

Updated by riafarov over 5 years ago

Due date deleted (~~2019-10-22~~)
Target version changed from Milestone 30+ to future

#18

Updated by okurz over 5 years ago

Hi @riafarov, your last comment in #50615#note-14 indicates a temporary problem? https://openqa.opensuse.org/tests/1067386#step/await_install/5 shows a recent failure with the same symptoms, but certainly not related to UEFI upgrade or legacy boot. What do you think about my suggestion in #50615#note-5 "to accept the fact when we found a successfully booted system instead that we do not even need the installation logs from the next module."? That should be even easier now, as it is possible to dynamically change the test schedule from the test itself. However, it might be better to explicitly record the "skipping" in each test module that comes before grub_test, i.e. "logs_from_installation_system" and "reboot_after_installation".
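A minimal sketch of that fallback, in Python for illustration only (the actual test modules are Perl): "rebootnow" is the reboot pop-up tag from this ticket's subject, while "grub2" is a hypothetical stand-in for whatever needle tag identifies the boot menu. Reaching the boot menu is accepted as a finished installation, and the skip is recorded explicitly for each module before grub_test.

```python
from dataclasses import dataclass, field

# Modules that need the installation system and come before grub_test.
MODULES_BEFORE_GRUB_TEST = ["logs_from_installation_system",
                            "reboot_after_installation"]


@dataclass
class AwaitInstallResult:
    passed: bool
    skipped: list = field(default_factory=list)
    notes: dict = field(default_factory=dict)


def evaluate_end_of_installation(matched_tag):
    """Decide how to continue based on the screen matched at the end.

    'rebootnow' is the reboot pop-up; 'grub2' is a hypothetical tag for
    the boot menu shown after the installer rebooted on its own.
    """
    if matched_tag == "rebootnow":
        return AwaitInstallResult(passed=True)
    if matched_tag == "grub2":
        # The installation finished and the system already rebooted:
        # accept it, but record the skipping for every affected module.
        return AwaitInstallResult(
            passed=True,
            skipped=list(MODULES_BEFORE_GRUB_TEST),
            notes={m: "skipped: system already rebooted to the boot menu"
                   for m in MODULES_BEFORE_GRUB_TEST},
        )
    return AwaitInstallResult(passed=False)
```

Recording a note per skipped module, rather than silently dropping them, keeps it visible in the results that the installation logs were not collected for that run.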

I still see as an alternative what was discussed in https://bugzilla.suse.com/show_bug.cgi?id=1122493, managed by the YaST development team (https://trello.com/c/CDedArHx): an option for an indefinite timeout at the end of the installation.

EDIT: Trying myself with a suggestion for the YaST installer: https://github.com/yast/yast-installation/pull/823

#19

Updated by ggardet_arm over 5 years ago

Related to action #58727: openqa-aarch64 from o3 slower than usual aka. os-autoinst is too slow pressing F2 causing ARM tests to fail in "boot_to_desktop" added

#20

Updated by riafarov over 5 years ago

Due date set to 2019-12-03
Status changed from Blocked to Workable
Assignee deleted (~~riafarov~~)

okurz wrote:

Hi @riafarov, your last comment in #50615#note-14 indicates a temporary problem? https://openqa.opensuse.org/tests/1067386#step/await_install/5 shows a recent failure with the same symptoms, but certainly not related to UEFI upgrade or legacy boot. What do you think about my suggestion in #50615#note-5 "to accept the fact when we found a successfully booted system instead that we do not even need the installation logs from the next module."? That should be even easier now, as it is possible to dynamically change the test schedule from the test itself. However, it might be better to explicitly record the "skipping" in each test module that comes before grub_test, i.e. "logs_from_installation_system" and "reboot_after_installation".

I still see as an alternative what was discussed in https://bugzilla.suse.com/show_bug.cgi?id=1122493, managed by the YaST development team (https://trello.com/c/CDedArHx): an option for an indefinite timeout at the end of the installation.

EDIT: Trying myself with a suggestion for the YaST installer: https://github.com/yast/yast-installation/pull/823

Hi @okurz. I would not call a problem that has been there for a month temporary. Have you checked the failure in the job mentioned here? Also, as already said, there is no easy way out, as our tools cannot handle these scenarios properly, i.e. they are unreliable. For what you are suggesting, there is already a variable called GRUB_TIMEOUT. An alternative would be to use the startshell=1 boot parameter, which provides a console before the reboot and doesn't require syncing on the pop-up.

As you are part of the tools team now, maybe you could take a look at why we cannot match the pop-up, which is displayed for 10 seconds?

The bug you are referring to is against SLE 15 SP1 and about general performance, so please do not mix everything into a single issue.
As we now have a job where we can reproduce the issue, it can be worked on.

#21

Updated by okurz over 5 years ago

Related to action #20914: [tools] configure vm settings for workers with rotating discs added

#22

Updated by okurz over 5 years ago

riafarov wrote:

Hi @okurz. I would not call a problem that has been there for a month temporary.

Yes, for sure, that's my point. But you updated the ticket status to "Workable", so that's what I meant, thanks! :)

Have you checked the failure in the job mentioned here?

Yes, I have checked. Did I miss something?

Also, as already said, there is no easy way out, as our tools cannot handle these scenarios properly, i.e. they are unreliable. For what you are suggesting, there is already a variable called GRUB_TIMEOUT.

Of course, I know about the variable. What I meant by "successfully booted system" is reaching the grub menu, not a booted Linux system.

An alternative would be to use the startshell=1 boot parameter, which provides a console before the reboot and doesn't require syncing on the pop-up.

Yes, we discussed this already. It might be a bit too different from a normal test flow though.

As you are part of the tools team now, maybe you could take a look at why we cannot match the pop-up, which is displayed for 10 seconds?

The reason is simple: Linux is not a realtime operating system and we cannot guarantee that we are able to interact with a system within a given time. Also see #20914 for more details. It is unfortunate that we cannot make it work even within 8s, but all safe alternatives would come with a severe slowdown which we cannot take lightly.

The bug you are referring to is against SLE 15 SP1 and about general performance, so please do not mix everything into a single issue.

You know as well as I do that the installer in SLE15SP1 is hardly any different from Tumbleweed, so I don't understand why you don't see the connection. However, I mentioned the bug because the proposed solution is in there: to provide a way to not have a timeout at all. Maybe you have an idea how we could hot-patch a live system to change https://github.com/yast/yast-installation/blob/master/src/lib/installation/clients/inst_finish.rb#L155 within the installer? In the end, it should all be Ruby code, not compiled C code, right?

As we now have a job where we can reproduce the issue, it can be worked on.

Hm, I doubt we have a more reproducible problem. At least this one is back to "works often, not always".

#23

Updated by okurz over 5 years ago

Has duplicate action #58802: test fails in await_install added

#24

Updated by okurz over 5 years ago

Has duplicate action #58832: test fails in await_install, seems to be stuck on grub menu added

#25

Updated by riafarov over 5 years ago

Due date changed from 2019-12-03 to 2019-12-17

There was a change in the installer code to disable the timeout in the live installer, so it might be that we don't need this fix anymore.

#26

Updated by okurz over 5 years ago

Status changed from Workable to In Progress
Assignee set to okurz

Correct, I am currently on it. The product change is right now pending in Tumbleweed staging, where we should be able to test it. According to riafarov, older derived products do not seem to be affected, i.e. they are more stable, so maybe the product change is good enough. As an alternative, we could still follow the "startshell" approach in parallel.

#27

Updated by okurz over 5 years ago

Status changed from In Progress to Blocked

Still waiting for Staging:F to build a new medium. We need to wait for at least build 317 in https://build.opensuse.org/package/binaries/openSUSE:Factory:Staging:F/000product:openSUSE-dvd5-dvd-x86_64/images, which includes the necessary product changes, before we can test again.

-> https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9176

#28

Updated by okurz over 5 years ago

I resolved both https://bugzilla.suse.com/show_bug.cgi?id=1157476 and https://bugzilla.suse.com/show_bug.cgi?id=1122493, and am now waiting for https://build.opensuse.org/project/show/openSUSE:Factory:Staging:E to be accepted for https://build.opensuse.org/request/show/751336. Afterwards, we can apply the new linuxrc parameter "reboot_timeout=0" for all products except the older ones, i.e. Tumbleweed.

EDIT 2019-12-16: The corresponding SRs for Tumbleweed, Leap 15.2 and SLE15SP2 have been accepted now, so we can set reboot_timeout=0 for all tests on newer products:

openqa-clone-job --within-instance https://openqa.opensuse.org/tests/1113903 BUILD= _GROUP= CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse.git#feature/install_timeout TEST=minimalx_no_reboot_timeout_okurz_poo50615

Created job #1114648: opensuse-Tumbleweed-DVD-x86_64-Build20191214-minimalx@64bit -> https://openqa.opensuse.org/t1114648
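A minimal sketch of the "newer products only" rule, in Python for illustration: the real tests are Perl, and the version parsing, function names and the example parameter splash=silent are assumptions. It appends the linuxrc parameter reboot_timeout=0 only for Tumbleweed, Leap 15.2 or later and SLE 15-SP2 or later, as listed in the comment above.

```python
import re

# Minimum versions accepting reboot_timeout=0, per this ticket:
# Leap 15.2 and SLE 15-SP2 (Tumbleweed is handled separately).
MIN_VERSION = {"opensuse": (15, 2), "sle": (15, 2)}


def version_tuple(version):
    """Turn '15.2', '15-SP2' or '15' into a comparable (major, minor) tuple."""
    match = re.fullmatch(r"(\d+)(?:\.(\d+)|-SP(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major = int(match.group(1))
    minor = int(match.group(2) or match.group(3) or 0)
    return major, minor


def supports_reboot_timeout(distri, version):
    if version == "Tumbleweed":
        return True
    major, minor = version_tuple(version)
    if distri == "opensuse" and major == 42:
        return False  # Leap 42.x is older than Leap 15 despite the larger number
    return (major, minor) >= MIN_VERSION[distri]


def installer_boot_params(distri, version, params=()):
    """Append the linuxrc parameter reboot_timeout=0 on newer products."""
    result = list(params)
    if supports_reboot_timeout(distri, version):
        result.append("reboot_timeout=0")
    return " ".join(result)


print(installer_boot_params("opensuse", "Tumbleweed", ["splash=silent"]))
# prints splash=silent reboot_timeout=0
print(installer_boot_params("sle", "15-SP1", ["splash=silent"]))
# prints splash=silent
```

Gating on the product version keeps older products such as the Leap 42.3 upgrade source in this scenario from receiving a parameter their installer does not know.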

#29

Updated by okurz over 5 years ago

Due date deleted (~~2019-12-17~~)
Status changed from Blocked to Feedback
Target version changed from future to Current Sprint

#30

Updated by SLindoMansilla over 5 years ago

Also happening for SLE15-SP2: https://openqa.suse.de/tests/3722762#step/await_install/3
