action #50237

open

[sle] await_install - longer timeout when MAX_JOB_TIME defined NOT ONLY for aarch64

Added by whdu about 5 years ago. Updated over 2 years ago.

Status: New
Priority: Low
Assignee: -
Category: Spike/Research
Target version: -
Start date: 2019-04-10
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Now the code in installation/await_install.pm is:

# aarch64 can be particularly slow depending on the hardware
$timeout *= 2 if check_var('ARCH', 'aarch64') && get_var('MAX_JOB_TIME');

I recommend applying this to all architectures, not only aarch64, because the slowness occurs in many situations (e.g. when preparing an image registered against a proxy SCC in a local development environment).
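A minimal sketch contrasting the current rule with the proposed one, written in Perl to match await_install.pm. The sub names `current_timeout` and `proposed_timeout`, and the plain key/value list standing in for the job variables (in place of testapi's `check_var`/`get_var`), are hypothetical and for illustration only:

```perl
use strict;
use warnings;

# Hypothetical helper: current rule, only aarch64 jobs with MAX_JOB_TIME
# set get a doubled timeout. %v stands in for the openQA job variables.
sub current_timeout {
    my ($timeout, %v) = @_;
    $timeout *= 2 if ($v{ARCH} // '') eq 'aarch64' && $v{MAX_JOB_TIME};
    return $timeout;
}

# Hypothetical helper: proposed rule, any architecture gets a doubled
# timeout as long as MAX_JOB_TIME is set.
sub proposed_timeout {
    my ($timeout, %v) = @_;
    $timeout *= 2 if $v{MAX_JOB_TIME};
    return $timeout;
}
```

Under the proposal the doubling depends only on MAX_JOB_TIME being set, so an x86_64 job with MAX_JOB_TIME defined would get the same extended timeout that aarch64 jobs get today, while jobs without MAX_JOB_TIME keep the default.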

What is your opinion? I need some input.

Actions #1

Updated by hjluo about 5 years ago

That's a good idea for all platforms.

Actions #2

Updated by SLindoMansilla about 5 years ago

  • Category set to Spike/Research

I would need to see links that support that hypothesis.
The timeout is there for a reason, which is discovering bugs whose effect is slowing down the SUT.

There are also SCC-related bugs that cause a timeout: https://bugzilla.suse.com/show_bug.cgi?id=1123963
There could be cases where increasing the timeout makes more sense, such as encrypted scenarios, but in general, any timeout increase needs consensus with the different teams and release managers to confirm that it is not caused by a bug.
If it is a bug, a workaround needs to be implemented, which requires a bug ticket and the use of record_soft_fail.

Actions #3

Updated by whdu about 5 years ago

  • Subject changed from [sle][security][sle15sp1] await_install - longer timeout when MAX_JOB_TIME defined NOT ONLY for aarch64 to [sle] await_install - longer timeout when MAX_JOB_TIME defined NOT ONLY for aarch64
  • Assignee deleted (whdu)

Actions #4

Updated by whdu about 5 years ago

SLindoMansilla wrote:

...
There could be cases where increasing the timeout makes more sense, such as encrypted scenarios, but in general, any timeout increase needs consensus with the different teams and release managers to confirm that it is not caused by a bug.
...

Yes, so I think we should get more input before making this change.

Actions #5

Updated by riafarov about 5 years ago

Has system performance degraded recently? From my perspective it's fine to bump the timeout, but if it has become an issue we should investigate why, as it worked fine for quite a while before.

Actions #6

Updated by whdu about 5 years ago

riafarov wrote:

Has system performance degraded recently? From my perspective it's fine to bump the timeout, but if it has become an issue we should investigate why, as it worked fine for quite a while before.

As I described, it happens in slow network environments, especially when preparing an image with my own openQA instance for development purposes and registering via a proxy SCC (which means the system fetches packages from openqa.suse.de/assets/).

Actions #7

Updated by okurz over 2 years ago

  • Priority changed from Normal to Low

This ticket was set to "Normal" priority but was not updated within 730 days which is 2 times the period of the SLO for "Normal" tickets (365 days) as described on https://progress.opensuse.org/projects/openqatests/wiki/Wiki#SLOs-service-level-objectives . The ticket will be set to the next lower priority of "Low".
