action #125033: [security][maint][12sp2][12sp3][12sp4] test fails in aa_autodep - openQA Tests - openSUSE Project Management Tool


action #125033

closed

[security][maint][12sp2][12sp3][12sp4] test fails in aa_autodep

Added by pstivanin over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee: emiler
Category: Bugs in existing tests
Start date: 2023-02-24
% Done: 100%
Tags: fail

Description

Observation

openQA test in scenario sle-12-SP3-Server-DVD-Updates-x86_64-mau-apparmor@64bit fails in aa_autodep

Test suite description

Testsuite maintained at https://gitlab.suse.de/qe-security/osd-sle15-security.

Reproducible

Fails since (at least) Build 20230223-1

Expected result

Last good: 20230222-1 (or more recent)

Further details

Always latest result in this scenario: latest

Related issues: 1 (0 open, 1 closed)


Updated by emiler over 1 year ago

This is weird, but I've re-run this in my local instance and it passed...
http://emiler-openqa.qe.suse.de/tests/95#
The error seems like it's just a timeout, so an infrastructure problem?


Updated by emiler over 1 year ago

Status changed from New to In Progress
Assignee set to emiler


Updated by pstivanin over 1 year ago

Also happened on 12sp2: https://openqa.suse.de/tests/10603080


Updated by pstivanin over 1 year ago

new failure on 12sp3: https://openqa.suse.de/tests/10608748


Updated by emiler over 1 year ago

% Done changed from 0 to 80

Should be fixed by this PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16525


Updated by emiler over 1 year ago

I am also experimenting with setting TIMEOUT_SCALE instead.


Updated by emiler over 1 year ago

% Done changed from 80 to 100

I've spoken with Josef Pupava, and he says that TIMEOUT_SCALE should never be used in production, only while debugging (when you don't want to deal with several timeouts by hand). It is also ignored entirely by some timeouts; not all cases honour this variable. In his words, the original PR is a much better solution.
PR merged, waiting for a successful run before closing.

Actions

Copy link

Updated by dzedro over 1 year ago

Yep, I'm not a fan of using TIMEOUT_SCALE as the solution.
The exception is slower architectures such as aarch64, where it is a quick and convenient fix.
But timeouts are, and always will be, a source of failures, because they have been fairly strict from the beginning.
Sometimes things are slow or get slower because some functionality was extended, a worker has a load peak, the infrastructure has a hiccup, etc.
The timeout also means something different in, e.g., assert_script_run versus assert_screen or check_screen.
IMO the timeout in assert_script_run should be as high as possible, because it is always better to get an error message from the command than a timeout. There the timeout is a kind of safety net for when something abnormal happens, such as an infinite loop, so that assert_script_run does not get stuck.
Needle timeouts should be as low as possible, but not too low. With assert_screen, if the needle does not match in time or is not present, the test fails. With check_screen, the test waits for the whole timeout and then returns without a match, but does not fail. The two are meant for different uses.
TIMEOUT_SCALE just multiplies "all" of these timeouts, despite their different behavior. 🤷
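To make the point about one global multiplier concrete, here is a small Python model of how a scale factor changes different kinds of waits. It is a sketch, not the os-autoinst implementation: the function `time_spent` and its parameters are hypothetical, and `honours_scale=False` stands in for the timeouts that ignore TIMEOUT_SCALE.

```python
from typing import Optional, Tuple


def time_spent(kind: str, timeout: float, scale: float = 1.0,
               event_at: Optional[float] = None,
               honours_scale: bool = True) -> Tuple[float, str]:
    """Hypothetical model (not the os-autoinst API) of an openQA-style wait.

    kind: "assert_script_run", "assert_screen" or "check_screen".
    event_at: seconds until the command finishes or the needle matches;
              None means it never happens.
    honours_scale: False models a timeout that ignores TIMEOUT_SCALE.
    Returns (seconds blocked, outcome).
    """
    limit = timeout * scale if honours_scale else timeout
    if event_at is not None and event_at <= limit:
        # Finished (or matched) in time: the wait returns as soon as the
        # event happens, so the scale factor costs nothing here.
        return event_at, "ok"
    if kind == "check_screen":
        # check_screen waits out the whole timeout, then reports no match
        # without failing the test.
        return limit, "no match"
    # assert_script_run and assert_screen fail once the limit is reached.
    return limit, "fail"


# A command needing 120 s against a 90 s timeout fails...
print(time_spent("assert_script_run", 90, event_at=120))
# ...and passes once everything is scaled by 2,
print(time_spent("assert_script_run", 90, scale=2, event_at=120))
# but a check_screen for an absent needle now blocks three times as long,
print(time_spent("check_screen", 30, scale=3))
# and a timeout that ignores the scale still fails.
print(time_spent("assert_script_run", 90, scale=2, event_at=120,
                 honours_scale=False))
```

The model shows why a blanket multiplier is a blunt tool: it only helps waits that honour it, and it inflates the cost of every check_screen that is expected not to match.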


Updated by emiler over 1 year ago

Status changed from In Progress to Resolved

https://openqa.suse.de/tests/10611337
Re-run on 12-SP3 passed this time. Closing.


#10

Updated by pstivanin over 1 year ago

Status changed from Resolved to Feedback

still failing: https://openqa.suse.de/tests/10614289


#11

Updated by emiler over 1 year ago

https://openqa.suse.de/tests/10615012
Ok, weird. A re-run of the same test passed again, so the timeout is perhaps still not enough? I don't want to believe that this will hang for over 5 minutes.


#12

Updated by pstivanin over 1 year ago

I think it'd be better to set the RETRY value to 3 in this case (via test suite json).


#13

Updated by emiler over 1 year ago

That could work.
Related PR: https://gitlab.suse.de/qe-security/osd-sle15-security/-/merge_requests/57
Test run passed: http://emiler-openqa.qe.suse.de/tests/120


#14

Updated by emiler over 1 year ago

Status changed from Feedback to Resolved

Tests passed several times now:

Closing again.


#15

Updated by mgrifalconi over 1 year ago

Hello, tests are still failing: https://openqa.suse.de/tests/10689040#next_previous


#16

Updated by tjyrinki_suse over 1 year ago

Status changed from Resolved to Workable

Reopening, it has failed 7 out of 10 times recently: https://openqa.suse.de/tests/10690470#next_previous , which is a bit high / cumbersome.
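A rough back-of-the-envelope check of how far automatic retries can hide a flaky failure, assuming (unrealistically) that every attempt is independent and fails with the same probability. The 7-out-of-10 figure may already include retries, so the numbers are illustrative only; `prob_all_attempts_fail` is a hypothetical helper, and RETRY=N is treated here as N extra attempts after the first run.

```python
def prob_all_attempts_fail(p_fail: float, retries: int) -> float:
    """Probability that the first run and all `retries` re-runs fail,
    assuming independent attempts with the same failure probability."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if retries < 0:
        raise ValueError("retries must be non-negative")
    return p_fail ** (retries + 1)


for p in (0.1, 0.7):
    for retries in (0, 3):
        print(f"p_fail={p:.1f}, RETRY={retries}: "
              f"{prob_all_attempts_fail(p, retries):.2%} of jobs end red")
```

With a per-attempt failure rate as high as 0.7, even three retries would still leave roughly a quarter of jobs failing (0.7^4 is about 0.24), so retries alone cannot paper over a failure rate this high.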


#17

Updated by emiler over 1 year ago

Status changed from Workable to In Progress


#18

Updated by emiler over 1 year ago

We can either increase the timeout, set a timeout multiplier in the testsuite itself (I'd rather not), or, as Paolo suggested:

Maybe we can try with more resources? QEMUCPU=host, QEMUCPUS=2, QEMURAM=2048?

I'll take a look at this on Thursday.
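For reference, the knobs discussed in this ticket (RETRY from the earlier suggestion plus the extra QEMU resources) end up as test suite settings. The sketch below only builds such a settings map in Python and dumps it as JSON; the `{"settings": ...}` layout and the helper name `suite_settings` are assumptions, since the actual structure of the qe-security test suite JSON is not shown here. QEMUCPU, QEMUCPUS, QEMURAM and RETRY are the variable names used in the thread, and the default values are the ones proposed above.

```python
import json


def suite_settings(retry: int = 3, cpus: int = 2, ram_mb: int = 2048) -> dict:
    """Hypothetical helper; openQA settings values are plain strings."""
    if retry < 0 or cpus < 1 or ram_mb < 1:
        raise ValueError("invalid retry or resource values")
    return {
        "QEMUCPU": "host",       # pass the host CPU model through to the guest
        "QEMUCPUS": str(cpus),   # number of virtual CPUs
        "QEMURAM": str(ram_mb),  # guest memory in MiB
        "RETRY": str(retry),     # automatic re-runs on failure
    }


print(json.dumps({"settings": suite_settings()}, indent=2, sort_keys=True))
```

Keeping the values as strings mirrors how openQA job settings are stored; adding resources addresses the slowness itself, while RETRY only masks occasional failures.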


#19

Updated by emiler over 1 year ago

Subject changed from [security][maint][12sp3][12sp4] test fails in aa_autodep to [security][maint][12sp2][12sp3][12sp4] test fails in aa_autodep


#20

Updated by emiler over 1 year ago

New PR adding more resources to the test run: https://gitlab.suse.de/qe-security/osd-sle15-security/-/merge_requests/73
I'll wait for new schedules, to see if it fails again, before closing.


#21

Updated by emiler over 1 year ago

All three versions passed today:

Though I am still going to wait until Monday to double-check.


#22

Updated by emiler over 1 year ago

Status changed from In Progress to Resolved

Still passing:

Closing.


#23

Updated by emiler about 1 year ago

Related to action #131462: [security][12-sp2] test fails in aa_autodep added

