action #125033

closed

[security][maint][12sp2][12sp3][12sp4] test fails in aa_autodep

Added by pstivanin about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Bugs in existing tests
Target version:
-
Start date:
2023-02-24
Due date:
% Done:

100%

Estimated time:
Difficulty:
Tags:

Description

Observation

openQA test in scenario sle-12-SP3-Server-DVD-Updates-x86_64-mau-apparmor@64bit fails in
aa_autodep

Test suite description

Testsuite maintained at https://gitlab.suse.de/qe-security/osd-sle15-security.

Reproducible

Fails since (at least) Build 20230223-1

Expected result

Last good: 20230222-1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 1 (0 open, 1 closed)

Related to openQA Tests - action #131462: [security][12-sp2] test fails in aa_autodep (Resolved, emiler, 2023-06-27)

Actions #1

Updated by emiler about 1 year ago

This is weird, but I've re-run this in my local instance and it passed...
http://emiler-openqa.qe.suse.de/tests/95#
The error seems like it's just a timeout, so an infrastructure problem?

Actions #2

Updated by emiler about 1 year ago

  • Status changed from New to In Progress
  • Assignee set to emiler
Actions #3

Updated by pstivanin about 1 year ago

happened also on 12sp2: https://openqa.suse.de/tests/10603080

Actions #4

Updated by pstivanin about 1 year ago

Actions #5

Updated by emiler about 1 year ago

  • % Done changed from 0 to 80
Actions #6

Updated by emiler about 1 year ago

I am also experimenting with setting TIMEOUT_SCALE instead.

Actions #7

Updated by emiler about 1 year ago

  • % Done changed from 80 to 100

I've spoken with Josef Pupava and he says that TIMEOUT_SCALE should never be used in production, only while debugging (when you don't want to adjust several timeouts by hand). It is also ignored entirely by some timeouts; not all cases honour this variable. So the original PR is a much better solution, in his own words.
PR merged, waiting for a successful run before closing.

Actions #8

Updated by dzedro about 1 year ago

Yep, I'm not a fan of using TIMEOUT_SCALE as the solution.
The exception is slower architectures like aarch64, where it's quicker and more convenient.
But timeouts are, and always will be, causing failures, because the timeouts have been pretty strict since the beginning.
Sometimes things are slower, or get slower, because some functionality got extended, a worker has a load peak, the infra has a hiccup, etc.
The timeout means something different in e.g. assert_script_run than in assert_screen or check_screen.
IMO the timeout in assert_script_run should be as high as possible, because it's always better to get an error message from the command than a timeout. The timeout is a kind of safety net for when something abnormal happens, to avoid a stuck assert_script_run, e.g. an infinite loop.
Needle timeouts should be as low as possible, but not too low. With assert_screen, if the needle does not match in time or is not present, the test fails. check_screen waits up to the whole timeout and returns the match result, but does not fail the test. The two have different uses.
TIMEOUT_SCALE will just multiply "all" these timeouts, despite their different behavior. 🤷
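
The distinction drawn above can be sketched in a few lines. This is a Python model only, not the os-autoinst testapi (which is Perl); the names `check_condition` and `assert_condition` and the `scale` parameter are hypothetical stand-ins for check_screen, assert_screen and TIMEOUT_SCALE.

```python
import time
from typing import Callable, Optional


def check_condition(condition: Callable[[], bool], timeout: float,
                    scale: float = 1.0, poll: float = 0.01) -> Optional[bool]:
    """check_screen-like: poll until the condition holds or time runs out.

    Returns True on a match and None on timeout; it never raises.
    `scale` is a hypothetical stand-in for TIMEOUT_SCALE.
    """
    deadline = time.monotonic() + timeout * scale
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def assert_condition(condition: Callable[[], bool], timeout: float,
                     scale: float = 1.0, poll: float = 0.01) -> bool:
    """assert_screen-like: same polling, but a timeout fails the test."""
    if check_condition(condition, timeout, scale, poll) is None:
        raise TimeoutError(f"no match within {timeout * scale:.2f}s")
    return True
```

Both helpers share one deadline calculation, so a global scale factor stretches them equally even though a timeout means "carry on" in one and "fail" in the other.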

Actions #9

Updated by emiler about 1 year ago

  • Status changed from In Progress to Resolved

https://openqa.suse.de/tests/10611337
Re-run on 12-SP3 passed this time. Closing.

Actions #10

Updated by pstivanin about 1 year ago

  • Status changed from Resolved to Feedback
Actions #11

Updated by emiler about 1 year ago

https://openqa.suse.de/tests/10615012
Ok, weird. A re-run of the same test passed again, so the timeout is perhaps still not enough? I don't want to believe that this will hang for over 5 minutes.

Actions #12

Updated by pstivanin about 1 year ago

I think it'd be better to set the RETRY value to 3 in this case (via test suite json).
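
As a rough illustration of what a retry count of 3 means, here is a minimal Python sketch. It assumes the job runs once and is then retried up to `retry` more times on failure; `run_with_retry` is a hypothetical helper, not part of openQA.

```python
from typing import Callable, Tuple


def run_with_retry(job: Callable[[], bool], retry: int) -> Tuple[bool, int]:
    """Run `job` once, then up to `retry` more times until it passes.

    Returns (passed, attempts_used).
    """
    attempts = 0
    for _ in range(retry + 1):
        attempts += 1
        if job():
            return True, attempts
    return False, attempts


# A flaky job that fails twice and then passes.
results = iter([False, False, True])
passed, attempts = run_with_retry(lambda: next(results), retry=3)
```

A retry hides intermittent failures rather than removing their cause, which is why it sits alongside the timeout and resource changes discussed in this ticket rather than replacing them.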

Actions #14

Updated by emiler about 1 year ago

  • Status changed from Feedback to Resolved
Actions #15

Updated by mgrifalconi about 1 year ago

Actions #16

Updated by tjyrinki_suse about 1 year ago

  • Status changed from Resolved to Workable

Reopening, it has failed 7 out of 10 times recently: https://openqa.suse.de/tests/10690470#next_previous, which is a bit high / cumbersome.
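
For a sense of scale, a back-of-the-envelope estimate of what the RETRY value of 3 suggested earlier would do at this failure rate. It assumes every attempt fails independently with the observed 7/10 rate, which may not hold if the failures are caused by load on the worker; `pass_probability` is a hypothetical helper.

```python
def pass_probability(fail_rate: float, retry: int) -> float:
    """Chance that at least one of (retry + 1) independent attempts passes."""
    return 1.0 - fail_rate ** (retry + 1)


observed_fail_rate = 7 / 10  # from the 7-out-of-10 figure above

no_retry = pass_probability(observed_fail_rate, retry=0)
with_retry = pass_probability(observed_fail_rate, retry=3)
```

Under that assumption a single run passes about 30% of the time and a run with three retries about 76% of the time, so retries alone would still leave roughly one scheduled job in four failing.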

Actions #17

Updated by emiler about 1 year ago

  • Status changed from Workable to In Progress
Actions #18

Updated by emiler about 1 year ago

We can either increase the timeout, set a timeout multiplier in the testsuite itself (I'd rather not), or, as Paolo suggested:

maybe we can try with more resources? qemucpu=host, qemucpus=2, qemuram=2048 ?

I'll take a look at this on Thursday.
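
For illustration, the suggested values could be written as job settings along these lines. The variable names QEMUCPU, QEMUCPUS and QEMURAM are the upper-case forms of the ones quoted above; the surrounding `"settings"` layout is an assumption, since the structure of the test suite JSON is not shown in this ticket.

```python
import json

# Values taken from the suggestion above; openQA settings are strings.
settings = {
    "QEMUCPU": "host",   # expose the host CPU model to the guest
    "QEMUCPUS": "2",     # number of virtual CPUs
    "QEMURAM": "2048",   # guest memory in MiB
}

# Hypothetical wrapper key; the real test suite file may be laid out differently.
fragment = json.dumps({"settings": settings}, indent=2)
print(fragment)
```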

Actions #19

Updated by emiler about 1 year ago

  • Subject changed from [security][maint][12sp3][12sp4] test fails in aa_autodep to [security][maint][12sp2][12sp3][12sp4] test fails in aa_autodep
Actions #20

Updated by emiler about 1 year ago

New PR adding more resources to the test run: https://gitlab.suse.de/qe-security/osd-sle15-security/-/merge_requests/73
I'll wait for new schedules, to see if it fails again, before closing.

Actions #21

Updated by emiler about 1 year ago

All three versions passed today:

Though I am still going to wait until Monday to double-check.

Actions #23

Updated by emiler 11 months ago

  • Related to action #131462: [security][12-sp2] test fails in aa_autodep added