action #80570: [qe-core][qem][sporadic] test fails in smt_client1 auto_review:"mydie.*acquiring barrier 'smt_setup': lock owner already finished":retry - openQA Tests (public) - openSUSE Project Management Tool

action #80570

closed

[qe-core][qem][sporadic] test fails in smt_client1 auto_review:"mydie.*acquiring barrier 'smt_setup': lock owner already finished":retry

Added by coolo over 4 years ago. Updated about 4 years ago.

Status: Rejected
Priority: High
Assignee: zluo
Category: Bugs in existing tests
Target version: QA (public) - QE-Core: Ready

Description

Observation

openQA test in scenario sle-12-SP5-Server-DVD-Updates-x86_64-qam-smt-client1@64bit fails in smt_client1

There were no test changes, and only an openssh incident was opened, which looks highly unrelated to the failure. Something bizarre is going on.

Test suite description

Reproducible

Fails since (at least) Build 20201129-1

Expected result

Last good: 20201128-1 (or more recent)

Further details

Always latest result in this scenario: latest

Related issues 3 (1 open — 2 closed)

Updated by tjyrinki_suse over 4 years ago

Subject changed from test fails in smt_client1 to [qe-core][qem][sporadic] test fails in smt_client1
Status changed from New to Workable
Start date deleted (2020-11-29)

Some runs are ok, some not.

Updated by dzedro over 4 years ago

I'm sure it's an MM (multi-machine) problem: while MM jobs were bound to one worker, this didn't happen. Related to #65118.

EDIT by okurz: replaced the full URL to the other ticket with a Redmine internal link for link preview.

Updated by okurz over 4 years ago

Subject changed from [qe-core][qem][sporadic] test fails in smt_client1 to [qe-core][qem][sporadic] test fails in smt_client1 auto_review:"mydie.*acquiring barrier 'smt_setup': lock owner already finished":retry

Also see #69715 for a feature request to improve the error message from just "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50."

The log of a failed job has more info:

[2020-12-12T04:24:10.040 CET] [debug] >>> testapi::wait_serial: (?^:kE1Aw-\d+-): ok
[2020-12-12T04:24:10.040 CET] [debug] barrier wait 'smt_setup'
[2020-12-12T04:24:10.040 CET] [debug] tests/smt/smt_client1.pm:32 called lockapi::barrier_wait
[2020-12-12T04:24:10.040 CET] [debug] <<< testapi::record_info(title="Paused", output="Wait for smt_setup (on parent job)", result="ok")
[2020-12-12T04:24:10.100 CET] [debug] tests/smt/smt_client1.pm:32 called lockapi::barrier_wait
[2020-12-12T04:24:10.100 CET] [debug] <<< bmwqemu::mydie(cause_of_death="acquiring barrier 'smt_setup': lock owner already finished")
[2020-12-12T04:24:10.176 CET] [info] ::: basetest::runtest: # Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50.

The parallel job https://openqa.suse.de/tests/5164849/file/autoinst-log.txt has

[2020-12-12T04:24:33.836 CET] [debug] barrier create 'smt_setup' for 2 tasks
…
[2020-12-12T04:28:52.028 CET] [debug] barrier wait 'smt_setup'
…
[2020-12-12T04:35:15.290 CET] [debug] barrier 'smt_setup' not released, sleeping 5 seconds
[2020-12-12T04:35:17.535 CET] [debug] autotest received signal TERM, saving results of current test before exiting

So, as explained in #65118#note-29, the client is not waiting for the server long enough to sync on the barrier. A similar problem was recorded in #69787, but no fix has been applied there either; for unknown reasons the scenario simply did not reproduce the problem.
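
To make the timing in the two log excerpts easier to compare, here is a minimal Python sketch that parses the timestamps and prints the gaps between the barrier events. The helper names (`parse_ts`, `seconds_between`) are made up for illustration, and the sketch assumes that both jobs log against synchronized clocks, which the logs themselves do not guarantee.

```python
import re
from datetime import datetime

# Matches the timestamp prefix used in autoinst-log.txt, e.g. "[2020-12-12T04:24:10.040 CET]".
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) CET\]")

# Log lines copied from the excerpts in this ticket.
CLIENT_WAIT = "[2020-12-12T04:24:10.040 CET] [debug] barrier wait 'smt_setup'"
PARENT_CREATE = "[2020-12-12T04:24:33.836 CET] [debug] barrier create 'smt_setup' for 2 tasks"
PARENT_WAIT = "[2020-12-12T04:28:52.028 CET] [debug] barrier wait 'smt_setup'"
PARENT_TERM = (
    "[2020-12-12T04:35:17.535 CET] [debug] autotest received signal TERM, "
    "saving results of current test before exiting"
)


def parse_ts(line: str) -> datetime:
    """Extract the timestamp of a log line (hypothetical helper)."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def seconds_between(earlier: str, later: str) -> float:
    """Seconds elapsed from one log line to another (hypothetical helper)."""
    return (parse_ts(later) - parse_ts(earlier)).total_seconds()


if __name__ == "__main__":
    # Client's barrier wait vs. the parent's barrier create.
    print(f"{seconds_between(CLIENT_WAIT, PARENT_CREATE):.3f}")  # 23.796
    # Parent's barrier wait until it received TERM.
    print(f"{seconds_between(PARENT_WAIT, PARENT_TERM):.3f}")  # 385.507
```

Under that assumption, the client logged its barrier wait roughly 23.8 seconds before the parent logged creating the barrier, and the parent then waited on the barrier for about 385.5 seconds until it received TERM.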

Adding auto-review regex according to https://github.com/os-autoinst/scripts/#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger
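
As a quick sanity check of the regex, the following Python sketch applies it to the `mydie` line from the failed job's log. It assumes the pattern is used as a plain search over the log text; the actual auto-review script may apply it differently, and `matches_known_issue` is a made-up helper name.

```python
import re

# Regex taken from the ticket subject (the part inside auto_review:"...").
AUTO_REVIEW_REGEX = r"mydie.*acquiring barrier 'smt_setup': lock owner already finished"

# Log line copied from the failed job excerpt in this ticket.
LOG_LINE = (
    "[2020-12-12T04:24:10.100 CET] [debug] <<< bmwqemu::mydie("
    "cause_of_death=\"acquiring barrier 'smt_setup': lock owner already finished\")"
)

# The generic failure line alone, which should not be enough to match.
GENERIC_LINE = (
    "[2020-12-12T04:24:10.176 CET] [info] ::: basetest::runtest: "
    "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50."
)


def matches_known_issue(log_text: str, pattern: str = AUTO_REVIEW_REGEX) -> bool:
    """Return True if the pattern is found anywhere in the log text (hypothetical helper)."""
    return re.search(pattern, log_text) is not None


if __name__ == "__main__":
    print(matches_known_issue(LOG_LINE))      # True
    print(matches_known_issue(GENERIC_LINE))  # False
```

The second check shows why the barrier name is part of the pattern: the generic "Test died: mydie" line also appears for other lockapi failures, so matching on it alone would label unrelated jobs with this ticket.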

Updated by okurz over 4 years ago

Related to action #69715: improve error feedback from lockapi to not just have "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 41." added

Updated by okurz over 4 years ago

Related to action #69787: [qe-core][qam][sporadic] test fails in rsync_client not waiting for the server long enough to sync on the barrier, auto_review:"(?s)cause_of_death.*barrier.*rsync_setup.*lock owner already finished.*Test died.*mydie.*lockapi" added

Updated by okurz over 4 years ago

Related to coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasons added

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: qam-smt-client1
https://openqa.suse.de/tests/5232012

To prevent further reminder comments one of the following options should be followed:

The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
The openQA job group is moved to "Released"
The label in the openQA scenario is removed

Updated by zluo over 4 years ago

Status changed from Workable to In Progress
Assignee set to zluo

Taking over.

Updated by zluo over 4 years ago

http://10.160.64.152/tests/58 looks like a shutdown issue; it takes forever...

State: uploading started about an hour ago

Updated by zluo over 4 years ago

Found out that smt_client got started without parent tests, for example:
https://openqa.suse.de/tests/5455200
https://openqa.suse.de/tests/5453113
https://openqa.suse.de/tests/5448079

or failed because qam-smt-server@64bit failed:
https://openqa.suse.de/tests/5426775#dependencies
https://openqa.suse.de/tests/5458977#dependencies
https://openqa.suse.de/tests/5465900#dependencies

https://openqa.suse.de/tests/5465899#step/smt_server/4 shows, for example, that the connection to the database failed.

See logs:
https://openqa.suse.de/tests/5465899/file/autoinst-log.txt

[2021-02-15T00:59:49.382 CET] [info] ::: basetest::runtest: # Test died: command 'smt-repos -m' failed at /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/smt/smt_server.pm line 27.
