action #80570

closed

[qe-core][qem][sporadic] test fails in smt_client1 auto_review:"mydie.*acquiring barrier 'smt_setup': lock owner already finished":retry

Added by coolo over 3 years ago. Updated almost 3 years ago.

Status: Rejected
Priority: High
Assignee:
Category: Bugs in existing tests
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-12-SP5-Server-DVD-Updates-x86_64-qam-smt-client1@64bit fails in
smt_client1

There were no test changes, and the only incident opened was for openssh, which looks highly unrelated to the failure. Something bizarre is going on.

Test suite description

Reproducible

Fails since (at least) Build 20201129-1

Expected result

Last good: 20201128-1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 3 (1 open, 2 closed)

Related to openQA Project - action #69715: improve error feedback from lockapi to not just have "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 41." (Workable, start date 2020-08-07)

Related to openQA Tests - action #69787: [qe-core][qam][sporadic] test fails in rsync_client not waiting for the server long enough to sync on the barrier, auto_review:"(?s)cause_of_death.*barrier.*rsync_setup.*lock owner already finished.*Test died.*mydie.*lockapi" (Resolved)

Related to openQA Project - coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasons (Resolved, assignee okurz, 2020-04-01 to 2020-09-30)

Actions #1

Updated by tjyrinki_suse over 3 years ago

  • Subject changed from test fails in smt_client1 to [qe-core][qem][sporadic] test fails in smt_client1
  • Status changed from New to Workable
  • Start date deleted (2020-11-29)

Some runs are ok, some not.

Actions #2

Updated by dzedro over 3 years ago

I'm sure it's a multi-machine (MM) problem: while MM jobs were bound to a single worker, this didn't happen. Related to #65118

EDIT by okurz: replaced the full URL to the other ticket with a Redmine internal link to get a link preview

Actions #3

Updated by okurz over 3 years ago

  • Subject changed from [qe-core][qem][sporadic] test fails in smt_client1 to [qe-core][qem][sporadic] test fails in smt_client1 auto_review:"mydie.*acquiring barrier 'smt_setup': lock owner already finished":retry

Also see #69715 for a feature request to improve the error message from just "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50."

The log of a failed job has more info:

[2020-12-12T04:24:10.040 CET] [debug] >>> testapi::wait_serial: (?^:kE1Aw-\d+-): ok
[2020-12-12T04:24:10.040 CET] [debug] barrier wait 'smt_setup'
[2020-12-12T04:24:10.040 CET] [debug] tests/smt/smt_client1.pm:32 called lockapi::barrier_wait
[2020-12-12T04:24:10.040 CET] [debug] <<< testapi::record_info(title="Paused", output="Wait for smt_setup (on parent job)", result="ok")
[2020-12-12T04:24:10.100 CET] [debug] tests/smt/smt_client1.pm:32 called lockapi::barrier_wait
[2020-12-12T04:24:10.100 CET] [debug] <<< bmwqemu::mydie(cause_of_death="acquiring barrier 'smt_setup': lock owner already finished")
[2020-12-12T04:24:10.176 CET] [info] ::: basetest::runtest: # Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50.

The parallel job https://openqa.suse.de/tests/5164849/file/autoinst-log.txt has

[2020-12-12T04:24:33.836 CET] [debug] barrier create 'smt_setup' for 2 tasks
…
[2020-12-12T04:28:52.028 CET] [debug] barrier wait 'smt_setup'
…
[2020-12-12T04:35:15.290 CET] [debug] barrier 'smt_setup' not released, sleeping 5 seconds
[2020-12-12T04:35:17.535 CET] [debug] autotest received signal TERM, saving results of current test before exiting

So, as explained in #65118#note-29, the client is not waiting long enough for the server to sync on the barrier. A similar problem was recorded in #69787, but no fix has been applied there either; for unknown reasons the scenario simply did not reproduce the problem.
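To make the timing in the two log excerpts above concrete, here is a minimal Python sketch that computes the gaps between the logged events. The timestamps are copied from the excerpts; the variable names are made up for illustration, and the comparison assumes the clocks of both workers were in sync, which the ticket does not confirm.

```python
from datetime import datetime

# Timestamp layout used in autoinst-log.txt, without the trailing " CET".
FMT = "%Y-%m-%dT%H:%M:%S.%f"


def ts(stamp: str) -> datetime:
    """Parse one log timestamp into a naive datetime."""
    return datetime.strptime(stamp, FMT)


# Client job (smt_client1), copied from the first excerpt.
client_died = ts("2020-12-12T04:24:10.100")            # mydie: lock owner already finished

# Parallel server job, copied from the second excerpt.
server_barrier_create = ts("2020-12-12T04:24:33.836")  # barrier create 'smt_setup'
server_barrier_wait = ts("2020-12-12T04:28:52.028")    # barrier wait 'smt_setup'
server_term = ts("2020-12-12T04:35:17.535")            # autotest received signal TERM

# How long before the server even created the barrier the client had already died.
client_lead = (server_barrier_create - client_died).total_seconds()

# How long the server then sat on the barrier before being terminated.
server_wait = (server_term - server_barrier_wait).total_seconds()

print(f"client died {client_lead:.3f}s before the server created the barrier")
print(f"server waited {server_wait:.3f}s on the barrier before TERM")
```

Under the clock-sync assumption, the client gave up before the server's barrier existed, which is consistent with the explanation above that the client does not wait long enough.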

Adding auto-review regex according to https://github.com/os-autoinst/scripts/#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger
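As a quick sanity check of the pattern added to the subject, the following Python sketch applies it to the two relevant lines from the failed job's log quoted above. It only tests the expression with Python's `re` module; how the auto-review script itself applies the pattern is not shown in this ticket, so treat this as an approximation.

```python
import re

# Pattern copied from the auto_review part of the ticket subject.
AUTO_REVIEW = re.compile(
    r"mydie.*acquiring barrier 'smt_setup': lock owner already finished"
)

# Two lines copied from the failed job's log excerpt.
LOG_LINES = [
    '[2020-12-12T04:24:10.100 CET] [debug] <<< bmwqemu::mydie('
    'cause_of_death="acquiring barrier \'smt_setup\': lock owner already finished")',
    '[2020-12-12T04:24:10.176 CET] [info] ::: basetest::runtest: '
    '# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50.',
]

# Line-by-line search: only the cause_of_death line should match.
matches = [bool(AUTO_REVIEW.search(line)) for line in LOG_LINES]

for line, hit in zip(LOG_LINES, matches):
    print("MATCH   " if hit else "no match", line[:60])
```

The pattern keys on the `cause_of_death` line rather than the generic "Test died: mydie" line, so it does not depend on the lockapi.pm line number, which differs between the messages quoted in this ticket (41 and 50).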

Actions #4

Updated by okurz over 3 years ago

  • Related to action #69715: improve error feedback from lockapi to not just have "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 41." added
Actions #5

Updated by okurz over 3 years ago

  • Related to action #69787: [qe-core][qam][sporadic] test fails in rsync_client not waiting for the server long enough to sync on the barrier, auto_review:"(?s)cause_of_death.*barrier.*rsync_setup.*lock owner already finished.*Test died.*mydie.*lockapi" added
Actions #6

Updated by okurz over 3 years ago

  • Related to coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasons added
Actions #7

Updated by okurz over 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: qam-smt-client1
https://openqa.suse.de/tests/5232012

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #8

Updated by zluo about 3 years ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

Taking over.

Actions #9

Updated by zluo about 3 years ago

http://10.160.64.152/tests/58 looks like a shutdown issue; it takes forever...

State: uploading started about an hour ago

Actions #12

Updated by zluo about 3 years ago

Found out that smt_client got started without parent tests, for example:
https://openqa.suse.de/tests/5455200
https://openqa.suse.de/tests/5453113
https://openqa.suse.de/tests/5448079

or failed because qam-smt-server@64bit failed:
https://openqa.suse.de/tests/5426775#dependencies
https://openqa.suse.de/tests/5458977#dependencies
https://openqa.suse.de/tests/5465900#dependencies

https://openqa.suse.de/tests/5465899#step/smt_server/4 shows, for example, that the connection to the database failed:

See logs:
https://openqa.suse.de/tests/5465899/file/autoinst-log.txt

[2021-02-15T00:59:49.382 CET] [info] ::: basetest::runtest: # Test died: command 'smt-repos -m' failed at /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/smt/smt_server.pm line 27.
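The two failure classes described in this comment could be told apart during triage with a small helper like the Python sketch below. This is illustrative only: the function name, the input shape (the result of the parallel parent job, or `None` when there is no parent) and the returned labels are all hypothetical and not taken from openQA.

```python
from typing import Optional


def classify_client_failure(parallel_parent_result: Optional[str]) -> str:
    """Classify a failed smt_client job by the state of its parallel parent.

    parallel_parent_result is a hypothetical input: None if the client job
    has no parallel parent at all, otherwise the parent's result string.
    """
    if parallel_parent_result is None:
        # Case 1 from the comment: client was started without parent tests.
        return "started without parent"
    if parallel_parent_result != "passed":
        # Case 2 from the comment: qam-smt-server itself failed.
        return "parent failed"
    # Parent is fine, so the client failure needs a separate look.
    return "investigate client"


print(classify_client_failure(None))
print(classify_client_failure("failed"))
print(classify_client_failure("passed"))
```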

Actions #14

Updated by zluo about 3 years ago

My WIP PR to handle the mariadb issue: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11979

But my MM setup is not working, which is blocking me from working on this issue.

Actions #15

Updated by zluo about 3 years ago

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/12014 should fix the issue; I will check the test results on OSD later.

Actions #16

Updated by tjyrinki_suse about 3 years ago

  • Target version set to Ready
Actions #17

Updated by tjyrinki_suse about 3 years ago

  • Target version changed from Ready to QE-Core: Ready
Actions #18

Updated by tjyrinki_suse about 3 years ago

  • Target version changed from QE-Core: Ready to Ready
Actions #19

Updated by szarate about 3 years ago

  • Target version changed from Ready to QE-Core: Ready
Actions #20

Updated by zluo about 3 years ago

  • Status changed from In Progress to Rejected

This issue hasn't happened for 2 months. See comment #15; it has been fixed now. Rejecting it for now.

Actions #21

Updated by szarate almost 3 years ago

  • Target version changed from QE-Core: Ready to Ready
Actions #22

Updated by szarate almost 3 years ago

  • Target version changed from Ready to QE-Core: Ready