action #80570

[qe-core][qem][sporadic] test fails in smt_client1 auto_review:"mydie.*acquiring barrier 'smt_setup': lock owner already finished":retry

Added by coolo about 1 year ago. Updated 7 months ago.

Status: Rejected
Priority: High
Assignee:
Category: Bugs in existing tests
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-12-SP5-Server-DVD-Updates-x86_64-qam-smt-client1@64bit fails in
smt_client1

There were no test changes, and only an openssh incident was opened, which looks highly unrelated to the failure. Something bizarre is going on.

Test suite description

Reproducible

Fails since (at least) Build 20201129-1

Expected result

Last good: 20201128-1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Project - action #69715: improve error feedback from lockapi to not just have "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 41." (Workable, 2020-08-07)

Related to openQA Tests - action #69787: [qe-core][qam][sporadic] test fails in rsync_client not waiting for the server long enough to sync on the barrier, auto_review:"(?s)cause_of_death.*barrier.*rsync_setup.*lock owner already finished.*Test died.*mydie.*lockapi" (Resolved)

Related to openQA Project - coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasons (Resolved, 2020-04-01 to 2020-09-30)

History

#1 Updated by tjyrinki_suse 12 months ago

  • Subject changed from test fails in smt_client1 to [qe-core][qem][sporadic] test fails in smt_client1
  • Status changed from New to Workable
  • Start date deleted (2020-11-29)

Some runs are OK, some are not.

#2 Updated by dzedro 12 months ago

I'm sure it's a multi-machine (MM) problem: while MM jobs were bound to one worker, this didn't happen. Related to #65118.

EDIT by okurz: replaced the full URL to the other ticket with a Redmine-internal link for link preview.

#3 Updated by okurz 12 months ago

  • Subject changed from [qe-core][qem][sporadic] test fails in smt_client1 to [qe-core][qem][sporadic] test fails in smt_client1 auto_review:"mydie.*acquiring barrier 'smt_setup': lock owner already finished":retry

Also see #69715 for a feature request to improve the error message from just "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50."

The log of a failed job has more info:

[2020-12-12T04:24:10.040 CET] [debug] >>> testapi::wait_serial: (?^:kE1Aw-\d+-): ok
[2020-12-12T04:24:10.040 CET] [debug] barrier wait 'smt_setup'
[2020-12-12T04:24:10.040 CET] [debug] tests/smt/smt_client1.pm:32 called lockapi::barrier_wait
[2020-12-12T04:24:10.040 CET] [debug] <<< testapi::record_info(title="Paused", output="Wait for smt_setup (on parent job)", result="ok")
[2020-12-12T04:24:10.100 CET] [debug] tests/smt/smt_client1.pm:32 called lockapi::barrier_wait
[2020-12-12T04:24:10.100 CET] [debug] <<< bmwqemu::mydie(cause_of_death="acquiring barrier 'smt_setup': lock owner already finished")
[2020-12-12T04:24:10.176 CET] [info] ::: basetest::runtest: # Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50.

The parallel job https://openqa.suse.de/tests/5164849/file/autoinst-log.txt has

[2020-12-12T04:24:33.836 CET] [debug] barrier create 'smt_setup' for 2 tasks
…
[2020-12-12T04:28:52.028 CET] [debug] barrier wait 'smt_setup'
…
[2020-12-12T04:35:15.290 CET] [debug] barrier 'smt_setup' not released, sleeping 5 seconds
[2020-12-12T04:35:17.535 CET] [debug] autotest received signal TERM, saving results of current test before exiting

So, as explained in #65118#note-29, the client is not waiting for the server long enough to sync on the barrier. A similar problem was recorded in #69787, but no fix was applied there either; for an unknown reason, the scenario simply did not reproduce the problem.
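As a rough cross-check of that reading, the throwaway Python sketch below computes the gaps between the timestamps quoted in the two log excerpts above. It assumes the clocks of the two workers were in sync, and the event names are labels made up for this example.

```python
from datetime import datetime

# Timestamps copied from the two autoinst-log.txt excerpts above (both CET).
# Assumption: both workers' clocks were synchronized.
FMT = "%Y-%m-%dT%H:%M:%S.%f"
events = {
    "client barrier_wait": "2020-12-12T04:24:10.040",
    "client mydie": "2020-12-12T04:24:10.100",
    "server barrier create": "2020-12-12T04:24:33.836",
    "server barrier wait": "2020-12-12T04:28:52.028",
    "server TERM": "2020-12-12T04:35:17.535",
}
t = {name: datetime.strptime(ts, FMT) for name, ts in events.items()}


def gap(a: str, b: str) -> float:
    """Seconds elapsed from event a to event b."""
    return (t[b] - t[a]).total_seconds()


# How long before the barrier existed the client entered barrier_wait.
client_early = gap("client barrier_wait", "server barrier create")
# How long the parallel job waited on the barrier before being terminated.
server_wait = gap("server barrier wait", "server TERM")

print(f"client entered barrier_wait {client_early:.3f} s before the barrier was created")
print(f"parallel job waited {server_wait:.3f} s on the barrier before TERM")
```

With these numbers, the client's barrier_wait died roughly 24 seconds before the parallel job even created 'smt_setup', and the parallel job then waited over six minutes for a client that had already died.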

Adding an auto-review regex according to https://github.com/os-autoinst/scripts/#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger
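To see what the auto-review label keys on, the sketch below runs this ticket's regex (the part between the quotes in the subject, without the trailing `:retry`) against the two client log lines quoted above and against one line of the parallel job's log. It also shows why the #69787 regex starts with `(?s)`: that pattern spans the `cause_of_death` line and the following `Test died` line, so `.` must also match newlines. Python's `re` is used only for illustration, and the auto-review scripts may apply the regex differently; `rsync_log` is a hypothetical log built by renaming the barrier, not a real job log.

```python
import re

# Regex from this ticket's subject (inside auto_review:"...", without ":retry").
TICKET_RE = r"mydie.*acquiring barrier 'smt_setup': lock owner already finished"
# Regex from related ticket #69787; "(?s)" lets "." also match newlines.
RELATED_RE = (
    r"(?s)cause_of_death.*barrier.*rsync_setup.*lock owner already finished"
    r".*Test died.*mydie.*lockapi"
)

# Two lines of the failed client's autoinst-log.txt, as quoted above.
client_log = (
    "[2020-12-12T04:24:10.100 CET] [debug] <<< bmwqemu::mydie("
    "cause_of_death=\"acquiring barrier 'smt_setup': lock owner already finished\")\n"
    "[2020-12-12T04:24:10.176 CET] [info] ::: basetest::runtest: "
    "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50.\n"
)

# One line of the parallel job's log: it should NOT be labelled with this ticket.
server_log = (
    "[2020-12-12T04:35:15.290 CET] [debug] barrier 'smt_setup' not released, "
    "sleeping 5 seconds\n"
)

# Hypothetical rsync variant, only to exercise the #69787 pattern.
rsync_log = client_log.replace("smt_setup", "rsync_setup")

print(bool(re.search(TICKET_RE, client_log)))  # True
print(bool(re.search(TICKET_RE, server_log)))  # False
# The #69787 pattern matches across the two lines only because of "(?s)".
print(bool(re.search(RELATED_RE, rsync_log)))  # True
print(bool(re.search(RELATED_RE.replace("(?s)", ""), rsync_log)))  # False
```

This ticket's pattern fits on the single `mydie(cause_of_death=...)` line, so it needs no `(?s)`.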

#4 Updated by okurz 12 months ago

  • Related to action #69715: improve error feedback from lockapi to not just have "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 41." added

#5 Updated by okurz 12 months ago

  • Related to action #69787: [qe-core][qam][sporadic] test fails in rsync_client not waiting for the server long enough to sync on the barrier, auto_review:"(?s)cause_of_death.*barrier.*rsync_setup.*lock owner already finished.*Test died.*mydie.*lockapi" added

#6 Updated by okurz 12 months ago

  • Related to coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasons added

#7 Updated by okurz 11 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: qam-smt-client1
https://openqa.suse.de/tests/5232012

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed

#8 Updated by zluo 10 months ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

Taking over.

#9 Updated by zluo 10 months ago

http://10.160.64.152/tests/58 looks like a shutdown issue; it takes forever.

State: uploading started about an hour ago

#12 Updated by zluo 10 months ago

Found out that smt_client got started without its parent tests, for example:
https://openqa.suse.de/tests/5455200
https://openqa.suse.de/tests/5453113
https://openqa.suse.de/tests/5448079

or it failed because qam-smt-server@64bit failed:
https://openqa.suse.de/tests/5426775#dependencies
https://openqa.suse.de/tests/5458977#dependencies
https://openqa.suse.de/tests/5465900#dependencies

For example, https://openqa.suse.de/tests/5465899#step/smt_server/4 shows that the connection to the database failed.

See the logs:
https://openqa.suse.de/tests/5465899/file/autoinst-log.txt

[2021-02-15T00:59:49.382 CET] [info] ::: basetest::runtest: # Test died: command 'smt-repos -m' failed at /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/smt/smt_server.pm line 27.

#14 Updated by zluo 10 months ago

My WIP PR to handle the MariaDB issue: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11979

But my MM setup is not working, which blocks me from working on this issue.

#15 Updated by zluo 9 months ago

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/12014 should fix the issue; I will check the test results on OSD later.

#16 Updated by tjyrinki_suse 9 months ago

  • Target version set to Ready

#17 Updated by tjyrinki_suse 9 months ago

  • Target version changed from Ready to QE-Core: Ready

#18 Updated by tjyrinki_suse 9 months ago

  • Target version changed from QE-Core: Ready to Ready

#19 Updated by szarate 9 months ago

  • Target version changed from Ready to QE-Core: Ready

#20 Updated by zluo 8 months ago

  • Status changed from In Progress to Rejected

This issue has not happened for 2 months. See comment #15: it has been fixed now. Rejecting it for now.

#21 Updated by szarate 7 months ago

  • Target version changed from QE-Core: Ready to Ready

#22 Updated by szarate 7 months ago

  • Target version changed from Ready to QE-Core: Ready
