action #80570
closed[qe-core][qem][sporadic] test fails in smt_client1 auto_review:"mydie.*acquiring barrier 'smt_setup': lock owner already finished":retry
Description
Observation
openQA test in scenario sle-12-SP5-Server-DVD-Updates-x86_64-qam-smt-client1@64bit fails in smt_client1
There were no test changes, and the only incident opened was for openssh, which looks highly unrelated to the failure. Something bizarre is going on.
Test suite description
Reproducible
Fails since (at least) Build 20201129-1
Expected result
Last good: 20201128-1 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by tjyrinki_suse almost 4 years ago
- Subject changed from test fails in smt_client1 to [qe-core][qem][sporadic] test fails in smt_client1
- Status changed from New to Workable
- Start date deleted (2020-11-29)
Some runs are ok, some not.
Updated by dzedro almost 4 years ago
I'm sure it's a multi-machine (MM) problem: while MM jobs were bound to one worker, this didn't happen. Related to #65118
EDIT by okurz: replaced the full URL to the other ticket with a Redmine internal link for link preview
Updated by okurz almost 4 years ago
- Subject changed from [qe-core][qem][sporadic] test fails in smt_client1 to [qe-core][qem][sporadic] test fails in smt_client1 auto_review:"mydie.*acquiring barrier 'smt_setup': lock owner already finished":retry
Also see #69715 for a feature request to improve the error message from just "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50."
The log of a failed job has more info:
[2020-12-12T04:24:10.040 CET] [debug] >>> testapi::wait_serial: (?^:kE1Aw-\d+-): ok
[2020-12-12T04:24:10.040 CET] [debug] barrier wait 'smt_setup'
[2020-12-12T04:24:10.040 CET] [debug] tests/smt/smt_client1.pm:32 called lockapi::barrier_wait
[2020-12-12T04:24:10.040 CET] [debug] <<< testapi::record_info(title="Paused", output="Wait for smt_setup (on parent job)", result="ok")
[2020-12-12T04:24:10.100 CET] [debug] tests/smt/smt_client1.pm:32 called lockapi::barrier_wait
[2020-12-12T04:24:10.100 CET] [debug] <<< bmwqemu::mydie(cause_of_death="acquiring barrier 'smt_setup': lock owner already finished")
[2020-12-12T04:24:10.176 CET] [info] ::: basetest::runtest: # Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50.
The parallel job https://openqa.suse.de/tests/5164849/file/autoinst-log.txt has
[2020-12-12T04:24:33.836 CET] [debug] barrier create 'smt_setup' for 2 tasks
…
[2020-12-12T04:28:52.028 CET] [debug] barrier wait 'smt_setup'
…
[2020-12-12T04:35:15.290 CET] [debug] barrier 'smt_setup' not released, sleeping 5 seconds
[2020-12-12T04:35:17.535 CET] [debug] autotest received signal TERM, saving results of current test before exiting
So, as explained in #65118#note-29, the client is not waiting long enough for the server to sync on the barrier. A similar problem was recorded in #69787, but no fix was applied there either; for unknown reasons the scenario simply stopped reproducing the problem.
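The timing problem can be illustrated outside openQA. The following is a minimal sketch and not the os-autoinst lockapi implementation: it uses Python's standard threading.Barrier as a stand-in for the 'smt_setup' barrier, and run_pair, client_timeout and server_delay are hypothetical names chosen for this illustration. It only shows that when one party gives up before the other arrives, neither side gets past the barrier.

```python
import threading
import time


def run_pair(client_timeout, server_delay):
    """Run one 'server' and one 'client' thread that meet on a 2-party barrier.

    Assumption: threading.Barrier is only an analogy for the openQA barrier;
    the real lockapi talks to the openQA scheduler instead.
    """
    barrier = threading.Barrier(2)
    result = {}

    def server():
        # The server needs some setup time before it reaches the barrier.
        time.sleep(server_delay)
        try:
            barrier.wait(timeout=5)
            result["server"] = "synced"
        except threading.BrokenBarrierError:
            result["server"] = "broken"

    def client():
        # The client reaches the barrier first and waits only client_timeout.
        try:
            barrier.wait(timeout=client_timeout)
            result["client"] = "synced"
        except threading.BrokenBarrierError:
            result["client"] = "broken"

    threads = [threading.Thread(target=server), threading.Thread(target=client)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result


if __name__ == "__main__":
    # Client gives up before the server arrives: both sides fail.
    print(run_pair(client_timeout=0.05, server_delay=0.3))
    # Client waits long enough: both sides pass the barrier.
    print(run_pair(client_timeout=2.0, server_delay=0.1))
```

In the first call the client times out, which breaks the barrier for the late server as well; in the second call both threads pass. How long the real client should wait is not stated in this ticket.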
Adding auto-review regex according to https://github.com/os-autoinst/scripts/#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger
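To check that the auto_review expression from the subject actually matches the failure, here is a minimal sketch in Python. It is only an illustration of the pattern match against the log lines quoted above; how the auto-review script itself applies the expression is not shown here, and matching_lines is a hypothetical helper name.

```python
import re

# Regular expression taken from the ticket subject (auto_review:"...").
AUTO_REVIEW_PATTERN = r"mydie.*acquiring barrier 'smt_setup': lock owner already finished"

# Log lines copied from the failed job's log quoted in this ticket.
LOG_LINES = [
    "[2020-12-12T04:24:10.100 CET] [debug] tests/smt/smt_client1.pm:32 called lockapi::barrier_wait",
    "[2020-12-12T04:24:10.100 CET] [debug] <<< bmwqemu::mydie(cause_of_death=\"acquiring barrier 'smt_setup': lock owner already finished\")",
    "[2020-12-12T04:24:10.176 CET] [info] ::: basetest::runtest: # Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50.",
]


def matching_lines(pattern, lines):
    """Return the lines in which the pattern is found (line-by-line search)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


if __name__ == "__main__":
    for line in matching_lines(AUTO_REVIEW_PATTERN, LOG_LINES):
        print(line)
```

Only the bmwqemu::mydie line matches. The generic "Test died: mydie" line does not, because it carries no barrier name, so the expression stays specific to the 'smt_setup' barrier.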
Updated by okurz almost 4 years ago
- Related to action #69715: improve error feedback from lockapi to not just have "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 41." added
Updated by okurz almost 4 years ago
- Related to action #69787: [qe-core][qam][sporadic] test fails in rsync_client not waiting for the server long enough to sync on the barrier, auto_review:"(?s)cause_of_death.*barrier.*rsync_setup.*lock owner already finished.*Test died.*mydie.*lockapi" added
Updated by okurz almost 4 years ago
- Related to coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasons added
Updated by okurz almost 4 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: qam-smt-client1
https://openqa.suse.de/tests/5232012
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by zluo almost 4 years ago
- Status changed from Workable to In Progress
- Assignee set to zluo
Taking over.
Updated by zluo almost 4 years ago
http://10.160.64.152/tests/58 looks like a shutdown issue; it takes forever...
State: uploading started about an hour ago
Updated by zluo almost 4 years ago
Found out that smt_client was started without its parent tests, for example:
https://openqa.suse.de/tests/5455200
https://openqa.suse.de/tests/5453113
https://openqa.suse.de/tests/5448079
or failed because qam-smt-server@64bit failed:
https://openqa.suse.de/tests/5426775#dependencies
https://openqa.suse.de/tests/5458977#dependencies
https://openqa.suse.de/tests/5465900#dependencies
https://openqa.suse.de/tests/5465899#step/smt_server/4 shows, for example, that the connection to the database failed.
See the logs:
https://openqa.suse.de/tests/5465899/file/autoinst-log.txt
[2021-02-15T00:59:49.382 CET] [info] ::: basetest::runtest: # Test died: command 'smt-repos -m' failed at /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/smt/smt_server.pm line 27.
Updated by zluo over 3 years ago
My WIP PR to handle the mariadb issue: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11979
But my MM setup is not working, which blocks me from working on this issue.
Updated by zluo over 3 years ago
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/12014 should fix the issue; I will check the test results on OSD later.
Updated by tjyrinki_suse over 3 years ago
- Target version changed from Ready to QE-Core: Ready
Updated by tjyrinki_suse over 3 years ago
- Target version changed from QE-Core: Ready to Ready
Updated by szarate over 3 years ago
- Target version changed from Ready to QE-Core: Ready
Updated by zluo over 3 years ago
- Status changed from In Progress to Rejected
This issue has not happened for 2 months. See comment #15; it has been fixed now. Rejecting it for now.
Updated by szarate over 3 years ago
- Target version changed from QE-Core: Ready to Ready
Updated by szarate over 3 years ago
- Target version changed from Ready to QE-Core: Ready