[qe-core][qem][sporadic] test fails in smt_client1 auto_review:"mydie.*acquiring barrier 'smt_setup': lock owner already finished":retry
openQA test in scenario sle-12-SP5-Server-DVD-Updates-x86_64-qam-smt-client1@64bit fails in
There were no test changes, and the only incident opened was for openssh, which looks highly unrelated to the failure. Something bizarre is going on.
Test suite description
Fails since (at least) Build 20201129-1
Last good: 20201128-1 (or more recent)
Always latest result in this scenario: latest
- Subject changed from [qe-core][qem][sporadic] test fails in smt_client1 to [qe-core][qem][sporadic] test fails in smt_client1 auto_review:"mydie.*acquiring barrier 'smt_setup': lock owner already finished":retry
Also see #69715 for a feature request to improve the error message from just "# Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50."
The log of a failed job has more info:
[2020-12-12T04:24:10.040 CET] [debug] >>> testapi::wait_serial: (?^:kE1Aw-\d+-): ok
[2020-12-12T04:24:10.040 CET] [debug] barrier wait 'smt_setup'
[2020-12-12T04:24:10.040 CET] [debug] tests/smt/smt_client1.pm:32 called lockapi::barrier_wait
[2020-12-12T04:24:10.040 CET] [debug] <<< testapi::record_info(title="Paused", output="Wait for smt_setup (on parent job)", result="ok")
[2020-12-12T04:24:10.100 CET] [debug] tests/smt/smt_client1.pm:32 called lockapi::barrier_wait
[2020-12-12T04:24:10.100 CET] [debug] <<< bmwqemu::mydie(cause_of_death="acquiring barrier 'smt_setup': lock owner already finished")
[2020-12-12T04:24:10.176 CET] [info] ::: basetest::runtest: # Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50.
The parallel job https://openqa.suse.de/tests/5164849/file/autoinst-log.txt has
[2020-12-12T04:24:33.836 CET] [debug] barrier create 'smt_setup' for 2 tasks
…
[2020-12-12T04:28:52.028 CET] [debug] barrier wait 'smt_setup'
…
[2020-12-12T04:35:15.290 CET] [debug] barrier 'smt_setup' not released, sleeping 5 seconds
[2020-12-12T04:35:17.535 CET] [debug] autotest received signal TERM, saving results of current test before exiting
So, as explained in #65118#note-29, the client is not waiting long enough for the server to sync on the barrier. A similar problem was recorded in #69787, but no fix has been applied there either; for unknown reasons the scenario simply did not reproduce the problem.
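To make the ordering concrete, the timestamps quoted from the two logs can be compared directly. The following is only a rough sketch: it assumes both workers have synchronized clocks, and the helper name `parse_log_time` is made up for illustration (it is not part of os-autoinst or openQA).

```python
from datetime import datetime

# Format of the timestamps in autoinst-log.txt, without the trailing " CET".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def parse_log_time(stamp: str) -> datetime:
    """Parse a stamp such as '2020-12-12T04:24:10.100 CET' (hypothetical helper)."""
    # Drop the time zone label; both logs use the same zone, so it cancels out.
    return datetime.strptime(stamp.split()[0], TIMESTAMP_FORMAT)


# Timestamps copied from the two log excerpts in this ticket.
client_died = parse_log_time("2020-12-12T04:24:10.100 CET")       # mydie on the client
barrier_created = parse_log_time("2020-12-12T04:24:33.836 CET")   # barrier create on the server

# Positive value: the client gave up before the server created the barrier.
gap_seconds = (barrier_created - client_died).total_seconds()

if __name__ == "__main__":
    print(f"client failed {gap_seconds:.3f} s before the barrier was created")
```

If the clocks are indeed in sync, the client hit the barrier roughly 24 seconds before the server job had created it.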
Adding auto-review regex according to https://github.com/os-autoinst/scripts/#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger
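As a quick sanity check, the regex from the subject can be tried against the failing log line quoted earlier. This is a minimal sketch using Python's `re` module; the actual auto-review script may search the log differently, and `matches_known_issue` is a made-up name for illustration.

```python
import re

# Pattern copied from the ticket subject (the part inside auto_review:"...").
AUTO_REVIEW_PATTERN = r"mydie.*acquiring barrier 'smt_setup': lock owner already finished"

# Log line copied from the failed smt_client1 job quoted in this ticket.
LOG_LINE = (
    "[2020-12-12T04:24:10.100 CET] [debug] <<< bmwqemu::mydie("
    "cause_of_death=\"acquiring barrier 'smt_setup': lock owner already finished\")"
)


def matches_known_issue(log_text: str, pattern: str = AUTO_REVIEW_PATTERN) -> bool:
    """Return True if the pattern occurs anywhere in the given log text."""
    return re.search(pattern, log_text) is not None


if __name__ == "__main__":
    print(matches_known_issue(LOG_LINE))
```

Because the pattern is anchored on both `mydie` and the barrier name, the bare "Test died: mydie at /usr/lib/os-autoinst/lockapi.pm line 50." line alone would not match it.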
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: qam-smt-client1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
https://openqa.suse.de/tests/5465899#step/smt_server/4 shows, for example, that the connection to the database failed:
[2021-02-15T00:59:49.382 CET] [info] ::: basetest::runtest: # Test died: command 'smt-repos -m' failed at /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/smt/smt_server.pm line 27.
My WIP PR to handle the MariaDB issue: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11979
However, my multi-machine (MM) setup is not working, which blocks me from working on this issue.
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/12014 should fix the issue; I will check the test results on OSD later.