action #109473
closed [qe-core] test fails in sssd_389ds_functional - Improve error reporting in the docker container
Description
Observation
The sssd tests seem to be working really well. However, when something goes wrong inside the container, we don't know what's going on beyond the message shown in the console.
AC
- Steps describing how to get the logs from the failing container, at least for the exec command, are documented in this ticket
- Test uploads the execution logs on failure as part of the post fail hooks
- Root cause is identified and Maintenance has taken action
- [optional]: Document the action taken by Maintenance.
Suggestions
- Investigate how to extract the logs of the docker container when the command is exec, via hooks or
- Find out which update was creating the conflict, or what changes happened that caused the test to start failing, and notify #team-lsg-qe-openqa-review on Slack
- Improve further logging mechanisms to ensure easier investigation in the future
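A minimal sketch of how the container logs could be collected on failure, assuming the docker CLI is available on the SUT. The container name and the output file names are hypothetical; the unit name is the one seen in the journalctl output attached below. Note that docker exec only works while the container is still running, whereas docker logs also works on a stopped container.

```python
import subprocess
from pathlib import Path

# Hypothetical container name; the unit name comes from the journalctl
# command mentioned in this ticket.
CONTAINER = "ds389_container"
UNIT = "dirsrv@frist389.service"


def log_commands(container: str = CONTAINER, unit: str = UNIT) -> dict[str, list[str]]:
    """Map an output file name to the docker command that produces it."""
    return {
        # Console output of the container's main process.
        "container.log": ["docker", "logs", container],
        # Journal of the directory server unit inside the container.
        "journal.log": ["docker", "exec", container,
                        "journalctl", "--no-pager", "-u", unit],
    }


def collect_logs(outdir: str, container: str = CONTAINER) -> list[Path]:
    """Run each command and save stdout plus stderr; never raise on failure."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, cmd in log_commands(container).items():
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Keep a trace of the failure instead of aborting the collection.
            text = f"failed to run {cmd}: {exc}\n"
        path = out / name
        path.write_text(text)
        written.append(path)
    return written
```

The files written by collect_logs could then be uploaded from the post fail hook; how that upload is wired into the test is not covered by this sketch.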
openQA test in scenario sle-15-SP3-Server-DVD-Updates-aarch64-sssd_389ds_functional@aarch64-virtio fails in sssd_389ds_functional
Test suite description
Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. Maintainer: QE Core / QE Security
Reproducible
Fails since (at least) Build 20220405-1
Expected result
Last good: 20220404-1 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by mgrifalconi over 2 years ago
Attaching screenshot of error from journalctl -u dirsrv@frist389.service inside the docker container
Updated by szarate over 2 years ago
so awk is missing in the container? :facepalm:
Updated by mgrifalconi over 2 years ago
Jozef suggested this might be related https://progress.opensuse.org/issues/109497
Both the 15-sp2 and 15-sp3 tests are using a docker image of 15-sp3. This is the same as in sssd_openldap_functional https://openqa.suse.de/tests/8473708
Questions are:
- is it expected to always use the 15-sp3 container in all cases?
- should we apply the same fix as https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/14506 to use the testing repos in the container as well?
- does this have something to do with the missing awk in the container? Is that the real cause of the failure right now?
Updated by mgrifalconi over 2 years ago
Considering the similarity with the issue of https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/14506 (TEST repos are not used in docker container), this might be a regression caused by something already published.
I see https://smelt.suse.de/incident/23339/ was approved between the last working job and the first failure. Could that be related in any way?
Updated by mgrifalconi over 2 years ago
I have a PR to fix the test by installing awk and increasing the shm size: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/14664
I am not sure whether it is expected to work like this, though.
In any case we will have to think about the container version (currently hardcoded to 15.3) and about using the test repos, so this ticket should not be closed after merging that PR.
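For illustration, a rough sketch of the two container-side changes (a larger shm size and a check that awk is present), expressed as docker command lines. The image and container names are placeholders, not the ones used by the test; --shm-size is a standard docker run option and command -v is the POSIX way to look up a binary.

```python
import shlex


def run_command(image: str, name: str, shm_size: str = "256m") -> list[str]:
    """Build a docker run command that enlarges /dev/shm in the container."""
    return ["docker", "run", "-d", "--name", name,
            f"--shm-size={shm_size}", image]


def awk_check_command(name: str) -> list[str]:
    """Build a command that exits non-zero when awk is missing in the container."""
    return ["docker", "exec", name, "sh", "-c", "command -v awk"]


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    print(shlex.join(run_command("example/sle15-image", "ds389_container")))
    print(shlex.join(awk_check_command("ds389_container")))
```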
Updated by tonyyuan over 2 years ago
I happened to be assigned a ticket by vsvecova two days ago. https://progress.opensuse.org/issues/109497
I think it is a duplicate of this ticket. The "shm-size=256m" fix is good.
However, there might be a bug/regression in 389ds. The test passed with QEMURAM=1024 and without shm-size=256m set. It seems that 389ds calculates the cache size based only on the physical memory size and does not take the shared memory size into account.
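To make the suspicion concrete, here is a small sketch that checks whether a given cache size would fit into the container's /dev/shm. It does not reproduce how 389ds actually sizes its caches; the 100m figure in the usage note is a made-up value. The only external fact assumed is that docker gives a container 64 MB of shm unless --shm-size is set.

```python
import shutil

# Binary multipliers for size suffixes such as "256m".
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '256m' or '1g' into bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def fits_in_shm(needed_bytes: int, shm_bytes: int) -> bool:
    """True when the requested cache would fit into shared memory."""
    return needed_bytes <= shm_bytes


def shm_total(path: str = "/dev/shm") -> int:
    """Total size in bytes of the tmpfs mounted at path."""
    return shutil.disk_usage(path).total
```

With a hypothetical cache of 100m, fits_in_shm(parse_size("100m"), parse_size("64m")) is False, while the same cache fits once the shm size is raised to 256m. Running shm_total() inside the container shows the value that is actually in effect.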
Updated by szarate over 2 years ago
- Status changed from Workable to Resolved
tonyyuan wrote:
I happened to be assigned a ticket by vsvecova two days ago. https://progress.opensuse.org/issues/109497
I think it is a duplicate of this ticket. The "shm-size=256m" fix is good.
However, there might be a bug/regression in 389ds. The test passed with QEMURAM=1024 and without shm-size=256m set. It seems that 389ds calculates the cache size based only on the physical memory size and does not take the shared memory size into account.
Could you report that bug on our behalf? Thanks!
Updated by szarate over 2 years ago
- Sprint set to QE-Core: April Sprint (Apr 13 - May 11)
Updated by tonyyuan over 2 years ago
A bug report has been opened: https://bugzilla.suse.com/show_bug.cgi?id=1199444