action #109473

closed

[qe-core] test fails in sssd_389ds_functional - Improve error reporting in the docker container

Added by szarate over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Bugs in existing tests
Start date:
2022-04-05
Due date:
% Done:

0%

Estimated time:
Difficulty:
Sprint:
QE-Core: April Sprint (Apr 13 - May 11)

Description

Observation

The sssd tests seem to be working really well. However, when something goes wrong inside the container, we don't know what's going on beyond the message shown in the console.

AC

  1. Steps describing how to get the logs from the failing container, at least for the exec command, are documented in this ticket
  2. Test uploads the execution logs on failure as part of the post fail hooks
  3. Root cause is identified and Maintenance has taken action
  4. [optional]: Document the action taken by Maintenance.

Suggestions

  1. Investigate how to extract the logs of the docker container when the command is exec, via hooks or
  2. Find out which update created the conflict, or what changes caused the test to start failing, and notify #team-lsg-qe-openqa-review on Slack
  3. Improve further logging mechanisms to ensure easier investigation in the future
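
As a starting point for the first suggestion, here is a minimal sketch of how the logs could be pulled out of a running container with `docker logs` and `docker exec ... journalctl`. It is only an illustration in Python, not the code used by the test: the function names, the output file names and the container name are hypothetical, and the unit name has to match whatever runs inside the container.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# A runner takes a command (list of arguments) and returns its captured output.
Runner = Callable[[Sequence[str]], str]


def run_command(cmd: Sequence[str]) -> str:
    """Run a command and return stdout and stderr combined.

    A non-zero exit status does not raise, so a failing service
    still yields whatever output the command produced.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return proc.stdout


def collect_container_logs(
    container: str,
    unit: str,
    out_dir: Path,
    runner: Runner = run_command,
) -> List[Path]:
    """Save the container's own log and the journal of one unit to out_dir.

    The file names are hypothetical; a post fail hook could upload them.
    """
    commands = {
        # Output of the container's main process.
        "docker_logs.txt": ["docker", "logs", container],
        # Journal of a systemd unit running inside the container.
        "journal.txt": [
            "docker", "exec", container,
            "journalctl", "--no-pager", "-u", unit,
        ],
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, cmd in commands.items():
        path = out_dir / filename
        path.write_text(runner(cmd))
        written.append(path)
    return written
```

A call such as `collect_container_logs("ds389_container", "dirsrv@frist389.service", Path("logs"))` would write both files under `logs/`, assuming the container is still running. The `runner` parameter exists so the function can be exercised without docker installed.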

openQA test in scenario sle-15-SP3-Server-DVD-Updates-aarch64-sssd_389ds_functional@aarch64-virtio fails in sssd_389ds_functional

Test suite description

Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. Maintainer: QE Core / QE Security

Reproducible

Fails since (at least) Build 20220405-1

Expected result

Last good: 20220404-1 (or more recent)

Further details

Always latest result in this scenario: latest



Actions #1

Updated by mgrifalconi over 2 years ago

  • Assignee set to mgrifalconi
Actions #2

Updated by mgrifalconi over 2 years ago

Attaching screenshot of error from journalctl -u dirsrv@frist389.service inside the docker container

Actions #3

Updated by szarate over 2 years ago

so awk is missing in the container? :facepalm:

Actions #4

Updated by mgrifalconi over 2 years ago

Jozef suggested this might be related https://progress.opensuse.org/issues/109497

Both the 15-SP2 and 15-SP3 tests use a 15-SP3 docker image. The same applies to sssd_openldap_functional: https://openqa.suse.de/tests/8473708

Questions are:

Actions #5

Updated by mgrifalconi over 2 years ago

Considering the similarity with the issue of https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/14506 (TEST repos are not used in docker container), this might be a regression caused by something already published.

I see https://smelt.suse.de/incident/23339/ was approved between the last working job and the first failure. Could that be related in any way?

Actions #6

Updated by mgrifalconi over 2 years ago

I have a PR to fix the test by installing awk and increasing the shm size: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/14664

I am not sure whether it is expected to work like this, though.

In any case, we will have to think about the container version (currently hardcoded to 15.3) and about using the TEST repos, so this ticket should not be closed after merging that PR.

Actions #7

Updated by tonyyuan over 2 years ago

I happened to be assigned a ticket by vsvecova two days ago: https://progress.opensuse.org/issues/109497
I think it is a duplicate of this ticket. The "shm-size=256m" fix is good.
However, there might be a bug/regression in 389ds. The test passed with QEMURAM=1024 and without shm-size=256m set. It seems that 389ds calculates the cache size based only on the physical memory size and does not take the shared memory size into account.
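
To make this kind of failure easier to spot next time, a preflight check could be run inside the container before the directory server is started. The sketch below, in Python, only illustrates the idea: it checks that `awk` is on the PATH and that `/dev/shm` is at least 256 MiB. The function name and the threshold are assumptions; the threshold simply mirrors the shm-size=256m value above and is not a documented 389ds requirement.

```python
import shutil
from typing import List, Sequence

# Assumption: mirrors the shm-size=256m value used in the fix.
MIN_SHM_BYTES = 256 * 1024 * 1024


def preflight_problems(
    required_tools: Sequence[str] = ("awk",),
    shm_path: str = "/dev/shm",
    min_shm_bytes: int = MIN_SHM_BYTES,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # Report every required tool that cannot be found on the PATH.
    for tool in required_tools:
        if shutil.which(tool) is None:
            problems.append(f"missing tool: {tool}")

    # Compare the total size of the shared memory mount with the threshold.
    try:
        total = shutil.disk_usage(shm_path).total
    except OSError as err:
        problems.append(f"cannot inspect {shm_path}: {err}")
    else:
        if total < min_shm_bytes:
            problems.append(
                f"{shm_path} is {total} bytes, expected at least {min_shm_bytes}"
            )
    return problems
```

Printing the returned list to the console, or writing it to a file that gets uploaded, would turn a missing tool or a small shared memory mount into an explicit message instead of a service that silently fails to start.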

Actions #8

Updated by szarate over 2 years ago

  • Status changed from Workable to Resolved

tonyyuan wrote:

I happened to be assigned a ticket by vsvecova two days ago: https://progress.opensuse.org/issues/109497
I think it is a duplicate of this ticket. The "shm-size=256m" fix is good.
However, there might be a bug/regression in 389ds. The test passed with QEMURAM=1024 and without shm-size=256m set. It seems that 389ds calculates the cache size based only on the physical memory size and does not take the shared memory size into account.

Could you report that bug on our behalf? Thanks!

Actions #9

Updated by szarate over 2 years ago

  • Sprint set to QE-Core: April Sprint (Apr 13 - May 11)