action #129703

open

[security] test fails in evolution_prepare_servers on ipmi

Added by tjyrinki_suse 11 months ago. Updated 15 days ago.

Status:
Blocked
Priority:
Normal
Assignee:
-
Category:
Bugs in existing tests
Target version:
-
Start date:
Due date:
% Done:

0%

Estimated time:
32.00 h
Difficulty:
Tags:

Description

openQA test in scenario sle-15-SP5-Online-x86_64-fips_env_mode_tests_crypt_tool_intel_ipmi@64bit-ipmi fails in
evolution_prepare_servers

Last good: 100.1 (or more recent)


When this ipmi test eventually gets past boot_to_desktop etc., something seems to have changed that causes an error which cannot be worked around by retrying:

-bash: cd: /usr/share/doc/packages/dovecot: No such file or directory
bash: mkcert.sh: No such file or directory

However, dovecot is installed earlier as part of the same module: https://openqa.suse.de/tests/11181584#step/evolution_prepare_servers/6 - so how is this problem possible? The doc directory could be missing, but the same step works in the non-ipmi case: https://openqa.suse.de/tests/11175697#step/evolution_prepare_servers/32

Usually this ipmi test fails earlier in boot_to_desktop or console_setup, but there was also one earlier case of the same dovecot problem: https://openqa.suse.de/tests/11178402#step/evolution_prepare_servers/33

Actions #1

Updated by amanzini 11 months ago

  • Assignee set to amanzini
Actions #2

Updated by amanzini 11 months ago

  • Assignee deleted (amanzini)

The test sometimes fails in the fips_setup step; it seems unable to install some packages:

Loading repository data...
Reading installed packages...
Package 'libgnutls30-hmac' not found.
'libgcrypt20-hmac' is already installed.
No update candidate for 'libgcrypt20-hmac-1.6.1-16.83.1.x86_64'. The highest available version is already installed.
'libcryptsetup12-hmac' is already installed.
No update candidate for 'libcryptsetup12-hmac-2.0.6-3.3.1.x86_64'. The highest available version is already installed.
'libfreebl3-hmac' is already installed.
No update candidate for 'libfreebl3-hmac-3.79.4-58.97.1.x86_64'. The highest available version is already installed.
'libsoftokn3-hmac' is already installed.
No update candidate for 'libsoftokn3-hmac-3.79.4-58.97.1.x86_64'. The highest available version is already installed.
'libopenssl1_1-hmac' is already installed.
No update candidate for 'libopenssl1_1-hmac-1.1.1d-2.81.1.x86_64'. The highest available version is already installed.
Dtgik-104-

Excerpt from fips_setup.pm:

    if (is_sle('>=15-sp4')) {
        my $pkg_list = {
            'libcryptsetup12-hmac' => '2.4.3',
            'libsoftokn3-hmac' => '3.68.3',
            'libgnutls30-hmac' => '3.7.3',
            'libfreebl3-hmac' => '3.68.3',
            'libopenssl1_1-hmac' => '1.1.1l',
            'libgcrypt20-hmac' => '1.9.4'
        };
        zypper_call("in " . join(' ', keys %$pkg_list));
        package_upgrade_check($pkg_list);
    }
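
To make logs like the one above easier to compare against the $pkg_list in the excerpt, here is a minimal Python sketch that pulls the "not found" packages and the installed versions out of zypper output. It is not part of fips_setup.pm or os-autoinst; the function name summarize_zypper_log and the regular expressions are made up for illustration and only cover the two message shapes visible in the log.

```python
import re

# Matches: Package 'libgnutls30-hmac' not found.
NOT_FOUND = re.compile(r"^Package '([^']+)' not found\.")
# Matches: No update candidate for 'name-version-release.arch'.
NO_UPDATE = re.compile(r"^No update candidate for '(.+)-([^-]+)-([^-]+)\.(\w+)'\.")


def summarize_zypper_log(log):
    """Return (missing_packages, installed_versions) parsed from zypper output.

    missing_packages is a list of names zypper could not find;
    installed_versions maps a package name to its installed version string.
    """
    missing = []
    installed = {}
    for raw_line in log.splitlines():
        line = raw_line.strip()
        match = NOT_FOUND.match(line)
        if match:
            missing.append(match.group(1))
            continue
        match = NO_UPDATE.match(line)
        if match:
            # group(1) is the package name, group(2) the version
            installed[match.group(1)] = match.group(2)
    return missing, installed
```

Fed with the log from this comment, it would report libgnutls30-hmac as missing and, for example, libgcrypt20-hmac at 1.6.1, which can then be set next to the 1.9.4 listed in $pkg_list.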
Actions #3

Updated by openqa_review 11 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: fips_env_mode_tests_crypt_tool_intel_ipmi@64bit-ipmi
https://openqa.suse.de/tests/11181584#step/evolution_prepare_servers/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #4

Updated by JERiveraMoya 7 months ago

This scenario looks like a waste of time from a test maintenance point of view. I reproduced it by running the job multiple times and hit all of those strange issues. It fails sporadically in too many places, and no other squad uses a similar setup, not even QE virtualization! Dropping it is probably the best option.
Slack conversation: https://suse.slack.com/archives/C02CANHLANP/p1695192194470199
wdyt?

Actions #5

Updated by tjyrinki_suse 7 months ago

We should not give up on the hope, wishful as it may be, that the IPMI backend will become reliable enough for us to use, as our stakeholders request it.

In practice, however, we are quite used to expecting IPMI failures all over. It is a valid question, though, whether we should do something similar, i.e. have separate IPMI tests, if for example https://openqa.suse.de/tests/12180565 (sle-15-SP6-Online-x86_64-Build20.1-sev-es-gi-guest_developing-on-host_developing-kvm) is somehow more reliable.

But files disappearing right after installation: I'm not sure that kind of problem can even be related to how the test is run. Could we try the same MACHINE=64bit-ipmi-amd-zen3 that they use in that test? Would that be any more reliable?

One difference, at least, is that more packages are installed on qemu (https://openqa.suse.de/tests/11175697#step/evolution_prepare_servers/6) than on our IPMI (https://openqa.suse.de/tests/11181584#step/evolution_prepare_servers/6). Maybe installation of recommended packages is somehow disabled in our create_hdd_textmode_intel_ipmi? Not that it would affect the missing mkcert.sh, as I double-checked that it is in the "dovecot23" package.
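
One way to check the recommended-packages theory would be to look at the libzypp configuration on the installed image. The sketch below is only a guess at where to look: it assumes the setting in question is solver.onlyRequires in the [main] section of /etc/zypp/zypp.conf, and the helper name recommends_disabled is hypothetical. Whether create_hdd_textmode_intel_ipmi changes this file at all is not confirmed.

```python
import configparser


def recommends_disabled(zypp_conf_text):
    """Return True if the given zypp.conf text sets solver.onlyRequires to true.

    Assumption: the option lives in the [main] section. Commented-out
    lines ("# solver.onlyRequires = false") are ignored by the parser,
    so a missing or commented option counts as "recommends enabled".
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(zypp_conf_text)
    return parser.getboolean("main", "solver.onlyRequires", fallback=False)


if __name__ == "__main__":
    # Path is an assumption; adjust if the image keeps the file elsewhere.
    with open("/etc/zypp/zypp.conf") as conf:
        print(recommends_disabled(conf.read()))
```

Running this on both the qemu and the IPMI image and comparing the two results would show whether the images differ in this setting.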

Anyway, we might want to add automatic soft-fails for IPMI cases to make the results more readable.

Actions #6

Updated by JERiveraMoya 7 months ago

I understand the scope better now: it is bigger than the test suite named in the description, and ipmi is important for us. Thanks for the clarification.
I did some testing to see the options. The first thing I noticed is that we are not testing the same way as with qemu. With the setting START_DIRECTLY_AFTER_TEST in the child test suites we reuse the same machine: first the parent is executed, then the children are executed one after the other in alphabetical order. With qemu, each child gets a fresh VM after the parent.
This is what made me suspicious about this recurrent error: https://openqa.suse.de/tests/11781313#step/consoletest_setup/33
It most likely happens because we set up everything for each child, even repeating the console setup. I then tried removing that setup from the 2nd and 3rd children. The first run was a success but the second was not; there are other sporadic errors in different places. Besides, I realized that dropping the setup is not a safe option: if the 1st child fails, the 2nd and 3rd might succeed without the fips setup having been done, leading to a false positive.
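
The false-positive risk could be reduced by letting each child verify that FIPS is really active before it skips the setup. A rough Python sketch of such a guard follows; it is not os-autoinst code. The kernel flag file /proc/sys/crypto/fips_enabled applies to kernel mode, while for env mode the sketch assumes the setup exports OPENSSL_FORCE_FIPS_MODE=1; that variable name and the helper names are assumptions, not taken from fips_setup.pm.

```python
import os
from pathlib import Path

KERNEL_FLAG = "/proc/sys/crypto/fips_enabled"
# Assumption: env mode is signalled by this variable being set to "1".
ENV_FLAG = "OPENSSL_FORCE_FIPS_MODE"


def kernel_fips_enabled(flag_path=KERNEL_FLAG):
    """Return True if the kernel flag file exists and contains "1"."""
    try:
        return Path(flag_path).read_text().strip() == "1"
    except OSError:
        # A missing or unreadable flag file is treated as "not enabled".
        return False


def env_fips_enabled(env=None):
    """Return True if the assumed env-mode variable is set to "1"."""
    env = os.environ if env is None else env
    return env.get(ENV_FLAG) == "1"


def may_skip_setup(flag_path=KERNEL_FLAG, env=None):
    """A child may skip the setup only if FIPS is already active."""
    return kernel_fips_enabled(flag_path) or env_fips_enabled(env)
```

With a guard like this, a 2nd or 3rd child that finds FIPS inactive would run the setup itself (or fail early) instead of passing without it.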

The next option I considered was to duplicate the parent, basically mimicking what we have in qemu. With one parent for each child it would be really slow, and it does not seem worth it, because we also hit https://openqa.suse.de/tests/12365064#step/fips_setup/65 very often in the first child we run.

What helped was using that worker, as you mentioned, to at least run the parent: https://gitlab.suse.de/qe-security/osd-sle15-security/-/merge_requests/181
But if this area is owned by virtualization, we should try to give them simple failures that they can fix one by one; otherwise it gets really messy.

Actions #7

Updated by JERiveraMoya 7 months ago

In QU we also have issues with ipmi, e.g.: https://openqa.suse.de/tests/12426812
In product it doesn't even boot: https://openqa.suse.de/tests/12430609#step/boot_from_pxe/13

Actions #8

Updated by openqa_review 6 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: create_hdd_textmode_intel_ipmi@64bit-ipmi
https://openqa.suse.de/tests/12505996#step/boot_from_pxe/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #9

Updated by openqa_review 5 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: create_hdd_textmode_intel_ipmi@64bit-ipmi
https://openqa.suse.de/tests/12426812#step/boot_from_pxe/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.

Actions #10

Updated by tjyrinki_suse 5 months ago

  • Status changed from Workable to Blocked

Blocked since currently no 15-SP6 FIPS works.

Actions #11

Updated by tjyrinki_suse 15 days ago

We haven't had ipmi results for a while now. FIPS itself would be testable now.
