action #105388

closed

[sle][migration][sle15sp4] Investigate to run migration test through serial console for SLE HPC

Added by leli about 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: New test
Target version: -
Start date: 2022-01-25
Due date:
% Done: 100%
Estimated time: 24.00 h
Difficulty:

Description

Following Egbert's suggestion, we need to investigate running the migration test through the serial console so that we can see each command's input and output. Our migration tests currently exercise the migration process mainly through the graphical screen, so we will investigate whether the migration process can be tested on the serial console instead.

###########################
Hi Lemon,

thank you for the test matrix, unfortunately, I'm a bit lost regarding
the test case name and settings. I only see strings like:
"offline_slehpc15sp2_rmt_basesys-desk-dev-hpc-python2-srv-wsm-nvidia_def_full_tm:x000D"
A pointer to the source repository containing the test script and the
script source for an individual test would be more useful.
Since most of these tests run CLI commands, it would be useful to see the
commands entered as well as the output to stdout, stderr and the return
value. Since the migration tests use the keyboard interface and screen
capture of openQA, most of the information required is not captured:
commands entered and results printed to stdout and stderr scroll by
before they are captured.

Thank you!

Cheers,
Egbert,
########################

Actions #1

Updated by leli about 2 years ago

  • Subject changed from [sle][migration][sle15sp4] Investigate to run migration test through serial console to [sle][migration][sle15sp4] Investigate to run migration test through serial console for SLE HPC

Actions #2

Updated by leli about 2 years ago

Take a HPC test as example:

use base 'hpcbase';
use base 'hpc::cluster';
use strict;
use warnings;
use testapi;
use utils;
use lockapi;

sub run {
    my ($self) = @_;
    $self->select_serial_terminal;

    # disable packagekitd
    quit_packagekit();

    # Stop firewall
    systemctl 'stop ' . $self->firewall;

    $self->provision_cluster();

    set_hostname(get_var('HOSTNAME', 'susetest'));

    if (get_var('HPC_REPO')) {
        my $repo = get_var('HPC_REPO');
        my $reponame = get_required_var('HPC_REPONAME');
        zypper_call("ar -f $repo $reponame");
        assert_script_run "zypper lr | grep $reponame";

        zypper_call("--gpg-auto-import-keys ref");
        zypper_call 'up';
    }
}

This is clearly a functional test, so it is easy to run on the serial console. The migration test, by contrast, is a multi-step process, so every module in that process needs to be switched to the serial console.

Actions #3

Updated by leli about 2 years ago

As a first step, I will use https://openqa.nue.suse.com/tests/8011693 as an example and switch register_system to the serial console.

Actions #4

Updated by leli about 2 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 10

Waiting to look at the command-line test for register_system on the serial console:

openqa-clone-custom-git-refspec https://github.com/lemon-suse/os-autoinst-distri-opensuse/tree/Investigate-serial-console-for-migration-sle-hpc http://openqa.nue.suse.com/tests/8011693 -c "--apikey xxx --apisecret xxx" _GROUP=0 ADDON_REGBYCMD=1
Created job #8025489: sle-15-SP4-Migration-from-HPC15-SPx-x86_64-Build84.1-online_slehpc15sp3_pscc_basesys-desk-dev-hpc-python2-srv-wsm_def_full_zypp_tm@64bit -> http://openqa.nue.suse.com/t8025489

Actions #5

Updated by eeich about 2 years ago

Let me illustrate the point I was trying to make in the quoted email:
Have a look at https://openqa.nue.suse.com/tests/8011687. It indicates a failure somewhere in 'install_service' which is part of the migration test workflow - it is still unrelated to the migration test itself, but part of preparation of the system which will later be used for migration.
To help with pinpointing the issue and to determine whether we indeed have a problem with a package or there is an issue with the test case, I need to determine what steps have been performed. Apparently, a set of packages gets installed. This is a CLI operation. I can see parts of the installation process in the captured output (see https://openqa.nue.suse.com/tests/8011687#step/install_service/11) but the command line hasn't been captured.
If I'm able to see all commands that have been issued, together with their stdout and stderr output, I would be able to determine the root cause much more quickly.
Alternatively, I will look into the link to the test script in the left column: (https://openqa.nue.suse.com/tests/8011687/modules/install_service/steps/1/src) - this however doesn't reveal the commands issued either.
I agree that not all of migration testing can be done that way: a YaST-driven offline migration and a call to 'yast migration' will bring up UI elements that will have to be captured as screenshots.
A zypper migration is CLI, it prompts the user for input, but for pure migration testing this can be bypassed.

Actions #6

Updated by leli about 2 years ago

eeich wrote:

Let me illustrate the point I was trying to make in the quoted email:
Have a look at https://openqa.nue.suse.com/tests/8011687. It indicates a failure somewhere in 'install_service' which is part of the migration test workflow - it is still unrelated to the migration test itself, but part of preparation of the system which will later be used for migration.
To help with pinpointing the issue and to determine whether we indeed have a problem with a package or there is an issue with the test case, I need to determine what steps have been performed. Apparently, a set of packages gets installed. This is a CLI operation. I can see parts of the installation process in the captured output (see https://openqa.nue.suse.com/tests/8011687#step/install_service/11) but the command line hasn't been captured.
If I'm able to see all commands that have been issued, together with their stdout and stderr output, I would be able to determine the root cause much more quickly.

I will try to switch install_service to the serial console for you to have a look; waiting for https://openqa.nue.suse.com/tests/8033195#step/install_service/74.

This may need more work: if each service check module selects its own console, more code changes will be required.

Alternatively, I will look into the link to the test script in the left column: (https://openqa.nue.suse.com/tests/8011687/modules/install_service/steps/1/src) - this however doesn't reveal the commands issued either.

For the service check, the real code is located in each service module; service_check.pm is just the entry point and controller, so you can't see the related code directly.

I agree that not all of migration testing can be done that way: a YaST-driven offline migration and a call to 'yast migration' will bring up UI elements that will have to be captured as screenshots.
A zypper migration is CLI, it prompts the user for input, but for pure migration testing this can be bypassed.

Yes, I think such CLI tests can be moved to the serial console.

Actions #7

Updated by leli about 2 years ago

@eeich Hi Egbert, could you have a look at install_service on the serial console? Is this what you expected?
https://openqa.nue.suse.com/tests/8033195#step/install_service/1

Actions #8

Updated by leli about 2 years ago

  • Status changed from In Progress to Feedback

Actions #9

Updated by leli almost 2 years ago

The previous log has already been removed, so I ran a new job for this ticket: https://openqa.nue.suse.com/tests/8445129#step/install_service/1

Actions #10

Updated by leli almost 2 years ago

@Egbert, please have a look at the latest test results. If running on the serial console like this is what you expected, I can add a setting for migration on SLE HPC so that modules such as zypper migration run on the serial console.

Actions #11

Updated by eeich almost 2 years ago

leli wrote:

@Egbert, please have a look at the latest test results. If running on the serial console like this is what you expected, I can add a setting for migration on SLE HPC so that modules such as zypper migration run on the serial console.

Sorry for missing this!
I had to look for a while to see what has changed and compare it to the original issue. I can see the first steps - up to https://openqa.nue.suse.com/tests/8445129#step/install_service/47 - converted to pure text output. This should have the added benefit of allowing the text to be used for result checking, with no need to update the needles when the console font changes.
Does this also capture output to stderr?

Actions #12

Updated by leli almost 2 years ago

eeich wrote:

leli wrote:

@Egbert, please have a look at the latest test results. If running on the serial console like this is what you expected, I can add a setting for migration on SLE HPC so that modules such as zypper migration run on the serial console.

Sorry for missing this!
I had to look for a while to see what has changed and compare it to the original issue. I can see the first steps - up to https://openqa.nue.suse.com/tests/8445129#step/install_service/47 - converted to pure text output. This should have the added benefit of allowing the text to be used for result checking, with no need to update the needles when the console font changes.

OK, so I think we need to arrange a dedicated migration test for HPC. zypper migration via CLI commands needs no needles, so it is a good choice. I will add a setting that makes the test select the serial console in this scenario for SLE HPC.

Does this also capture output to stderr?

Yes, it will, but I haven't captured such a failure on the serial console yet, since each module may change the console and the failure is random. Anyway, we will see it later.
https://openqa.nue.suse.com/tests/8479486#step/install_service/373

Actions #13

Updated by leli almost 2 years ago

  • Status changed from Feedback to In Progress
  • % Done changed from 10 to 20

Set HPC_SEL_SERIAL=1 to run such SLE HPC tests on the serial console.
Waiting for https://openqa.nue.suse.com/tests/8637910#step/install_service/75 to verify.

Actions #14

Updated by eeich almost 2 years ago

Thank you! This looks a lot better already. What I'm missing, though, is what the failing command printed to stdout and stderr. Usually zypper is rather verbose about the reason for a failure. So this output - together with the command line - is important to understand what's going on. Of course, the last lines of the zypper logs are relevant as well; however, in this case, they do not seem to be related to the issue at hand.
If you could also capture the command output to stdout and stderr (ideally flagged as to which is which), it would be great!
We should have this for all command executions. A wrapper that takes care of collecting this output would be a great feature.
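
A minimal sketch of what such a wrapper could look like on the shell side. It is not the openQA implementation; `run_logged` and the `[cmd]`, `[stdout]`, `[stderr]` and `[rc]` labels are hypothetical names chosen for illustration. It echoes the command line, captures both streams separately, prints them with a label per line, and passes the return value through.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openQA): run a command and print the
# command line, its labeled stdout and stderr, and its return value.
run_logged() {
    local out_file err_file rc=0
    out_file=$(mktemp)
    err_file=$(mktemp)

    # Show the command as it was issued (arguments joined by spaces).
    printf '[cmd] %s\n' "$*"

    # Capture both streams separately; remember a non-zero return value.
    "$@" >"$out_file" 2>"$err_file" || rc=$?

    # Prefix every captured line so the reader can tell the streams apart.
    sed 's/^/[stdout] /' "$out_file"
    sed 's/^/[stderr] /' "$err_file"
    printf '[rc] %d\n' "$rc"

    rm -f "$out_file" "$err_file"
    return "$rc"
}

# Example call: a command that writes to both streams and then fails.
run_logged sh -c 'echo ok; echo oops >&2; exit 3' || echo "command failed with rc=$?"
```

Because the two streams are collected in separate files, the relative order of stdout and stderr lines is lost; that is the price of labeling them in this simple form.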

Actions #15

Updated by leli almost 2 years ago

Waiting for http://openqa.nue.suse.com/tests/8727887#live to get some debug info for:
my @pkginstall = split('\n', script_output q[zypper search -r SLE-Module-HPC] . $version . q[-Pool -r SLE-Module-HPC] . $version . q[-Updates | cut -d '|' -f 2 | sed -e 's/ *//g' | grep -E '.*-hpc.*' | grep -vE 'system|module|suse' | grep -vE '.*_[[:digit:]]+_[[:digit:]]+.*gnu|.*_[[:digit:]]+_[[:digit:]]+.*-hpc' | grep -vE '.*-static$' | grep -vE '.*hpc-macros.*'], proceed_on_failure => 1, timeout => 180);
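
To see what this filter chain keeps, the sketch below runs it against made-up input instead of real `zypper search` output. It assumes the patterns were meant to use `.*`, `_` and `[[:digit:]]` (the ticket rendering may have swallowed those characters); `filter_hpc_packages`, `sample_rows` and all package names in the sample are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the cut/sed/grep chain applied to invented sample rows.
# Assumption: the patterns use '.*', '_' and '[[:digit:]]' as written here.

# Reads zypper-style table rows on stdin, prints the remaining package names.
filter_hpc_packages() {
    cut -d '|' -f 2 \
        | sed -e 's/ *//g' \
        | grep -E '.*-hpc.*' \
        | grep -vE 'system|module|suse' \
        | grep -vE '.*_[[:digit:]]+_[[:digit:]]+.*gnu|.*_[[:digit:]]+_[[:digit:]]+.*-hpc' \
        | grep -vE '.*-static$' \
        | grep -vE '.*hpc-macros.*'
}

# Hypothetical sample rows (package names invented for illustration).
sample_rows() {
    cat <<'EOF'
S | Name                  | Summary | Type
  | openmpi_4_0_5-gnu-hpc | sample  | package
  | openmpi4-gnu-hpc      | sample  | package
  | foo-hpc-macros        | sample  | package
  | libfoo-hpc-static     | sample  | package
  | suse-hpc              | sample  | package
  | boost-gnu-hpc         | sample  | package
EOF
}

sample_rows | filter_hpc_packages
```

With this sample, only the unversioned names `openmpi4-gnu-hpc` and `boost-gnu-hpc` remain: the header row has no `-hpc`, and the other rows are dropped by the `suse`, version-number, `-static` and `hpc-macros` exclusions respectively.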

Actions #16

Updated by leli almost 2 years ago

@egbert, I have updated the code so that the command and its output are shown; please have a look at http://openqa.nue.suse.com/tests/8750202#step/install_service/38 and http://openqa.nue.suse.com/tests/8750202#step/install_service/42

Actions #17

Updated by eeich almost 2 years ago

leli wrote:

@egbert, I have updated the code so that the command and its output are shown; please have a look at http://openqa.nue.suse.com/tests/8750202#step/install_service/38 and http://openqa.nue.suse.com/tests/8750202#step/install_service/42

Great, this provides a lot of useful information which had been missing before, thank you!

Actions #18

Updated by leli almost 2 years ago

  • % Done changed from 20 to 50

PR submitted but marked WIP for review next week: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/14940

Actions #19

Updated by leli almost 2 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

PR merged. Later, set SEL_SERIAL_CONSOLE=1 for the service check test to select the serial console.
