action #58127

closed

[functional][y][timeboxed:16h] Fix shutdown from GUI on s390x backend

Added by oorlov over 4 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 29
Start date: 2019-10-14
Due date: 2019-11-05
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

The shutdown module is marked as failed when shutting down from the GUI on a machine with the s390x backend (e.g. https://openqa.suse.de/tests/3458377), although according to the logs the system does shut down.

At the same time, the test module passes when shutting down from the console. For this reason, the following workaround is applied for the s390x backend in the power_action_utils.pm module:

if (check_var('BACKEND', 's390x')) {
   record_soft_failure('poo#58127 - Temporary workaround, because shutdown module is marked as failed on s390x backend when shutting down from GUI.');
   select_console 'root-console';
   type_string "$action\n";
}

Task

  1. Investigate why the test module is marked as 'failed';
  2. Apply the solution to shut down from GUI without having errors;
  3. Remove the workaround.

The main goal is to ensure that the VM is off and then finish the test. We should check the old code (ask Matthias for support).
Here is the PR with the temporary solution: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8635/files

NOTE: The solution may require adding an is_shutdown function to os-autoinst/backend/s390x.pm. It does not currently exist for this backend, though it is implemented for almost all other backends. This is the right way to check whether the system is shut down or not.
One possible way to check that the system has shut down is to check the logs in the x3270 console:

console('x3270')->expect_3270(  
   output_delim => qr/.*SIGP stop.*/,   
   timeout      => 30   
);
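
As a minimal sketch of the idea behind that check, the following standalone Perl snippet scans already-captured console lines for the "SIGP stop" message. The helper name saw_sigp_stop is hypothetical and not part of os-autoinst; the sample lines are taken from the log quoted later in this ticket. It only illustrates the matching, not how a real is_shutdown for the s390x backend would obtain the console output.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of os-autoinst): returns 1 when any of the
# captured x3270 console lines reports a "SIGP stop", 0 otherwise.
sub saw_sigp_stop {
    my ($lines) = @_;
    return scalar(grep { /SIGP stop/ } @$lines) ? 1 : 0;
}

# Sample console output, copied from the log in this ticket.
my @console_output = (
    'Powering off.',
    'HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU',
);

print saw_sigp_stop(\@console_output) ? "shut down\n" : "still running\n";
```

Matching on the message text rather than the full line keeps the check independent of the padding that the 3270 screen adds.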

Related issues 1 (0 open, 1 closed)

Related to openQA Tests - action #57743: [functional][y][timeboxed:8h] test tries to reboot when poweroff is given in shutdown (Resolved, oorlov, 2019-10-07 to 2019-10-22)

Actions #1

Updated by oorlov over 4 years ago

  • Description updated (diff)
Actions #2

Updated by oorlov over 4 years ago

  • Related to action #57743: [functional][y][timeboxed:8h] test tries to reboot when poweroff is given in shutdown added
Actions #3

Updated by oorlov over 4 years ago

  • Description updated (diff)
Actions #4

Updated by riafarov over 4 years ago

  • Subject changed from [functional][y] Fix shutdown from GUI on s390x backend to [functional][y][timeboxed:16h] Fix shutdown from GUI on s390x backend
  • Description updated (diff)
  • Status changed from New to Workable
Actions #5

Updated by riafarov over 4 years ago

  • Target version set to Milestone 29
Actions #6

Updated by JRivrain over 4 years ago

  • Assignee set to JRivrain
Actions #7

Updated by JRivrain over 4 years ago

All I can see so far is that when we try to shut down from the desktop, the function assert_shutdown_with_soft_timeout in power_action_utils.pm is never executed, because a VNC crash is detected first. But why is it detected at all?

(In more detail: the function check_shutdown, which should be called by assert_shutdown_with_soft_timeout, is never called.)

Log when shutting down from the console

[2019-10-24T21:18:11.748 CEST] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/shutdown/shutdown.pm:28 called power_action_utils::power_action
[2019-10-24T21:18:11.748 CEST] [debug] <<< testapi::check_shutdown(timeout=60)
[2019-10-24T21:18:11.749 CEST] [debug] Backend does not implement is_shutdown - just sleeping

(Then it just times out, and the test ends with no issue.)

Log when we try to shut down from GNOME

[2019-10-24T21:18:02.348 CEST] [debug] <<< backend::console_proxy::__ANON__(wrapped_call={
'args' => [],
'function' => 'kill_ssh',
'console' => 'iucvconn'
})
[2019-10-24T21:18:02.621 CEST] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/shutdown/shutdown.pm:28 called power_action_utils::power_action
[2019-10-24T21:18:02.621 CEST] [debug] <<< testapi::console(testapi_console='installation')
[2019-10-24T21:18:02.621 CEST] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/shutdown/shutdown.pm:28 called power_action_utils::power_action
[2019-10-24T21:18:02.621 CEST] [debug] <<< backend::console_proxy::__ANON__(wrapped_call={
'function' => 'disable_vnc_stalls',
'args' => [],
'console' => 'installation'
})

[... SOME MORE LINES]

'[  OK  ] Removed slice system-systemd\\x2dfsck.slice.                            ',
'[  OK  ] Reached target Shutdown.                                               ',
'dracut Warning: Killing all remaining processes                                 ',
'Powering off.                                                                   ',
'HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU'
];
[2019-10-09T17:52:37.246 CEST] [debug] considering VNC stalled, no update for 4.90 seconds
[2019-10-09T17:52:55.661 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
Error connecting to VNC server <s390hsl146.suse.de:5901>: IO::Socket::INET: connect: No route to host

I do not really understand what disable_vnc_stalls is supposed to do, but I would assume it should prevent this 'considering VNC stalled, no update for 4.90 seconds'.

Actions #8

Updated by JRivrain over 4 years ago

  • Status changed from Workable to In Progress
Actions #9

Updated by JRivrain over 4 years ago

I think the problem is with VNC. Maybe it is expecting a certain return code from the client and is receiving none, or a wrong one, hence the failure. I will check that tomorrow.

EDIT:

Test done, there is no problem with VNC. So I fail to understand why what works on other backends fails on z/VM. It is not about is_shutdown or check_shutdown: that code never has a chance to get executed because a VNC "crash" is detected first.
I do not wish to spend days reverse-engineering os-autoinst to understand why. I already tried that for 2 days without success.
I think it is more a task for the tools team.

Actions #10

Updated by JRivrain over 4 years ago

  • Status changed from In Progress to Feedback
Actions #11

Updated by JRivrain over 4 years ago

  • Status changed from Feedback to In Progress
Actions #12

Updated by okurz over 4 years ago

  • Status changed from In Progress to Feedback

Hi, as you pointed me to this ticket I will try to help where I can. https://progress.opensuse.org/projects/openqav3/wiki/#s390x-Test-Organisation might be helpful for understanding the architecture and the s390x z/VM specifics. The important difference regarding VNC is that for e.g. x86_64 qemu the worker connects to the VNC server provided by qemu. For s390x the worker connects to a VNC server which is provided by the SUT OS itself! So in the example of x86_64 qemu we can shut down the OS, reboot it, install, reinstall, whatever. As long as the qemu instance is around so is the VNC server. For s390x z/VM the VNC server vanishes as soon as we give a reboot or shutdown command. The os-autoinst consoles, e.g. the VNC based x11 console, rely on the VNC server staying available or crash otherwise unless one explicitly calls "disable_vnc_stalls", e.g. as done in https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/7c33b5fd6cc1b0fa73f3d8a258848e530e5fead8/lib/power_action_utils.pm#L253

So much for the explanation of "disable_vnc_stalls" and the z/VM specifics. What I do not understand is the context of this ticket. I can see that https://openqa.suse.de/tests/3305529 was green 2 months ago and then in the next build the job incompleted in shutdown: https://openqa.suse.de/tests/3328339 . After 2 months not all logs are still available but the git hash values are. Checking the git log from git log1 --no-merges 71ab9756e8d6abedaed19fd974b53491bf1e6fbd..4d8d94fbff6974d38c138906906aedeeae42e8d7 I can find ac49a9e71 Re-enable VNC for s390x which sounds related. The git diff shows what can also be observed in the test thumbnails. The "last good" executed the poweroff successfully in the text terminal: https://openqa.suse.de/tests/3305529#step/shutdown/6 . The first bad uses the GUI in https://openqa.suse.de/tests/3328339#step/shutdown/9 and then fails in https://openqa.suse.de/tests/3328339#step/shutdown/15 .

Looking for a SLE12 reference I came to https://openqa.suse.de/tests/overview?version=12-SP5&build=0368&distri=sle&arch=s390x&flavor=Server-DVD&machine=s390x-zVM-vswitch-l2 but found no job that executed any gnome tests that include shutdown. I assumed we had those for former product versions, but maybe not. Even SLE12SP3 does not show any for z/VM: https://openqa.suse.de/tests/overview?distri=sle&version=12-SP3&build=0473&groupid=55&arch=s390x . This reminds me of #43658 which is related.

So I assume you do not actually want to fix a regression but, for the first time ever, enable shutdown checking on s390x z/VM from a graphical session, right? In this case I recommend checking which specific console is causing the problems and making sure that it is disabled before calling the actual shutdown. One more thing that might be necessary and could be tested first: as soon as the final 'ret' is pressed to confirm shutdown, make sure to switch to a console which will not disappear, e.g. iucvconn or root-console. Also, just lately https://github.com/os-autoinst/os-autoinst/pull/1232 was merged, which allows marking consoles as persistent. E.g. iucvconn should be handled as such. This might help as well.
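
The ordering suggested above (confirm the dialog, then immediately leave the VNC-based console) can be sketched roughly as follows. This is only an illustration under assumptions: send_key and select_console are defined here as local stand-ins for the testapi functions of the same name so that the snippet runs without os-autoinst, and confirm_shutdown_and_leave_gui is a hypothetical helper name. Whether this ordering actually avoids the detected VNC "crash" has not been verified in this ticket.

```perl
use strict;
use warnings;

# Local stand-ins for the testapi functions of the same name, so the sketch
# runs without os-autoinst; they only record the order of calls.
my @calls;
sub send_key       { push @calls, "send_key $_[0]" }
sub select_console { push @calls, "select_console $_[0]" }

# Hypothetical helper: confirm the shutdown dialog, then immediately switch
# to a console that does not depend on the VNC server of the SUT.
sub confirm_shutdown_and_leave_gui {
    my ($persistent_console) = @_;
    send_key 'ret';
    select_console $persistent_console;
    return;
}

confirm_shutdown_and_leave_gui('root-console');
print "$_\n" for @calls;
```

In a real test module the stand-ins would be dropped and the testapi versions used instead; the console name is passed in so that iucvconn could be tried as well.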

Actions #13

Updated by riafarov over 4 years ago

  • Assignee changed from JRivrain to riafarov
Actions #14

Updated by riafarov over 4 years ago

  • Status changed from Feedback to Resolved
  • Assignee changed from riafarov to JRivrain

We should do more research on this. shutdown module hasn't been executed for x11 for quite a while.

Actions #15

Updated by maritawerner over 3 years ago

Hi, I can see that this ticket is still assigned to that soft fail testcase: https://openqa.suse.de/tests/4756144#step/shutdown/12
Just forgotten I guess?

Actions #16

Updated by okurz over 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: allmodules+allpatterns+registration@s390x-zVM-vswitch-l3
https://openqa.suse.de/tests/4815334

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #17

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: select_modules_and_patterns+registration@s390x-zVM-vswitch-l2
https://openqa.suse.de/tests/6578757

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The label in the openQA scenario is removed
Actions #18

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: select_modules_and_patterns+registration@s390x-zVM-vswitch-l2
https://openqa.suse.de/tests/6860540

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The label in the openQA scenario is removed
Actions #19

Updated by openqa_review about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: select_modules_and_patterns+registration@s390x-zVM-vswitch-l2
https://openqa.suse.de/tests/8254240

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 80 days if nothing changes in this ticket.
