coordination #35215
closed[functional][u][epic][medium] test fails on shutdown module
0%
Description
Observation¶
openQA test in scenario opensuse-15.0-GNOME-Live-x86_64-gnome-live@uefi-usb-2G fails in
shutdown
to shutdown in time.
Reproducible¶
Fails since (at least) Build 15.1
Expected result¶
Last good: 14.8 (or more recent)
Problem¶
- H1: ACCEPT: Only usb-based images are affected -> other scenarios seem to be fine in general
- H2: REJECT: live-images shutdown does not work at all and needs to be reported as bug and worked around -> oorlov conducted two local verification runs, one passed and one fail so it can work
- H3: shutdown only takes much longer
- H4: Shutdown test fails because some of the installed services slow down it
- H4.1 It fails due to services, that were installed during the test run
- H4.2 It fails due to pre-installed service in the system itself
Suggestions¶
- Reproduce locally
- Investigate existing failures and see if we would maybe just need to bump the timeout
- Crosscheck other tickets about shutdown
- Make sure tests reliably shutdown and reboot by either just bumping timeout or finding out the real cause of being stuck or using another more reliant method to reboot as workaround
- If increasing timeout will not help or will show that shutdown takes too long time, then next step would be removing splash screen and check what is happening in background
- Merge three shutdown files into one (x11/shutdown.pm, shutdown/shutdown.pm, kernel/shutdown_ltp.pm) to avoid duplication and have single place where shutdown logic is located (1d);
- Add generic method to raise 'soft fail' if some amount of time is exceeded but action is not finished. If the time exceeded, then soft fail should be reported and test could wait more time for the action to be finished (it is required, because we want to raise 'soft fail' if assert_shutdown() is not completed in time and then wait until shutdown will actually finish) (2d);
- Add ability to debug system shutdown, e.g. it could be done if job is started with "DEBUG_SHUTDOWN" parameter (4h)
- Test changed code on several combinations of tests (1d)
Further details¶
Always latest result in this scenario: latest
Files
Updated by okurz over 6 years ago
- Subject changed from [functional][u][sporadic][fast] test fails in shutdown of new leap live image tests to [functional][u][sporadic][medium] test fails in shutdown of new leap live image tests
- Status changed from New to Workable
- Difficulty set to medium
Updated by okurz over 6 years ago
- Target version changed from Milestone 15 to Milestone 16
correcting milestone, it's actually M16
Updated by okurz over 6 years ago
- Description updated (diff)
- Status changed from Workable to In Progress
@oorlov I know you are actually working on this one :)
Please update this ticket accordingly on a more timely base because it is urgent.
See how I updated the description as an example.
Updated by oorlov over 6 years ago
I've reproduced the issue locally. From 3 runs it failed twice.
After that I've commented all the modules in between boot and shutdown and executed 10 times. All the tests passed.
So, my assumption is that some of the installed packages slow the shutdown. I'll try to increase the timeout and see what will happen.
Updated by JERiveraMoya over 6 years ago
Seen also in SLE15: https://openqa.suse.de/tests/1647051#step/shutdown/15
Updated by okurz over 6 years ago
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4956 creared and merged, test changes were synced to o3 and I retriggered some tests which failed in shutdown. Please monitor:
Updated by oorlov over 6 years ago
- Description updated (diff)
After discussion with okurz it was decided to increase the timeout dramatically and trigger huge amount of test runs on o3 in order to check if it may solve the issue or it will still fail regardless of the timeout.
Pull request: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4956
Build on o3: https://openqa.opensuse.org/tests/overview?version=15.0&build=poo35215_investigation&distri=opensuse&groupid=39
Updated by okurz over 6 years ago
- Related to coordination #35302: [qe-core][opensuse][functional][epic][sporadic] Various unstable tests on o3 added
Updated by oorlov over 6 years ago
I've verified a lot of failed 'shutdown' modules on o3 and found out, that they could be divided into two groups (for now I'm not sure if it is caused by the same reason):
- First group: KDE tests. After pressing 'ctrl+alt+del' the popup with 'power-off' button is not appeared and test fails after 15 seconds of waiting for it. For these tests I've made snapshots and will try investigate the problem locally.
- Second group: Gnome tests. After pressing 'Power Off' button, test fails after 60 seconds of waiting in assert_shutdown. There are similar issue for sle15, with soft-fail mark bsc#1055462. So I'll try to increase timeout and see what will happen.
Updated by oorlov over 6 years ago
So, for KDE case I've created a ticket bsc#1091933, increased wait timeout and added soft-fail for case if waiting for "Power Off" popup to be appeared is more, than 15 seconds.
Gnome case seems to be different issue, as it is stuck on Plymouth splash screen while powering off the machine. Trying to investigate the root case of that behavior.
Also, I'm working on improving shutdown hook (to grab more information about the system in case of failed test).
Updated by oorlov over 6 years ago
Added pool request for workaround on KDE.
Need a time to investigate Gnome problem, as I slow down with the snapshots due to qemu issue. I've found workaround and can proceed further now.
Also I want to improve the shutdown fail hook (to add more info to the logs on).
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-05-08 to 2018-05-22
Updated by okurz over 6 years ago
- Related to action #33571: [opensuse][functional][u][medium] test fails in shutdown - emoticon settings are opened added
Updated by oorlov over 6 years ago
Improved post_fail_hook with more logging was added. PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5048
Currently, the issue looks more like timeout issue. And it happened randomly to different test runs, regardless of desktop (gnome/kde), mostly for livecd installation and update OS.
I'm running today a lot of builds locally with different timeouts to find the timeout that will allow the test to check the shutdown properly.
Updated by okurz over 6 years ago
- Related to action #36040: [opensuse][functional][u] test fails in shutdown - need a way to check after shutdown needle got matched but shutdown hasn't been executed for unknown reason added
Updated by okurz over 6 years ago
one of the latest examples: https://openqa.opensuse.org/tests/678386#step/shutdown/15
any news?
Updated by oorlov over 6 years ago
I've added PR with the increased timeout for Live CD, as it solves the issue.
I've executed 18 runs and all of them were passed.
Pull Request: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5094
Updated by oorlov over 6 years ago
- Status changed from Feedback to Resolved
I've checked two latest builds. Shutdown module stopped failing on Live CD due to timeout issue.
We still have several failed shutdown modules in other jobs, but they are related to another issue (e.g. poo#36150 ).
Updated by okurz over 6 years ago
- Due date changed from 2018-05-22 to 2018-06-05
- Status changed from Resolved to In Progress
- Priority changed from Urgent to Normal
As decided in the sprint review meeting: You removed the urgency by introducing a higher timeout for the live media but we IMHO we have not solved the underlying issue or understood it well enough. This is something we should track better still.
Updated by riafarov over 6 years ago
- Related to deleted (action #36040: [opensuse][functional][u] test fails in shutdown - need a way to check after shutdown needle got matched but shutdown hasn't been executed for unknown reason)
Updated by riafarov over 6 years ago
- Blocks action #36040: [opensuse][functional][u] test fails in shutdown - need a way to check after shutdown needle got matched but shutdown hasn't been executed for unknown reason added
Updated by oorlov over 6 years ago
- File shutdown-logs.tar.gz shutdown-logs.tar.gz added
I've executed the tests in the following way (as suggested by @nsinger):
- Specify parameters on bootload: "systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M printk.devkmsg=on enforcing=0";
- Mount a drive to the system;
- Add 'debug.sh' script with the following content
#!/bin/sh
mount /dev/vda1 /tmp/test/
dmesg > /tmp/test/shutdown-log.txt
umount /dev/vda1
to /usr/lib/systemd/system-shutdown.
After the job finishes, mount a qcow2 image in the local system and copy shutdown-log.txt.
I've added logs of 3 failed jobs and 1 passed. Please see them in the attached 'shutdown-logs.tar.gz' archive.
Tests failed while shutdown timeout was set up to 60 sec.
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-06-05 to 2018-06-19
- Target version changed from Milestone 16 to Milestone 17
Updated by oorlov over 6 years ago
After investigating the logs with @riafarov and @SLindoMansilla it was decided to create a bcs#1096241.
An appropriate workaround with soft-fail will be added to the tests.
Updated by okurz over 6 years ago
Please add your time estimation for the individual tasks
Updated by okurz over 6 years ago
- Related to action #36150: [opensuse][functional][u][sporadic] test fails in shutdown on unsafe code looking for "sddm_shutdown_option_btn" added
Updated by okurz over 6 years ago
- Target version changed from Milestone 17 to Milestone 17
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-06-19 to 2018-07-03
Updated by oorlov over 6 years ago
Added PR with merged shutdown modules and divided shutdown to 'cleanup' and 'shutdown' parts.
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5296
Also, I've made a debug part that should be called while running job with "DEBUG_SHUTDOWN" flag, but did not made a PR for it as it depends on the code from the previous PR.
Currently I'm working on the assert shutdown with soft fail.
Updated by okurz over 6 years ago
oorlov wrote:
Also, I've made a debug part that should be called while running job with "DEBUG_SHUTDOWN" flag, but did not made a PR for it as it depends on the code from the previous PR.
Please provide it nevertheless as WIP-PR. We won't merge it but we should provide feedback as early as possible, ok?
Updated by riafarov over 6 years ago
- Related to deleted (action #36150: [opensuse][functional][u][sporadic] test fails in shutdown on unsafe code looking for "sddm_shutdown_option_btn")
Updated by riafarov over 6 years ago
- Blocks action #36150: [opensuse][functional][u][sporadic] test fails in shutdown on unsafe code looking for "sddm_shutdown_option_btn" added
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-07-03 to 2018-07-31
Updated by oorlov over 6 years ago
[WIP]PR for shutdown debugging: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5314
I've also tried to make assert shutdown with soft fail message without changing os-autoinst code. But it seems like it is impossible, as the assert_shutdown also marks test as failed if the shutdown is not finished and then sends 'croak'. Even if catch the 'croak' with the 'eval' or 'try-catch', the test will proceed further (i.e. no post_fail_hook will be called), but it will be marked as failed anyway.
So, that part needs further discussion.
Updated by oorlov over 6 years ago
- Status changed from In Progress to Feedback
Implemented assert_shutdown_soft_fail helper function: [WIP] PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5371
It also dependent on the PR in os-autoinst repo: https://github.com/os-autoinst/os-autoinst/pull/988
So, all the tasks for the ticket are covered. Now, waiting for the PRs to be reviewed.
Updated by oorlov over 6 years ago
I've added unit tests to os-autoinst project, as they were requested by the reviewers: https://github.com/os-autoinst/os-autoinst/pull/988
Updated by okurz over 6 years ago
- Blocks deleted (action #36150: [opensuse][functional][u][sporadic] test fails in shutdown on unsafe code looking for "sddm_shutdown_option_btn")
Updated by okurz over 6 years ago
- Related to action #36150: [opensuse][functional][u][sporadic] test fails in shutdown on unsafe code looking for "sddm_shutdown_option_btn" added
Updated by okurz over 6 years ago
- Target version changed from Milestone 17 to Milestone 18
Updated by zluo over 6 years ago
- Related to action #35011: [functional][sle][u][sporadic] test fails in user_defined_snapshot during reboot because shutdown is not working or it takes too long added
Updated by okurz over 6 years ago
- Blocks action #36997: [opensuse][functional][u][wayland] test fails in shutdown - xterm does not start added
Updated by okurz over 6 years ago
- Priority changed from High to Urgent
This seems to become more severe now. Please urgently visit current test failures, e.g.
Updated by oorlov over 6 years ago
So, what I've got for now:
- The issue occurs on Live images only;
- The issue started happening after my PR with cleanup_shutdown was merged. The difference between previous shutdown and current one is that instead of X11 Terminal the test opens tty console. And 'Stall detected' occurs while switching back to desktop;
- Some of the tests failed with 'Test died: ensure_unlocked_desktop repeated too much. Check for X-server crash.', some with 'Stall was detected during assert_screen fail';
- It is failing with 100% ratio on Krypton and Argon builds.
Updated by oorlov over 6 years ago
After deeper investigation I've found that it is not related to 'cleanup_before_shutdown' module.
The issue seems to be appear, because of console selecting in 'power_action' function. In the scenario, Wayland is used, but in power_action function it tries to select X11 console. Currently, I'm trying to reproduce the issue locally by removing all unnecessary modules in between to ensure, that the issue is related to the console.
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-07-31 to 2018-08-14
to remove urgency it might make sense to just stop execution of some modules to make it pass and solve it properly afterwards
Updated by oorlov over 6 years ago
fvogt made a PR, that fixes an issue with setting up the console for Wayland session.
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5496
Updated by oorlov over 6 years ago
- Blocks deleted (action #36040: [opensuse][functional][u] test fails in shutdown - need a way to check after shutdown needle got matched but shutdown hasn't been executed for unknown reason)
Updated by oorlov over 6 years ago
- Subject changed from [functional][u][sporadic][medium] test fails in shutdown of new leap live image tests to [functional][u][epic][medium] test fails on shutdown module
Updated by oorlov over 6 years ago
- Related to action #36040: [opensuse][functional][u] test fails in shutdown - need a way to check after shutdown needle got matched but shutdown hasn't been executed for unknown reason added
Updated by oorlov over 6 years ago
- Description updated (diff)
PR that allows to gather more logs on shutdown: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5314. Merged.
PR that allows to set shutdown timeout to any value by passing it to DEBUG_SHUTDOWN parameter: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5578
Updated by okurz over 6 years ago
- Due date changed from 2018-08-14 to 2018-08-28
bulk move to next sprint as could not be discussed in SR
Updated by SLindoMansilla over 6 years ago
- Status changed from Feedback to In Progress
- No PR open
- not waiting for a verification run
Updated by oorlov over 6 years ago
PR on os-autoinst, that introduces check_shutdown function: https://github.com/os-autoinst/os-autoinst/pull/988
PR on os-autoinst-distri-opensuse, that is dependent on the PR above: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5371
Updated by oorlov over 6 years ago
- Related to action #40055: [functional][u][bsc#1105517][sporadic][fast] - running application preventing the system from shutdown on otherDE_mate added
Updated by oorlov over 6 years ago
Extract power actions from generic utils: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5613, https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5625
Updated by oorlov over 6 years ago
The PR in os-autoinst with check_shutdown has been merged.
So, currently I'm waiting for it to be appeared on o3 and on osd. Then my PR in os-autoinst-distri-opensuse could be merged: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5371
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-08-28 to 2018-09-11
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-09-11 to 2018-09-25
- Target version changed from Milestone 18 to Milestone 19
Updated by okurz over 6 years ago
I checked the workers and both the osd and o3 workers have the os-autoinst change – have found #40880 though in the process. Please rebase your PR for the tests and expedite the merging.
Updated by oorlov over 6 years ago
- Status changed from In Progress to Feedback
The PR that introduces assert_shutdown_with_soft_timeout
merged: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5371
After several successful builds, this ticket could be finally closed, as all the requirements from this ticket are covered.
Further shutdown issues could be opened as alone tickets as for other issues.
Updated by okurz over 6 years ago
oorlov wrote:
After several successful builds, this ticket could be finally closed, as all the requirements from this ticket are covered.
Fine, don't hesitate to actually close it :) Latest job https://openqa.opensuse.org/tests/752806 in the originally mentioned scenario is fine regarding shutdown.
Updated by okurz about 6 years ago
- Related to action #43880: [functional][u][s390x][sporadic] test fails in shutdown on s390x added
Updated by szarate over 4 years ago
- Tracker changed from action to coordination
- Difficulty deleted (
medium)
Updated by szarate over 4 years ago
See for the reason of tracker change: http://mailman.suse.de/mailman/private/qa-sle/2020-October/002722.html