action #42038
closed[sle][functional[u] test fails in shutdown - add post_fail_hook for shutdown module if possible
Added by mloviska about 6 years ago. Updated almost 6 years ago.
0%
Description
Observation¶
openQA test in scenario sle-12-SP4-Server-DVD-s390x-create_hdd_pcm_googlecloud@s390x-kvm-sle12 fails in
shutdown
Two issues with shutdown module have appeared in last 3 builds of sle12sp4 (0396, 0405, 0410) on s38:
- https://openqa.suse.de/tests/2124883#step/shutdown/19 -> authentication problem, or SUT performance issue
- https://openqa.suse.de/tests/2094651#step/shutdown/8 -> failed to trigger shutdown procedure
However it is not really clear what has happened with the system. It would be nice to have some data to analyze gathered by post_fail_hook.
Reproducible¶
Fails since (at least) Build 0410 (current job)
Expected result¶
Last good: 0405 (or more recent)
Further details¶
Always latest result in this scenario: latest
Updated by okurz about 6 years ago
- Related to action #43376: [functional][u] Adapt opensusebasetest to provide dmesg and journal log added
Updated by okurz about 6 years ago
- Related to action #42863: [sle][functional][u] test fails in first_boot - hard to see what is exactly the cause for timeout issue added
Updated by okurz about 6 years ago
- Related to action #43880: [functional][u][s390x][sporadic] test fails in shutdown on s390x added
Updated by SLindoMansilla about 6 years ago
- Target version set to Milestone 22
I would say this is important to debug failures for shutdown issues.
Updated by zluo almost 6 years ago
- Status changed from New to In Progress
- Assignee set to zluo
take over.
Updated by zluo almost 6 years ago
http://f40.suse.de/tests/884 shows problem with clone_job atm.
Updated by zluo almost 6 years ago
trying directly on osd:
http://openqa.suse.de/tests/2408946
Updated by zluo almost 6 years ago
trying to get test result: http://openqa.suse.de/tests/2409260#settings
Updated by zluo almost 6 years ago
if shutdown is triggered and if it failed, there is no way to execute post_fail_hook.
Updated by zluo almost 6 years ago
of course we can collect some logs before power_action, but is this what we want to do really?
Updated by okurz almost 6 years ago
The idea is to call the post_fail_hook
in any case as the system might still be stuck before the actual power down. On a high level you can describe the states "shutdown" as such:
- before shutdown was triggered
- system is shutting down
- system is powered off
The post_fail_hook should be able to identify the states and provide information suitable for every state.
Updated by zluo almost 6 years ago
use DEBUG_SHUTDOWN to change log level so that we can call later /usr/lib/systemd/system-shutdown/debug.sh in post_fail_hook:
Updated by zluo almost 6 years ago
http://f40.suse.de/tests/927#step/shutdown/7 shows problem with
assert_script_run("journalctl --no-pager -b 0 > /tmp/full_journal.log");
typing issue happens again and again.
Updated by zluo almost 6 years ago
http://f40.suse.de/tests/928#step/shutdown/9
with a empty space before journalctl it seems better but upload_logs failed because of curl server connection issue.
Updated by zluo almost 6 years ago
http://f40.suse.de/tests/930#step/shutdown/16 resolved the issue with typing, it requires select_console('log-console');
but connection issue for uploading logs is still there...
Updated by zluo almost 6 years ago
I found the reason why upload is not working, because cleanup_before_shutdown stop network.service:
# Proceed with dhcp cleanup on qemu backend only.
# Cleanup is made, because if same hdd image used in multimachine scenario
# on several nodes, the dhcp clients use same id and cause conflicts on dhcpd server.
if (check_var('BACKEND', 'qemu')) {
my $network_status = script_output('systemctl status network');
# Do dhcp cleanup for wicked
if ($network_status =~ /wicked/) {
systemctl 'stop network.service';
systemctl 'stop wickedd.service';
assert_script_run('ls /var/lib/wicked/');
save_screenshot;
assert_script_run('rm -f /var/lib/wicked/*.xml');
}
}
Updated by zluo almost 6 years ago
changes in cleanup_before_shutdown.pm
# Proceed with dhcp cleanup on qemu backend only.
# Cleanup is made, because if same hdd image used in multimachine scenario
# on several nodes, the dhcp clients use same id and cause conflicts on dhcpd server.
# network.service is needed for uploading log for DEBUG_SHUTDOWN
if (check_var('BACKEND', 'qemu') && !get_var('DEBUG_SHUTDOWN')){
my $network_status = script_output('systemctl status network');
# Do dhcp cleanup for wicked
if ($network_status =~ /wicked/) {
systemctl 'stop network.service';
systemctl 'stop wickedd.service';
assert_script_run('ls /var/lib/wicked/');
save_screenshot;
assert_script_run('rm -f /var/lib/wicked/*.xml');
}
}
http://f40.suse.de/tests/936#step/shutdown/21 shows successful uploading of logs.
Next step should be check this on s390 remote worker and think about what kind of additional logs are helpful.
Updated by zluo almost 6 years ago
http://f40.suse.de/tests/950#step/shutdown/15 shows expected results.
Updated by zluo almost 6 years ago
Updated by zluo almost 6 years ago
- Status changed from Feedback to In Progress
need to think about DEBUG_SHUTDOWN, since it already provide dmesg to serial0.txt.
Updated by zluo almost 6 years ago
cleanup_before_shutdown before shutdown collects already dmesg logs to serial0.txt, so add journal log extra to serial0.txt with comment "--- dmesg ---" and "journactl log".
http://f40.suse.de/tests/1076/file/serial0.txt shows now expected logs.
Updated by okurz almost 6 years ago
I like the idea of the PR but please keep in mind that the goal is to have a helpful post_fail_hook ready in all cases and not just on request when DEBUG_SHUTDOWN=1 is set
Updated by zluo almost 6 years ago
The Ticket is about to provide useful logs for shutdown, and if DEBUG_SHUTDONW is not set, then we cannot get full logs. This can be done before shutdown (when shutdown got triggered, the network is in most cases down for example.) So I think this is what we can achieve atm. The goal you mean is different then it here.
Updated by okurz almost 6 years ago
zluo wrote:
The Ticket is about to provide useful logs for shutdown, and if DEBUG_SHUTDONW is not set, then we cannot get full logs.
Please remember #42038#note-12 which we also discussed in person. The idea is if the system is actually not shut down so when we can still access a console. So I think in the post_fail_hook we should be able to try that.
Updated by mgriessmeier almost 6 years ago
- Status changed from In Progress to Feedback
Updated by zluo almost 6 years ago
Note:
before shutdown was triggered
- this is no problem, we can catch the logs in cleanup_before_shutdown.
system is shutting down
- this is complicated, we need to define what we need to check and what is possible to check during system's shutting down
system is powered off
- this is already covered in the test.
Updated by okurz almost 6 years ago
- Target version changed from Milestone 22 to Milestone 23
alright, so what do you think we still need to be able to close this ticket or do you think we are done already?
Updated by zluo almost 6 years ago
- Status changed from Feedback to Resolved
http://openqa.suse.de/tests/2470815/file/serial0.txt shows expected logs.
So we are done for now, however we should check the logs which provide useful information for the shutdown issue in future.