action #65040
closed coordination #68794: [qe-core][functional][epic] rework postfail hooks
[sle][functional][u] enhance post_fail_hook on OOM condition
Description
from @okurz:
We already use magic-sysrq-w to find out if there are blocked tasks, so we could trigger other sysrq commands as well, e.g. magic-sysrq-m to read memory information and parse from it whether there is free memory. It might also be helpful to already log into the log console in one of the test setup modules, so that we can switch to the already logged-in console in case of problems and not get stuck at the login prompt. If the system is responsive during post_fail_hook and no workarounds need to be tried, we could also read from the logs whether there was an OOM condition. In the end, we should be able to clearly determine, from the information that we can gather automatically from the SUT, what is wrong with kontact when it is only partially shown.
See #63355 for the assumption that this could be related to OOM, or to poor performance causing post_fail_hook to fail already at the login prompt.
Let's check this together at first.
Tasks
- Add new show_memory_information sub (similar to show_tasks_in_blocked_state)
- Adapt the sub to show memory information properly
- Add show_memory_information to the post fail hook in lib/opensusebasetest.pm
- Use the serial failures feature to parse and search for an OOM condition derived from this
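The last task could start from a plain-Perl check like the one below. This is only a sketch under assumptions: `find_oom_events` is a hypothetical helper that does not exist in the test library, the patterns assume the usual kernel messages (`<name> invoked oom-killer` and `Out of memory: Kill(ed) process <pid> (<name>)`), and the sample log lines are made up for illustration. How the result is wired into the serial failures feature is left open here.

```perl
use strict;
use warnings;

# Hypothetical helper: scan serial or dmesg text for typical OOM killer
# messages and return an array reference describing what was found.
sub find_oom_events {
    my ($log) = @_;
    my @events;
    for my $line (split /\n/, $log) {
        if ($line =~ /(\S+) invoked oom-killer/) {
            # Process that triggered the OOM killer (last word of its name)
            push @events, {type => 'invoked', process => $1};
        }
        elsif ($line =~ /Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)/) {
            # Process selected as victim by the OOM killer
            push @events, {type => 'killed', pid => $1, process => $2};
        }
    }
    return \@events;
}

# Made-up sample input, only to show how the helper is called
my $sample = join "\n",
  '[  100.000001] kontact invoked oom-killer: gfp_mask=0x100cca, order=0',
  '[  100.000002] Out of memory: Killed process 4242 (kontact) total-vm:2000000kB';

for my $event (@{find_oom_events($sample)}) {
    print join(' ', map { "$_=$event->{$_}" } sort keys %$event), "\n";
}
```

An empty result only means that none of these patterns matched. It does not prove that the SUT had enough memory.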
=head2 show_tasks_in_blocked_state

 show_tasks_in_blocked_state();

Dumps tasks that are in uninterruptible (blocked) state and waits for the
headline of the dump.

See L<https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/sysrq.rst>.

=cut

sub show_tasks_in_blocked_state {
    # sending sysrqs doesn't work for svirt
    if (!check_var('BACKEND', 'svirt')) {
        send_key 'alt-sysrq-w';
        # info will be sent to serial tty
        wait_serial(qr/sysrq\s*:\s+show\s+blocked\s+state/i, 1);
        send_key 'ret';    # ensure clean shell prompt
    }
}
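
A `show_memory_information` sub modeled on the one above might look roughly like this. It is a sketch, not the final implementation: it assumes that `alt-sysrq-m` is accepted by `send_key` in the same way as `alt-sysrq-w`, and that the kernel announces the dump on the serial console with a `sysrq: Show Memory` headline. The `check_var`, `send_key` and `wait_serial` definitions here are simplified stand-ins so the snippet runs on its own; in lib/opensusebasetest.pm they would come from testapi instead.

```perl
use strict;
use warnings;

# Stand-ins for the testapi functions, only so this sketch is runnable
# outside of a test run. They record calls instead of talking to a SUT.
our %vars = (BACKEND => 'qemu');
our @sent_keys;
sub check_var   { my ($name, $value) = @_; return ($vars{$name} // '') eq $value }
sub send_key    { my ($key) = @_; push @sent_keys, $key; return }
sub wait_serial { my ($regex, $timeout) = @_; return 1 }

# Assumed counterpart to show_tasks_in_blocked_state: ask the kernel to
# dump memory information and wait for the headline of the dump.
sub show_memory_information {
    # sending sysrqs doesn't work for svirt
    return if check_var('BACKEND', 'svirt');
    send_key 'alt-sysrq-m';
    # info will be sent to serial tty (headline pattern is an assumption)
    wait_serial(qr/sysrq\s*:\s+show\s+memory/i, 1);
    send_key 'ret';    # ensure clean shell prompt
    return;
}

show_memory_information();
```

Keeping the same structure as `show_tasks_in_blocked_state` means both subs can be called one after the other from the post fail hook, and both are skipped on svirt.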