action #65040
closed coordination #68794: [qe-core][functional][epic] rework postfail hooks
[sle][functional][u] enhance post_fail_hook on OOM condition
Description
from @okurz:
We already use magic-sysrq-w to find out if there are blocked tasks, so we could trigger other sysrq commands as well, e.g. magic-sysrq-m to read memory information and parse from it whether there is free memory. It might also be helpful to log into the log console in one of the test setup modules, so that in case of problems we can switch to the already logged-in console and not get stuck at the login prompt. If the system is responsive during post_fail_hook and no workarounds need to be tried, we could also read from the logs whether there was an OOM condition. In the end we should be able to clearly determine, from the information we can gather automatically from the SUT, what is wrong with kontact when it is only partially shown.
See #63355 for the assumption that this could be related to OOM, or to poor performance that makes post_fail_hook fail already at the login prompt.
Let's check this together first.
Tasks
- Add new show_memory_information sub (similar to show_tasks_in_blocked_state)
- Adapt the sub to show memory information properly
- Add show_memory_information to the post fail hook in lib/opensusebasetest.pm
- Use the serial failures feature to parse the resulting output and search for an OOM condition
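The last task could be prototyped outside of a test run. The following is a minimal sketch, assuming the kernel log is available as plain text. The helper name `find_oom_conditions`, the hash keys, and the two patterns are illustrative assumptions and not the actual serial failures structure in os-autoinst. The patterns are based on common OOM killer messages and may need adjusting for the kernel under test. The sample log lines are made up.

```perl
use strict;
use warnings;

# Hypothetical pattern list: each entry pairs a human-readable message with a
# regex for a typical OOM killer line (assumed wording, adjust per kernel).
my @oom_patterns = (
    {message => 'OOM killer invoked',        pattern => qr/invoked oom-killer/},
    {message => 'process killed due to OOM', pattern => qr/Out of memory: Kill(?:ed)? process \d+/},
);

# Hypothetical helper: return one hash per log line that matches a pattern.
sub find_oom_conditions {
    my ($log) = @_;
    my @found;
    for my $line (split /\n/, $log) {
        for my $entry (@oom_patterns) {
            push @found, {message => $entry->{message}, line => $line}
              if $line =~ $entry->{pattern};
        }
    }
    return @found;
}

# Made-up sample input, only to exercise the helper.
my $sample = <<'EOT';
[  100.000000] systemd[1]: Started Session 1 of user root.
[  123.456789] kontact invoked oom-killer: gfp_mask=0x100cca, order=0, oom_score_adj=0
[  123.460000] Out of memory: Killed process 4242 (kontact) total-vm:1024kB
EOT

print "$_->{message}: $_->{line}\n" for find_oom_conditions($sample);
```

Keeping the patterns in a data structure rather than in code would make it easier to move them into whatever form the serial failures feature expects.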
=head2 show_tasks_in_blocked_state

 show_tasks_in_blocked_state();

Dumps tasks that are in uninterruptible (blocked) state and waits for the
headline of the dump.

See L<https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/sysrq.rst>.

=cut

sub show_tasks_in_blocked_state {
    # sending sysrqs doesn't work for svirt
    if (!check_var('BACKEND', 'svirt')) {
        send_key 'alt-sysrq-w';
        # info will be sent to serial tty
        wait_serial(qr/sysrq\s*:\s+show\s+blocked\s+state/i, 1);
        send_key 'ret';    # ensure clean shell prompt
    }
}
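By analogy, the proposed show_memory_information sub could look roughly like the sketch below. It is not the merged implementation. It assumes it sits next to show_tasks_in_blocked_state in lib/opensusebasetest.pm, where check_var, send_key and wait_serial are already available from testapi, and it assumes the kernel announces sysrq-m with a "Show Memory" headline on the serial console.

```perl
use strict;
use warnings;

# Sketch only: check_var, send_key and wait_serial are expected to be provided
# by os-autoinst's testapi, as for show_tasks_in_blocked_state.
sub show_memory_information {
    # sending sysrqs doesn't work for svirt
    return if check_var('BACKEND', 'svirt');
    send_key('alt-sysrq-m');
    # info will be sent to serial tty; the headline pattern is an assumption
    wait_serial(qr/sysrq\s*:\s+show\s+memory/i, 1);
    send_key('ret');    # ensure clean shell prompt
    return;
}
```

Calling it from post_fail_hook right after show_tasks_in_blocked_state would keep both sysrq dumps together in the serial log.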
Updated by zluo almost 5 years ago
- Related to action #63355: [opensuse][functional][u] test fails in kontact, kontact summary screen only partitially shown, then post_fail_hook fails to login – OOM? added
Updated by SLindoMansilla over 4 years ago
- Category set to Enhancement to existing tests
Updated by SLindoMansilla over 4 years ago
- Description updated (diff)
- Status changed from New to Workable
- Target version set to Milestone 30
- Estimated time set to 42.00 h
Updated by dheidler over 4 years ago
- Status changed from Workable to In Progress
- Assignee set to dheidler
Updated by SLindoMansilla over 4 years ago
- Related to action #66607: [functional][u] Execute "SysRq t" when workqueue lockup is detected and publish kernel logs added
Updated by dheidler over 4 years ago
- Status changed from In Progress to Feedback
Updated by szarate over 4 years ago
Along with https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/10960 this can be considered done I guess.
Updated by szarate over 4 years ago
- Related to coordination #68794: [qe-core][functional][epic] rework postfail hooks added
Updated by szarate over 4 years ago
- Related to deleted (coordination #68794: [qe-core][functional][epic] rework postfail hooks)