action #124934
closedos-autoinst: Misleading error message in the openQA info panel reason size:M
Description
Observation¶
We sometimes see things like
Reason: isotovideo died: Can't locate auto/NetAddr/IP/InetBase/AF_INET6.al in @INC
in the info panel of an openQA test.
See #124913 for example.
This message is distracting because it is actually not a fatal message, but from an eval:
https://metacpan.org/dist/NetAddr-IP/source/Lite/Util/lib/NetAddr/IP/InetBase.pm#L81
require Socket;
*AF_INET = \&Socket::AF_INET;
if (eval { AF_INET6() } ) { # line 81
*AF_INET6 = \&Socket::AF_INET6;
$emulateAF_INET6 = -1; # have it, remind below
}
https://github.com/os-autoinst/os-autoinst/blob/master/isotovideo#L115
$SIG{__DIE__} = sub ($e) { $fatal_error = $e };
So when NetAddr::IP::InetBase calles the eval { AF_INET6() }
(thanks to AutoLoader/AutoSplit this actually tries to load the subroutine AF_INET6
from a generated file AF_INET6.al
) it is actually fine with it failing, but we still call our signal handler. We could check with $^S
if we are in an eval and just discard the message if it's true - but see the notes section why that might not be nice. Then we will get rid of those distracting error messages in the Reason, which we have seen several times before.
Acceptance Criteria¶
AC1: We don't display non-fatal distracting error messages anymore
Notes¶
The
$SIG{__DIE__}
hook is called even inside aneval()
. It was never intended to happen this way, but an implementation glitch made this possible.
...
Having to even think about the$^S
variable in your exception handlers is simply wrong.$SIG{__DIE__}
as currently implemented invites grievous and difficult to track down errors. Avoid it and use anEND{}
orCORE::GLOBAL::die
override instead.
Updated by tinita almost 2 years ago
- Copied from action #124913: os-autoinst broken (incomplete) with Reason: isotovideo died: Can't locate auto/NetAddr/IP/InetBase/AF_INET6.al in @INC added
Updated by tinita almost 2 years ago
So, signal handlers should only be used when there is no other solution, and I quoted also the perl documentation that says that.
I was wondering - why not just wrap the whole isotovideo code in an eval
? Then we simply get the error in $@
, and only in a fatal case. That's exactly what we want.
Updated by tinita almost 2 years ago
- Subject changed from os-autoinst: Discard error messages from inside eval in __DIE__ handler to os-autoinst: Misleading error message in the openQA info panel reason
- Description updated (diff)
Updated by tinita almost 2 years ago
- Related to action #63718: incomplete reason with just "quit"/"died" could provide more information added
Updated by mkittler almost 2 years ago
I was wondering - why not just wrap the whole isotovideo code in an eval? Then we simply get the error in $@, and only in a fatal case. That's exactly what we want.
That sounds too good to be true but maybe that's really exactly what we want.
Updated by mkittler almost 2 years ago
- Subject changed from os-autoinst: Misleading error message in the openQA info panel reason to os-autoinst: Misleading error message in the openQA info panel reason size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler almost 2 years ago
- Description updated (diff)
- Category deleted (
Regressions/Crashes) - Assignee set to mkittler
- Target version deleted (
Ready)
Updated by mkittler almost 2 years ago
- Category set to Regressions/Crashes
- Target version set to Ready
Updated by mkittler almost 2 years ago
- Status changed from Workable to In Progress
Updated by mkittler almost 2 years ago
- Status changed from In Progress to Resolved
The PR has been merged. This should work. The changed code was already fully covered by tests and I've also tested it manually by adding a die
in certain places.
I've also reproduced the original issue (#124913 and https://bugzilla.opensuse.org/show_bug.cgi?id=1208248) locally via sudo mv /usr/lib64/libopenblas.so.0{,.bak}
. This now shows the "failed to start VM" error from the log (e.g. https://openqa.opensuse.org/tests/3137766) also as reason (and not something unrelated like in https://openqa.opensuse.org/tests/3137766). So I'm confident that this fix works.