Project

General

Profile

Actions

action #124934

closed

os-autoinst: Misleading error message in the openQA info panel reason size:M

Added by tinita about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2023-02-22
Due date:
% Done:

0%

Estimated time:

Description

Observation

We sometimes see things like

Reason: isotovideo died: Can't locate auto/NetAddr/IP/InetBase/AF_INET6.al in @INC

in the info panel of an openQA test.
See #124913 for example.

This message is distracting because it is actually not a fatal message, but from an eval:

https://metacpan.org/dist/NetAddr-IP/source/Lite/Util/lib/NetAddr/IP/InetBase.pm#L81

require Socket;

*AF_INET = \&Socket::AF_INET;

if (eval { AF_INET6() } ) {       # line 81
  *AF_INET6 = \&Socket::AF_INET6;
  $emulateAF_INET6 = -1;                        # have it, remind below
}

https://github.com/os-autoinst/os-autoinst/blob/master/isotovideo#L115

$SIG{__DIE__} = sub ($e) { $fatal_error = $e };

So when NetAddr::IP::InetBase calles the eval { AF_INET6() } (thanks to AutoLoader/AutoSplit this actually tries to load the subroutine AF_INET6 from a generated file AF_INET6.al) it is actually fine with it failing, but we still call our signal handler. We could check with $^S if we are in an eval and just discard the message if it's true - but see the notes section why that might not be nice. Then we will get rid of those distracting error messages in the Reason, which we have seen several times before.

Acceptance Criteria

AC1: We don't display non-fatal distracting error messages anymore

Notes

The $SIG{__DIE__} hook is called even inside an eval(). It was never intended to happen this way, but an implementation glitch made this possible.
...
Having to even think about the $^S variable in your exception handlers is simply wrong. $SIG{__DIE__} as currently implemented invites grievous and difficult to track down errors. Avoid it and use an END{} or CORE::GLOBAL::die override instead.


Related issues 2 (0 open2 closed)

Related to openQA Project - action #63718: incomplete reason with just "quit"/"died" could provide more informationResolvedmkittler2020-02-21

Actions
Copied from openQA Project - action #124913: os-autoinst broken (incomplete) with Reason: isotovideo died: Can't locate auto/NetAddr/IP/InetBase/AF_INET6.al in @INCRejectedmkittler2023-02-22

Actions
Actions #1

Updated by tinita about 1 year ago

  • Copied from action #124913: os-autoinst broken (incomplete) with Reason: isotovideo died: Can't locate auto/NetAddr/IP/InetBase/AF_INET6.al in @INC added
Actions #2

Updated by tinita about 1 year ago

  • Description updated (diff)
Actions #3

Updated by tinita about 1 year ago

  • Description updated (diff)
Actions #4

Updated by tinita about 1 year ago

  • Description updated (diff)
Actions #5

Updated by tinita about 1 year ago

So, signal handlers should only be used when there is no other solution, and I quoted also the perl documentation that says that.

I was wondering - why not just wrap the whole isotovideo code in an eval? Then we simply get the error in $@, and only in a fatal case. That's exactly what we want.

Actions #6

Updated by tinita about 1 year ago

  • Subject changed from os-autoinst: Discard error messages from inside eval in __DIE__ handler to os-autoinst: Misleading error message in the openQA info panel reason
  • Description updated (diff)
Actions #7

Updated by tinita about 1 year ago

  • Related to action #63718: incomplete reason with just "quit"/"died" could provide more information added
Actions #8

Updated by mkittler about 1 year ago

I was wondering - why not just wrap the whole isotovideo code in an eval? Then we simply get the error in $@, and only in a fatal case. That's exactly what we want.

That sounds too good to be true but maybe that's really exactly what we want.

Actions #9

Updated by mkittler about 1 year ago

  • Subject changed from os-autoinst: Misleading error message in the openQA info panel reason to os-autoinst: Misleading error message in the openQA info panel reason size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #10

Updated by mkittler about 1 year ago

  • Description updated (diff)
  • Category deleted (Regressions/Crashes)
  • Assignee set to mkittler
  • Target version deleted (Ready)
Actions #11

Updated by mkittler about 1 year ago

  • Category set to Regressions/Crashes
  • Target version set to Ready
Actions #12

Updated by mkittler about 1 year ago

  • Status changed from Workable to In Progress
Actions #13

Updated by mkittler about 1 year ago

  • Status changed from In Progress to Resolved

The PR has been merged. This should work. The changed code was already fully covered by tests and I've also tested it manually by adding a die in certain places.

I've also reproduced the original issue (#124913 and https://bugzilla.opensuse.org/show_bug.cgi?id=1208248) locally via sudo mv /usr/lib64/libopenblas.so.0{,.bak}. This now shows the "failed to start VM" error from the log (e.g. https://openqa.opensuse.org/tests/3137766) also as reason (and not something unrelated like in https://openqa.opensuse.org/tests/3137766). So I'm confident that this fix works.

Actions

Also available in: Atom PDF