Project

General

Profile

action #91653

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #80150: [epic] Scale out openQA: Easier openQA setup

coordination #90758: [epic] python bindings for openQA

Python tests fail with generic error message regardless of the problem size:M

Added by cdywan 8 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Concrete Bugs
Target version:
Start date:
2021-04-23
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Motivation

I came up with a fairly simple Python-based test for #91257 which I wanted to run on o3. And I re-ran it a couple times but I get rather opaque errors so debugging why it won't work feels like guess-work.

[2021-04-23T09:31:55.798 CEST] [debug] scheduling login /tests/openQA/login.pm
[2021-04-23T09:31:55.926 CEST] [warn] !!! autotest::loadtest: error on /tests/openQA/search.py: Error -- py_eval raised an exception at /usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi/Inline/Python.pm line 177.
BEGIN failed--compilation aborted at (eval 130) line 5.

error on /tests/openQA/search.py: Error -- py_eval raised an exception at /usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi/Inline/Python.pm line 177.
BEGIN failed--compilation aborted at (eval 130) line 5.
Compilation failed in require at /usr/lib/os-autoinst/OpenQA/Isotovideo/Utils.pm line 211.

There's no specific error besides the filename. It could be a problem with the runtime or some issue with the test code, I wouldn't know.

Acceptance criteria

  • AC1: Specific Python error message can be seen in autoinst-log.txt

Suggestions

History

#1 Updated by okurz 8 months ago

  • Tracker changed from coordination to action

#2 Updated by okurz 8 months ago

  • Description updated (diff)

#3 Updated by okurz 8 months ago

  • Target version set to future

#4 Updated by okurz 3 months ago

  • Target version changed from future to Ready

bringing that into the backlog in the hope to resolve the parent #90578 soon

#5 Updated by cdywan 3 months ago

  • Subject changed from Python tests fail with generic error message regardless of the problem to Python tests fail with generic error message regardless of the problem size:M
  • Description updated (diff)
  • Status changed from New to Workable

#6 Updated by osukup 3 months ago

probably unsolvable with Inline::Python ...

see https://github.com/niner/inline-python-pm/blob/8a2ec302d5e83cb84167da418432f26d37e1a403/Python.xs#L215-L220

on another side, it looks like Inline::Pythonon isn't completely dead ( some commit in last two years)

#7 Updated by mkittler 3 months ago

  • Assignee set to mkittler

osukup It is correct that Inline::Python is just calling PyErr_Print() (https://docs.python.org/3/c-api/exceptions.html#c.PyErr_Print) but we can likely redirect sys.stderr, e.g. sys.stderr = open('autoinst-log.txt', 'a').

#8 Updated by mkittler 3 months ago

  • Status changed from Workable to Feedback

Without any modifications to os-autoinst I'm actually getting a useful error message:

[2021-09-09T17:41:52.047 CEST] [debug] ||| starting search tests/openQA/search.py
[2021-09-09T17:41:52.149 CEST] [info] ::: basetest::runtest: # Test died: FileNotFoundError: [Errno 2] No such file or directory: 'foobar' at line 

The message is also recorded as step detail.

I've been provoking the error by putting an open("foobar") in the Python code. Apparently that's not enough to reproduce the issue. What kind of error was it in your case which only lead to the generic error message?

#9 Updated by cdywan 3 months ago

mkittler wrote:

Without any modifications to os-autoinst I'm actually getting a useful error message:

[2021-09-09T17:41:52.047 CEST] [debug] ||| starting search tests/openQA/search.py
[2021-09-09T17:41:52.149 CEST] [info] ::: basetest::runtest: # Test died: FileNotFoundError: [Errno 2] No such file or directory: 'foobar' at line 

The message is also recorded as step detail.

I've been provoking the error by putting an open("foobar") in the Python code. Apparently that's not enough to reproduce the issue. What kind of error was it in your case which only lead to the generic error message?

It was missing Python dependencies which resulted in this job which has the reason Reason: tests died: unable to load /tests/openQA/search.py, check the log for the cause (e.g. syntax error) but no specifics in the logs - that's what the ticket description shows.

#10 Updated by mkittler 3 months ago

Then I'm getting the reason tests died: unable to load tests/openQA/search.py, check the log for the cause (e.g. syntax error) and the log actually contains a Python traceback and error message:

[2021-09-10T15:32:08.328 CEST] [debug] scheduling login /tests/openQA/login.pm
Traceback (most recent call last):
  File "<string>", line 3, in <module>
ModuleNotFoundError: No module named 'foobar'
[2021-09-10T15:32:08.423 CEST] [warn] !!! autotest::loadtest: error on /tests/openQA/search.py: Error -- py_eval raised an exception at /usr/lib/perl5/vendor_perl/5.32.1/x86_64-linux-thread-multi/Inline/Python.pm line 177.
  BEGIN failed--compilation aborted at (eval 146) line 5.

error on /tests/openQA/search.py: Error -- py_eval raised an exception at /usr/lib/perl5/vendor_perl/5.32.1/x86_64-linux-thread-multi/Inline/Python.pm line 177.
BEGIN failed--compilation aborted at (eval 146) line 5.
Compilation failed in require at /hdd/openqa-devel/repos/os-autoinst/OpenQA/Isotovideo/Utils.pm line 211.

The error message is not in line with our usual logging but I find it quite acceptable.

I'm wondering what I do different than you. I'm also testing with the openQA-in-openQA test using the very same test module you've been working on. In case you were using Python 2 I don't see how it would make a difference (https://docs.python.org/2.7/c-api/exceptions.html#c.PyErr_Print).

Oh, and hacks like sys.stderr = open('autoinst-log.txt', 'a') shouldn't really be required because autoinst-log.txt is actually the file sys.stderr ends up in.

#11 Updated by okurz 3 months ago

  • Due date set to 2021-09-23

please trigger an according job on o3 to show that it works there

#13 Updated by mkittler 3 months ago

In production the traceback output is missing, indeed:

[2021-09-17T18:47:10.792 CEST] [debug] scheduling login /tests/openQA/login.pm
[2021-09-17T18:47:10.902 CEST] [warn] !!! autotest::loadtest: error on /tests/openQA/search.py: Error -- py_eval raised an exception at /usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi/Inline/Python.pm line 177.
  BEGIN failed--compilation aborted at (eval 130) line 5.

error on /tests/openQA/search.py: Error -- py_eval raised an exception at /usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi/Inline/Python.pm line 177.
BEGIN failed--compilation aborted at (eval 130) line 5.
Compilation failed in require at /usr/lib/os-autoinst/OpenQA/Isotovideo/Utils.pm line 209.

On the production worker the same version of perl-Inline-Python is installed as on TW (via https://build.opensuse.org/package/show/devel:openQA:Leap:15.2/perl-Inline-Python) so this shouldn't make a difference. Maybe the Perl/Python version makes a difference. On TW /usr/lib/perl5/vendor_perl/5.32.1/x86_64-linux-thread-multi/auto/Inline/Python/Python.so links against libpython3.8.so.1.0 and on Leap 15.2 /usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi/auto/Inline/Python/Python.so links against libpython3.6m.so.1.0. However, I doubt that Python's PyErr_Print function has changed a lot.

#14 Updated by mkittler 3 months ago

The following works just fine on the worker host as well:

martchus@openqaworker7:~> perl -e "use Inline Python => 'import foobar';"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'foobar'
Error -- py_eval raised an exception at /usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi/Inline/Python.pm line 177.
BEGIN failed--compilation aborted at -e line 1.

So there must be something within our code which suppresses/redirects Python's traceback somehow in some environments.

#15 Updated by mkittler 3 months ago

I can reproduce the problem locally when I use the container setup which is using Leap 15.2. However, I still don't know what makes the difference. When I test this outside of isotovideo but in the same way as autotest.pm uses Inline Python the error message is even shown within the container:

4068c8b89349:/ # perl -I/usr/lib/os-autoinst -e "use base 'basetest'; use Mojo::File qw(path); use Inline Python => 'import sys; sys.path.append(\"/usr/lib/os-autoinst\")'; use Inline Python => path('/var/lib/openqa/share/tests/openqa/tests/openQA/search.py')->slurp;"
Traceback (most recent call last):
  File "<string>", line 3, in <module>
ModuleNotFoundError: No module named 'foobar'
Error -- py_eval raised an exception at /usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi/Inline/Python.pm line 177.
BEGIN failed--compilation aborted at -e line 1.

#16 Updated by okurz 3 months ago

can you try https://blog.famzah.net/2010/04/08/auto-flush-both-stdout-and-stderr-in-perl/ ?

If this does not work please update the ticket description with clear steps to reproduce, e.g. based on the container based command lines you gave, plus a workaround, unassign, move the ticket to "future" with low prio.

#17 Updated by mkittler 3 months ago

#18 Updated by mkittler 2 months ago

The PR has been merged. I'll trigger a job again on o3 after the deployment.

#19 Updated by okurz 2 months ago

  • Due date changed from 2021-09-23 to 2021-09-28

To check on Monday

#21 Updated by mkittler 2 months ago

  • Status changed from Feedback to Resolved

It worked, the logs of https://openqa.opensuse.org/tests/1940977 show the Python traceback and error message.

#22 Updated by okurz 2 months ago

even in color :)

#23 Updated by okurz about 1 month ago

  • Due date deleted (2021-09-28)

Also available in: Atom PDF