Project

General

Profile

Actions

action #102332

closed

Unable to read *.json: Can't open file in o3 openQA logs /var/log/openqa size:M

Added by okurz about 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2021-11-12
Due date:
2022-01-28
% Done:

0%

Estimated time:

Description

Observation

Lot's of error messages (though in debug log level) from o3 openQA logs /var/log/openqa about unable to read job module files, e.g.

[2021-11-12T10:41:41.829383Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-http_srv.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-http_srv.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.834474Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-apache.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-apache.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.837720Z] [debug] [pid:16420] creating 69e/9f9/0cee7828f310f14cad3c0d4b94.png
[2021-11-12T10:41:41.838910Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-dns_srv.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-dns_srv.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.841561Z] [debug] [pid:16420] creating dd6/214/dd8228480d62aab195ab8b2afe.png
[2021-11-12T10:41:41.843574Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-postgresql_server.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-postgresql_server.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.844416Z] [debug] [pid:16420] creating 471/571/68b2030123de661421e8dc0d8e.png
[2021-11-12T10:41:41.852448Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-mariadb_odbc.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-mariadb_odbc.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.854306Z] [debug] [pid:16420] creating a7c/3e1/0412b7062b74b61cd50a4c0e40.png
[2021-11-12T10:41:41.857774Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.858051Z] [debug] [pid:16420] creating 3c6/0b1/02d4f0f04f15e18bb983e2e2c5.png
[2021-11-12T10:41:41.861209Z] [debug] [pid:16420] creating 3c6/0b1/02d4f0f04f15e18bb983e2e2c5.png
[2021-11-12T10:41:41.862147Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_mysql.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_mysql.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.865762Z] [debug] [pid:16420] creating 998/1ca/df7f0469fa24c2b3a545d7e2cd.png
[2021-11-12T10:41:41.866743Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_postgresql.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_postgresql.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.868235Z] [debug] [pid:16420] creating af1/291/ffc78259b4ee24c3478c6c864d.png
[2021-11-12T10:41:41.871019Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_timezone.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_timezone.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.873593Z] [debug] [pid:16420] creating af1/291/ffc78259b4ee24c3478c6c864d.png
[2021-11-12T10:41:41.874783Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-orphaned_packages_check.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-orphaned_packages_check.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.876950Z] [debug] [pid:16420] creating 5f8/a51/e243ab3bed629e43882f8fdfb4.png
[2021-11-12T10:41:41.878904Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-consoletest_finish.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-consoletest_finish.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

Steps to reproduce

ssh o3 "sudo grep 'Unable to read.*No such file' /var/log/openqa"

Expected result

No longish error message in debug lines with "No such file or directory". Could be that this case is expected but then we should not point to perl code lines and provide a better user-facing explanation what this means

Acceptance criteria

  • AC1: No longish error message in debug lines with "No such file or directory"

Suggestions

Take a look into the mentioned code line and investigate if we missed some corner case that we can handle better.

The according code looks like this:

 90 sub results {
 91     my ($self, %options) = @_;
 92     my $skip_text_data = $options{skip_text_data};
 93 
 94     my $dir = $self->job->result_dir;
 95     return undef unless $dir;
 96     my $name = $self->name;
 97     my $file_name = "$dir/details-$name.json";
 98     my $initial_file_size = -s $file_name;
 99     my $json_data = eval { path($file_name)->slurp };
100     if (my $error = $@) {
101         log_debug("Unable to read $file_name: $error");
102         return {};
103     }

so we already explicitly handle the case but I think it's not necessary nor helpful to point to the specific code line.

Why should it be ok that individual result files don't exist?


Related issues 2 (1 open1 closed)

Related to openQA Project (public) - action #94531: OpenQA worker randomly skips uploading artefacts for whole test modulesNew2021-06-23

Actions
Related to openQA Project (public) - action #104262: o3 responds with 500 on softfailed for job that "failed to load modules"Resolvedokurz2021-12-22

Actions
Actions #1

Updated by okurz about 3 years ago

  • Description updated (diff)
Actions #2

Updated by livdywan about 3 years ago

  • Subject changed from Lot's of error messages (though in debug log level) from o3 openQA logs /var/log/openqa about unable to read job module files to Lots of error messages (debug log level) from o3 openQA logs /var/log/openqa about unable to read job module files

Interesting. I see this failing on CircleCI on #4353 (and 4288) but not on master:

[13:08:47] t/full-stack.t .. 17/? # full result panel contents:
# State: scheduled
# Scheduled product: job has not been created by posting an ISO
# 50

#   Failed test 'Expected result not found'
#   at /home/squamata/project/t/lib/OpenQA/Test/FullstackUtils.pm line 84.

#   Failed test 'test 1 is running'
#   at t/full-stack.t line 167.
# unable to read /tmp/i9h8WTUI2B/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt: Can't open file "/tmp/i9h8WTUI2B/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt": No such file or directory at t/full-stack.t line 161.
Bailout called.  Further testing stopped:  unable to run test 1
FAILED--Further testing stopped: unable to run test 1
make[2]: *** [Makefile:196: test-unit-and-integration] Error 2
make[2]: Leaving directory '/home/squamata/project'
make[1]: *** [Makefile:191: test-with-database] Error 2
make[1]: Leaving directory '/home/squamata/project'
make: *** [Makefile:180: test-fullstack] Error 2

Exited with code exit status 2

I hadn't commented there because I didn't, and still don't, understand what's causing this. But it looks the same.

Edit: I think my branch was hitting this because I made warnings seen before a job is done fatal i.e. inside the for loop in wait_for_result_panel. The tests currently pass with that message effectively ignored.

Actions #3

Updated by livdywan about 3 years ago

  • Related to action #94531: OpenQA worker randomly skips uploading artefacts for whole test modules added
Actions #4

Updated by livdywan about 3 years ago

  • Subject changed from Lots of error messages (debug log level) from o3 openQA logs /var/log/openqa about unable to read job module files to Unable to read *.json: Can't open file in o3 openQA logs /var/log/openqa size:M
  • Status changed from New to Workable
Actions #5

Updated by kraih about 3 years ago

  • Assignee set to kraih

This interests me, since it might have a relation to the upload issues we've investigated recently.

Actions #6

Updated by kraih almost 3 years ago

  • Status changed from Workable to In Progress

Logging the case where the file doesn't exist is a fairly recent change, where the author was merely refactoring the code and chose a slightly more defensive style. The original logic can be traced back 7 years, and used to just return when a file was not readable.

Actions #7

Updated by openqa_review almost 3 years ago

  • Due date set to 2021-12-28

Setting due date based on mean cycle time of SUSE QE Tools

Actions #8

Updated by kraih almost 3 years ago

Found a good case demonstrating the problem on OSD. https://openqa.suse.de/tests/7866603

[2021-12-16T15:02:30.564678+01:00] [debug] [pid:29218] Unable to read /var/lib/openqa/testresults/07866/07866603-sle-15-SP4-Full-s390x-Build74.1-select_modules_and_patterns+registration@s390x-kvm-sle12/details-setup_libyui.json: Can't open file "/var/lib/openqa/testresults/07866/07866603-sle-15-SP4-Full-s390x-Build74.1-select_modules_and_patterns+registration@s390x-kvm-sle12/details-setup_libyui.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

And the result for the module setup_libyui is none. So bootloader_start had a fatal test failure and stopped execution of the job, and from that module on there are no details because they never ran. And of course in the webui we have a list of all the job modules so we iterate through them and check for details-*.json files, which do not exist.

That means the proper solution here would be to expect the case and not log it. Because details-*.json files are not guaranteed to exist.

Actions #10

Updated by kraih almost 3 years ago

  • Status changed from In Progress to Feedback
Actions #11

Updated by okurz almost 3 years ago

  • Related to action #104262: o3 responds with 500 on softfailed for job that "failed to load modules" added
Actions #12

Updated by okurz almost 3 years ago

Do you think #104262 could be caused by this?

Actions #13

Updated by okurz almost 3 years ago

  • Due date changed from 2021-12-28 to 2022-01-28
Actions #14

Updated by okurz almost 3 years ago

So definitely regression from here, see #104262#note-7 . Reverting with https://github.com/os-autoinst/openQA/pull/4424 but now we know better where we need the soft handling of "unable to read"

Actions #15

Updated by kraih almost 3 years ago

The code was missing a readable check on the individual files in the loop. Previously the eval would catch the case where the file was missing completely. Will try to come up with a test case too.

Actions #17

Updated by livdywan almost 3 years ago

kraih wrote:

New PR. https://github.com/os-autoinst/openQA/pull/4427

Merged and deployed on osd

Actions #18

Updated by kraih almost 3 years ago

  • Status changed from Feedback to Resolved
Actions

Also available in: Atom PDF