Project

General

Profile

action #102332

Unable to read *.json: Can't open file in o3 openQA logs /var/log/openqa size:M

Added by okurz about 1 year ago. Updated 11 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2021-11-12
Due date:
2022-01-28
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

Lot's of error messages (though in debug log level) from o3 openQA logs /var/log/openqa about unable to read job module files, e.g.

[2021-11-12T10:41:41.829383Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-http_srv.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-http_srv.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.834474Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-apache.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-apache.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.837720Z] [debug] [pid:16420] creating 69e/9f9/0cee7828f310f14cad3c0d4b94.png
[2021-11-12T10:41:41.838910Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-dns_srv.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-dns_srv.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.841561Z] [debug] [pid:16420] creating dd6/214/dd8228480d62aab195ab8b2afe.png
[2021-11-12T10:41:41.843574Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-postgresql_server.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-postgresql_server.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.844416Z] [debug] [pid:16420] creating 471/571/68b2030123de661421e8dc0d8e.png
[2021-11-12T10:41:41.852448Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-mariadb_odbc.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-mariadb_odbc.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.854306Z] [debug] [pid:16420] creating a7c/3e1/0412b7062b74b61cd50a4c0e40.png
[2021-11-12T10:41:41.857774Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.858051Z] [debug] [pid:16420] creating 3c6/0b1/02d4f0f04f15e18bb983e2e2c5.png
[2021-11-12T10:41:41.861209Z] [debug] [pid:16420] creating 3c6/0b1/02d4f0f04f15e18bb983e2e2c5.png
[2021-11-12T10:41:41.862147Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_mysql.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_mysql.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.865762Z] [debug] [pid:16420] creating 998/1ca/df7f0469fa24c2b3a545d7e2cd.png
[2021-11-12T10:41:41.866743Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_postgresql.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_postgresql.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.868235Z] [debug] [pid:16420] creating af1/291/ffc78259b4ee24c3478c6c864d.png
[2021-11-12T10:41:41.871019Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_timezone.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-php7_timezone.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.873593Z] [debug] [pid:16420] creating af1/291/ffc78259b4ee24c3478c6c864d.png
[2021-11-12T10:41:41.874783Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-orphaned_packages_check.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-orphaned_packages_check.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

[2021-11-12T10:41:41.876950Z] [debug] [pid:16420] creating 5f8/a51/e243ab3bed629e43882f8fdfb4.png
[2021-11-12T10:41:41.878904Z] [debug] [pid:16406] Unable to read /var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-consoletest_finish.json: Can't open file "/var/lib/openqa/testresults/02027/02027472-opensuse-15.3-JeOS-for-AArch64-aarch64-Build9.277-jeos@USBboot_aarch64/details-consoletest_finish.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

Steps to reproduce

ssh o3 "sudo grep 'Unable to read.*No such file' /var/log/openqa"

Expected result

No longish error message in debug lines with "No such file or directory". Could be that this case is expected but then we should not point to perl code lines and provide a better user-facing explanation what this means

Acceptance criteria

  • AC1: No longish error message in debug lines with "No such file or directory"

Suggestions

Take a look into the mentioned code line and investigate if we missed some corner case that we can handle better.

The according code looks like this:

 90 sub results {
 91     my ($self, %options) = @_;
 92     my $skip_text_data = $options{skip_text_data};
 93 
 94     my $dir = $self->job->result_dir;
 95     return undef unless $dir;
 96     my $name = $self->name;
 97     my $file_name = "$dir/details-$name.json";
 98     my $initial_file_size = -s $file_name;
 99     my $json_data = eval { path($file_name)->slurp };
100     if (my $error = $@) {
101         log_debug("Unable to read $file_name: $error");
102         return {};
103     }

so we already explicitly handle the case but I think it's not necessary nor helpful to point to the specific code line.

Why should it be ok that individual result files don't exist?


Related issues

Related to openQA Project - action #94531: OpenQA worker randomly skips uploading artefacts for whole test modulesNew2021-06-23

Related to openQA Project - action #104262: o3 responds with 500 on softfailed for job that "failed to load modules"Resolved2021-12-22

History

#1 Updated by okurz about 1 year ago

  • Description updated (diff)

#2 Updated by cdywan about 1 year ago

  • Subject changed from Lot's of error messages (though in debug log level) from o3 openQA logs /var/log/openqa about unable to read job module files to Lots of error messages (debug log level) from o3 openQA logs /var/log/openqa about unable to read job module files

Interesting. I see this failing on CircleCI on #4353 (and 4288) but not on master:

[13:08:47] t/full-stack.t .. 17/? # full result panel contents:
# State: scheduled
# Scheduled product: job has not been created by posting an ISO
# 50

#   Failed test 'Expected result not found'
#   at /home/squamata/project/t/lib/OpenQA/Test/FullstackUtils.pm line 84.

#   Failed test 'test 1 is running'
#   at t/full-stack.t line 167.
# unable to read /tmp/i9h8WTUI2B/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt: Can't open file "/tmp/i9h8WTUI2B/full-stack.d/openqa/testresults/00000/00000001-tinycore-1-flavor-i386-Build1-core@coolone/autoinst-log.txt": No such file or directory at t/full-stack.t line 161.
Bailout called.  Further testing stopped:  unable to run test 1
FAILED--Further testing stopped: unable to run test 1
make[2]: *** [Makefile:196: test-unit-and-integration] Error 2
make[2]: Leaving directory '/home/squamata/project'
make[1]: *** [Makefile:191: test-with-database] Error 2
make[1]: Leaving directory '/home/squamata/project'
make: *** [Makefile:180: test-fullstack] Error 2

Exited with code exit status 2

I hadn't commented there because I didn't, and still don't, understand what's causing this. But it looks the same.

Edit: I think my branch was hitting this because I made warnings seen before a job is done fatal i.e. inside the for loop in wait_for_result_panel. The tests currently pass with that message effectively ignored.

#3 Updated by cdywan about 1 year ago

  • Related to action #94531: OpenQA worker randomly skips uploading artefacts for whole test modules added

#4 Updated by cdywan about 1 year ago

  • Subject changed from Lots of error messages (debug log level) from o3 openQA logs /var/log/openqa about unable to read job module files to Unable to read *.json: Can't open file in o3 openQA logs /var/log/openqa size:M
  • Status changed from New to Workable

#5 Updated by kraih 12 months ago

  • Assignee set to kraih

This interests me, since it might have a relation to the upload issues we've investigated recently.

#6 Updated by kraih 12 months ago

  • Status changed from Workable to In Progress

Logging the case where the file doesn't exist is a fairly recent change, where the author was merely refactoring the code and chose a slightly more defensive style. The original logic can be traced back 7 years, and used to just return when a file was not readable.

#7 Updated by openqa_review 12 months ago

  • Due date set to 2021-12-28

Setting due date based on mean cycle time of SUSE QE Tools

#8 Updated by kraih 12 months ago

Found a good case demonstrating the problem on OSD. https://openqa.suse.de/tests/7866603

[2021-12-16T15:02:30.564678+01:00] [debug] [pid:29218] Unable to read /var/lib/openqa/testresults/07866/07866603-sle-15-SP4-Full-s390x-Build74.1-select_modules_and_patterns+registration@s390x-kvm-sle12/details-setup_libyui.json: Can't open file "/var/lib/openqa/testresults/07866/07866603-sle-15-SP4-Full-s390x-Build74.1-select_modules_and_patterns+registration@s390x-kvm-sle12/details-setup_libyui.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobModules.pm line 99.

And the result for the module setup_libyui is none. So bootloader_start had a fatal test failure and stopped execution of the job, and from that module on there are no details because they never ran. And of course in the webui we have a list of all the job modules so we iterate through them and check for details-*.json files, which do not exist.

That means the proper solution here would be to expect the case and not log it. Because details-*.json files are not guaranteed to exist.

#10 Updated by kraih 12 months ago

  • Status changed from In Progress to Feedback

#11 Updated by okurz 11 months ago

  • Related to action #104262: o3 responds with 500 on softfailed for job that "failed to load modules" added

#12 Updated by okurz 11 months ago

Do you think #104262 could be caused by this?

#13 Updated by okurz 11 months ago

  • Due date changed from 2021-12-28 to 2022-01-28

#14 Updated by okurz 11 months ago

So definitely regression from here, see #104262#note-7 . Reverting with https://github.com/os-autoinst/openQA/pull/4424 but now we know better where we need the soft handling of "unable to read"

#15 Updated by kraih 11 months ago

The code was missing a readable check on the individual files in the loop. Previously the eval would catch the case where the file was missing completely. Will try to come up with a test case too.

#17 Updated by cdywan 11 months ago

kraih wrote:

New PR. https://github.com/os-autoinst/openQA/pull/4427

Merged and deployed on osd

#18 Updated by kraih 11 months ago

  • Status changed from Feedback to Resolved

Also available in: Atom PDF