action #178597

closed

Various job pages do not load due to JS error

Added by favogt about 1 month ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2025-03-10
Due date:
% Done: 0%
Estimated time:

Description

I'm trying to review test results for Leap 15.6 Images, but the details pages for most failed jobs do not load:

https://openqa.opensuse.org/tests/4912178
https://openqa.opensuse.org/tests/4912179
https://openqa.opensuse.org/tests/4912180

They only show:

Unable to load test modules: TypeError: Cannot read properties of null (reading 'replace')

Acceptance criteria

AC1: JS errors don't prevent details from loading
AC2: It is known why this is starting to happen now

Suggestions

  • Look at how the function renderModuleRow in assets/javascripts/render.js handles text_data
  • Mitigate the error by making the JS more forgiving when the variable is undefined or null
  • This looks to be specific to 15.6 builds - other failing jobs don't seem to be affected
  • Consider showing a warning if fields are missing OR just use the default text for that purpose
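
The "more forgiving" handling suggested above could look roughly like the following. This is only a sketch under assumptions: stepText is a hypothetical helper, not the actual openQA code, and the fallback wording is invented for illustration. The point is that later string operations such as .replace never see null.

```javascript
// Hypothetical helper (not part of openQA): choose the text to display for a
// step whose text_data may be null or missing entirely.
function stepText(step) {
  if (typeof step.text_data === 'string') return step.text_data;
  // Fallback wording is an assumption made for this sketch.
  return step.text ? `Unable to read ${step.text}.` : 'No text available.';
}

// A step shaped like the affected entries: a text file name but no text_data.
const brokenStep = {result: 'ok', text: 'consoletest_finish-160.txt', text_data: null, title: 'wait_serial'};
// Calling a string method on the result is now safe.
console.log(stepText(brokenStep).replace(/\n+$/, ''));
```

Whether to show a warning or a default text is the open design question from the last suggestion; the sketch simply picks the default-text variant.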

Related issues 2 (1 open, 1 closed)

Related to openQA Project (public) - action #159447: logreport o3: Can't open file ".../libssh-18.txt": No such file or directory at OpenQA/Schema/Result/JobModules.pm line 113 size:M (Resolved, mkittler, 2022-12-05)
Copied to openQA Project (public) - action #178855: Handle broken unicode in files from wait_serial results (New, 2025-03-10)

Actions #1

Updated by tinita about 1 month ago

  • Target version set to Ready
Actions #2

Updated by tinita about 1 month ago

  • Description updated (diff)
Actions #3

Updated by gpuliti about 1 month ago · Edited

  • Assignee set to gpuliti
Actions #4

Updated by tinita about 1 month ago

Looking into /var/lib/openqa/testresults/04912/04912178-opensuse-15.6-Rescue-CD-x86_64-Build15.29-rescue@64bit-2G/details-*.json, there are some files where text_data is not set.

Actions #5

Updated by livdywan about 1 month ago

  • Description updated (diff)
Actions #6

Updated by tinita about 1 month ago

Actually there is only one occurrence in /var/lib/openqa/testresults/04912/04912178-opensuse-15.6-Rescue-CD-x86_64-Build15.29-rescue@64bit-2G where text_data is not set: details-consoletest_finish.json
{"result":"ok","text":"consoletest_finish-160.txt","title":"wait_serial"}
And the file consoletest_finish-160.txt is still in the directory, so it didn't get removed, apparently because it couldn't be read for some reason. But the file looks fine: I tried $txtdata = decode('UTF-8', $txtfile->slurp) on it, and it worked.

Actions #7

Updated by gpuliti about 1 month ago

  • Priority changed from Urgent to High
Actions #8

Updated by tinita about 1 month ago

We are now seeing a follow-up error:
https://openqa.opensuse.org/tests/4912179

Unable to load test modules: TypeError: Failed to execute 'appendChild' on 'Node': parameter 1 is not of type 'Node'.

TypeError: Failed to execute 'appendChild' on 'Node': parameter 1 is not of type 'Node'.
    at createElement (test_result.js:3:6)
    at renderModuleRow (test_result.js:19:224)
    at renderModuleTable (test_result.js:24:19)
    at Object.renderTestModules [as renderContents] (test_result.js:133:60)
    at test_result.js:88:105

which points to const textresult = E('pre', [textData]);

And after that there are other occurrences where textData is used. I thought that was checked as part of https://github.com/os-autoinst/openQA/pull/6278

Actions #9

Updated by tinita about 1 month ago · Edited

https://openqa.opensuse.org/tests/4912179/details_ajax

{
  "display_title":"wait_serial",
  "is_parser_text_result":0,
  "num":158,
  "resborder":"resborder_ok",
  "result":"ok",
  "text":"consoletest_finish-160.txt",
  "text_data":null,
  "title":"wait_serial"
},

So the field text_data is present in the response from the server, but null.
In details-consoletest_finish.json it is not present at all, so the question is where it gets set and why it is undef.

If I reproduce these conditions in a successful test, the field is actually filled on the fly with the contents of consoletest_finish-160.txt when loading the details page (OpenQA::Schema::Result::JobModules::results()). But apparently there are conditions where that code is not executed.

Actions #10

Updated by gpuliti about 1 month ago

Actions #11

Updated by livdywan about 1 month ago

  • Subject changed from Various job pages do not load due JS error to Various job pages do not load due JS error size:S
  • Description updated (diff)
  • Status changed from New to In Progress
Actions #12

Updated by tinita about 1 month ago

  • Assignee changed from gpuliti to tinita

https://github.com/os-autoinst/openQA/pull/6281 merged.
Taking over and continuing to search for the cause.

Actions #13

Updated by tinita about 1 month ago · Edited

I looked into all testresults 04912* (roughly from March 10 14:00 - March 11 08:00)

cd /var/lib/openqa/testresults/04912
# List all details files, then sort the list
find . -name "details-*.json" | tee ~/textdata/detail-files
sort ~/textdata/detail-files > ~/textdata/detail-files.sorted
# Report files containing steps that have .text but no .text_data
for i in $(cat ~/textdata/detail-files.sorted); do echo $i >&2; jq '.details | map(select(.text != null) | select(.text_data == null)) | length' "$i" | grep -vx 0 && echo "missing: $i"; done | tee ~/textdata/missing
# Turn the job directory names into job URLs
perl -nlwE'print "https://openqa.opensuse.org/t$1" if m{/0(\d+)-}' ~/textdata/missing

It found 27 (out of 999) tests with the same symptom.

Now looking at older tests to check if this happened earlier already.
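
For reference, the jq filter in the loop above counts steps that name a text file but carry no inline text_data. The same check as a JavaScript sketch, assuming a details file is an object with a details array as the jq expression implies (countMissingTextData is a hypothetical name, and the sample data is made up apart from the wait_serial entry quoted earlier):

```javascript
// Count steps that reference a text file but have no text_data, mirroring
// the jq filter: select(.text != null) | select(.text_data == null).
function countMissingTextData(detailsJson) {
  const details = JSON.parse(detailsJson).details ?? [];
  // "== null" is true for both null and a missing key, like jq's comparison.
  return details.filter((step) => step.text != null && step.text_data == null).length;
}

// Sample input; only the first entry has the symptom.
const sample = JSON.stringify({
  details: [
    {result: 'ok', text: 'consoletest_finish-160.txt', title: 'wait_serial'},
    {result: 'ok', text: 'other.txt', text_data: 'some output\n', title: 'wait_serial'},
    {result: 'ok', screenshot: 'step-1.png'},
  ],
});
console.log(countMissingTextData(sample));
```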

Actions #14

Updated by tinita about 1 month ago

I looked into testresults 04905 (March 6 20:00 - March 7 01:00) to see if the eval -> try/catch PR could be related:
https://github.com/os-autoinst/openQA/pull/6251

But I see it happening there as well (16 out of 994 tests).

Another PR looks related, but it dates back to April 2024: https://github.com/os-autoinst/openQA/pull/5588

Looking into the actual files that should be in text_data: some of them are missing, some of them are present. It's about 50/50.
If the file is present, it should be retrieved on the fly when viewing the job details. But in certain circumstances this is not happening.
I think I need to get the o3 database and the testresults directory to check which code is executed.
I don't know why the file is sometimes missing, and I also don't know why it is sometimes there but not written into the details-*.json (I think this should be happening in finalize_results). Maybe we should at least write a debug message into the log.
And I can try to fix the case when text_data is filled on the fly.

For the database copy I first need to update my postgres instance though.

Actions #15

Updated by tinita about 1 month ago

  • Related to action #159447: logreport o3: Can't open file ".../libssh-18.txt": No such file or directory at OpenQA/Schema/Result/JobModules.pm line 113 size:M added
Actions #16

Updated by tinita about 1 month ago · Edited

I was able to reproduce it locally.
The decode('UTF-8', $text_file->slurp) simply returns undef without an error, probably because of the control characters.

"# Result:\n\224\200udisks2.service\n"

If I try to do the same on the command line, I get a result, but using FB_CROAK dies:

LC_ALL= LC_CTYPE="C.utf8" LANG=C.utf8 perl -wE'
use Devel::Peek;
use Data::Dumper;
use Encode;
use Mojo::File qw/ path /;
my $text_file = path(q{consoletest_finish-160.txt});
my $c = $text_file->slurp;
Dump $c;
my $txtdata = decode("UTF-8", $c);
Dump $txtdata;
$txtdata = decode("UTF-8", $c, Encode::FB_CROAK);'
SV = PV(0x55886f1d7ef0) at 0x55886f1f58a0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x55886f77ab50 "# Result:\n\224\200udisks2.service\n"\0
  CUR = 28
  LEN = 32
  COW_REFCNT = 0
SV = PV(0x55886f1d8050) at 0x55886f5b00e8
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x55887019cc30 "# Result:\n\357\277\275\357\277\275udisks2.service\n"\0 [UTF8 "# Result:\n\x{fffd}\x{fffd}udisks2.service\n"]
  CUR = 32
  LEN = 40
  COW_REFCNT = 0
utf8 "\x94" does not map to Unicode at /usr/lib/perl5/5.26.1/x86_64-linux-thread-multi/Encode.pm line 212.

However, using FB_CROAK in the openQA code does not make it die; it still just returns undef.

But making it behave as before should at least be easy.

Actions #17

Updated by tinita about 1 month ago

Ooohhh... now I can see it:

# OpenQA::Schema::Result::JobModules
use Mojo::Util 'decode';

while I was trying it out with Encode::decode.

Mojo::Util::decode docs say:

Decode bytes to characters with Encode, or return "undef" if decoding
failed.
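
The difference between "return undef on failure" and "substitute replacement characters" can be illustrated outside Perl. A minimal JavaScript sketch, using the standard TextDecoder only as a stand-in for the two Perl functions (decodeOrNull and decodeLenient are hypothetical names, not openQA or Mojolicious functions), fed with the bytes from the dump above:

```javascript
// Bytes from the dump: "# Result:\n", the stray bytes 0x94 0x80, then the rest.
const encoder = new TextEncoder();
const rawBytes = Uint8Array.from([
  ...encoder.encode('# Result:\n'),
  0x94, 0x80,
  ...encoder.encode('udisks2.service\n'),
]);

// Strict decode that reports failure as null instead of throwing, comparable
// in spirit to a decode function that returns undef on invalid input.
function decodeOrNull(input) {
  try {
    return new TextDecoder('utf-8', {fatal: true}).decode(input);
  } catch (err) {
    return null;
  }
}

// Lenient decode: every invalid byte becomes U+FFFD, the rest stays readable.
function decodeLenient(input) {
  return new TextDecoder('utf-8').decode(input);
}

console.log(decodeOrNull(rawBytes));
console.log(JSON.stringify(decodeLenient(rawBytes)));
```

With the strict variant the caller has to handle the null case explicitly, which is what was missing between the server and the details page.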

Actions #18

Updated by tinita about 1 month ago

https://github.com/os-autoinst/openQA/pull/6285 Improve reading of text_data in module results

Actions #19

Updated by tinita about 1 month ago

  • Status changed from In Progress to Feedback
Actions #20

Updated by tinita about 1 month ago

  • Status changed from Feedback to Resolved

https://github.com/os-autoinst/openQA/pull/6285 merged and deployed.

https://openqa.opensuse.org/tests/4912178#step/consoletest_finish/158 now shows: "Unable to decode consoletest_finish-160.txt."
The output in the file seems to be cut off, leading to broken unicode, so we cannot decode it.

Actions #21

Updated by favogt about 1 month ago

  • Status changed from Resolved to Feedback

https://openqa.opensuse.org/tests/4912178#step/consoletest_finish/158 now shows: "Unable to decode consoletest_finish-160.txt."
The output in the file seems to be cut off, leading to broken unicode, so we cannot decode it.

The reason for that is the 4096 B ring buffer in serial_screen.pm's read_until. By default, record_output => 0, which means that on a match only 4096 B are returned, which might sever a UTF-8 sequence.

IMO either:

  • Call wait_serial with record_output => 1 in read_until
  • Try to keep UTF-8 intact (can happen on both ends of the buffer)
  • Handle UTF-8 decode errors gracefully by using replacement characters instead of failing to decode everything

The current state means that the output is not visible without digging through the corresponding full serial log.
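
As an illustration of the second option (keeping UTF-8 intact at both ends of the buffer), here is a JavaScript sketch. It is not os-autoinst code; trimPartialUtf8 is a hypothetical helper, and it only handles fragments at the edges of the window, not invalid bytes in the middle.

```javascript
// Hypothetical helper: drop UTF-8 fragments at both ends of a byte window.
function trimPartialUtf8(bytes) {
  let start = 0;
  let end = bytes.length;
  // Leading continuation bytes (10xxxxxx) belong to a character whose lead
  // byte fell outside the window.
  while (start < end && (bytes[start] & 0xc0) === 0x80) start++;
  // Walk back over trailing continuation bytes to find the last lead byte.
  let i = end - 1;
  while (i >= start && (bytes[i] & 0xc0) === 0x80) i--;
  if (i >= start) {
    const lead = bytes[i];
    // Sequence length implied by the lead byte.
    const need = lead >= 0xf0 ? 4 : lead >= 0xe0 ? 3 : lead >= 0xc0 ? 2 : 1;
    if (end - i < need) end = i; // last sequence is incomplete, drop it
  }
  return bytes.subarray(start, end);
}

// A window that starts in the middle of the three-byte character "─",
// similar to the bytes 0x94 0x80 seen in consoletest_finish-160.txt.
const windowBytes = new TextEncoder().encode('─udisks2.service\n').subarray(1);
const strictDecoder = new TextDecoder('utf-8', {fatal: true});
console.log(JSON.stringify(strictDecoder.decode(trimPartialUtf8(windowBytes))));
```

Trimming loses the partial character, whereas the third option (replacement characters) keeps a visible marker that something was cut; either way the remaining output stays decodable.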

Actions #22

Updated by tinita about 1 month ago

  • Subject changed from Various job pages do not load due JS error size:S to Various job pages do not load due JS error
  • Status changed from Feedback to New
  • Assignee deleted (tinita)
Actions #23

Updated by tinita about 1 month ago

  • Category changed from Regressions/Crashes to Feature requests
Actions #24

Updated by tinita about 1 month ago

  • Copied to action #178855: Handle broken unicode in files from wait_serial results added
Actions #25

Updated by tinita about 1 month ago

  • Category changed from Feature requests to Regressions/Crashes
  • Status changed from New to Resolved
  • Assignee set to tinita

I actually copied it into a new ticket #178855, as the JavaScript error was a regression, but the content of such files was never shown before.
