action #178597
closed
Various job pages do not load due JS error
Description
I'm trying to review test results for Leap 15.6 Images, but the details
pages for most failed jobs do not load:
https://openqa.opensuse.org/tests/4912178
https://openqa.opensuse.org/tests/4912179
https://openqa.opensuse.org/tests/4912180
They only show
Unable to load test modules: TypeError: Cannot read properties of null (reading 'replace')
Acceptance criteria
AC1: JS errors don't prevent details from loading
AC2: It is known why this is starting to happen now
Suggestions
- Look in assets/javascripts/render.js function renderModuleRow for text_data
- Mitigate the error by making JS more forgiving if the variable is undef
- This looks to be specific to 15.6 builds - other failing jobs don't seem to be affected
- Consider showing a warning if fields are missing OR just use the default text for that purpose
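A minimal sketch of the last suggestion, assuming a step object shaped like the entries in details-*.json (text, text_data). The helper name resolveStepText and the warning wording are hypothetical and not taken from render.js.

```javascript
// Hypothetical helper: pick the text to display for a step result and
// report a warning instead of throwing when text_data is missing.
function resolveStepText(step) {
  if (typeof step.text_data === 'string') {
    return {text: step.text_data, warning: null};
  }
  // text_data is null or undefined: fall back to a default message
  const name = step.text || 'unknown file';
  return {text: '', warning: `No text data available for ${name}`};
}

const broken = {result: 'ok', text: 'consoletest_finish-160.txt', text_data: null, title: 'wait_serial'};
console.log(resolveStepText(broken).warning);
```

Returning the warning as data rather than throwing would let the rest of the module table render even when one step is incomplete.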
Updated by gpuliti about 1 month ago · Edited
- Assignee set to gpuliti
Working on mitigation: https://github.com/os-autoinst/openQA/pull/6278
Updated by tinita about 1 month ago
Looking into /var/lib/openqa/testresults/04912/04912178-opensuse-15.6-Rescue-CD-x86_64-Build15.29-rescue@64bit-2G/details-*.json, there are some files where text_data is not set.
Updated by tinita about 1 month ago
Actually there is only one occurrence in /var/lib/openqa/testresults/04912/04912178-opensuse-15.6-Rescue-CD-x86_64-Build15.29-rescue@64bit-2G
where text_data is not set: details-consoletest_finish.json
{"result":"ok","text":"consoletest_finish-160.txt","title":"wait_serial"}
And the file consoletest_finish-160.txt is in the directory, so it didn't get removed, because it couldn't be read for some reason. But the file looks fine, I tried $txtdata = decode('UTF-8', $txtfile->slurp) on it, and it worked.
Updated by gpuliti about 1 month ago
- Priority changed from Urgent to High
Mitigation applied: https://github.com/os-autoinst/openQA/pull/6278
Updated by tinita about 1 month ago
We are now seeing a followup error:
https://openqa.opensuse.org/tests/4912179
Unable to load test modules: TypeError: Failed to execute 'appendChild' on 'Node': parameter 1 is not of type 'Node'.
TypeError: Failed to execute 'appendChild' on 'Node': parameter 1 is not of type 'Node'.
at createElement (test_result.js:3:6)
at renderModuleRow (test_result.js:19:224)
at renderModuleTable (test_result.js:24:19)
at Object.renderTestModules [as renderContents] (test_result.js:133:60)
at test_result.js:88:105
which points to const textresult = E('pre', [textData]);
And after that there are other occurrences where textData is used. I thought that was checked as part of https://github.com/os-autoinst/openQA/pull/6278
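The trace is consistent with a null value reaching appendChild. A sketch of a guard, assuming the element helper receives an array of children that may contain strings, nodes, or null. normalizeChildren is a hypothetical name, not the actual createElement code in test_result.js.

```javascript
// Hypothetical guard: drop null/undefined children before they are
// handed to appendChild, which only accepts Node objects.
function normalizeChildren(children) {
  return (children || []).filter(child => child !== null && child !== undefined);
}

const textData = null; // what the server sent for the broken step
console.log(normalizeChildren([textData]).length);
```

Filtering in one central place would cover every later use of textData, instead of checking each call site separately.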
Updated by tinita about 1 month ago · Edited
https://openqa.opensuse.org/tests/4912179/details_ajax
{
"display_title":"wait_serial",
"is_parser_text_result":0,
"num":158,
"resborder":"resborder_ok",
"result":"ok",
"text":"consoletest_finish-160.txt",
"text_data":null,
"title":"wait_serial"
},
So the field text_data is present in the response from the server, but null.
In details-consoletest_finish.json it is not present at all, so the question is where does it get set, and why is it undef.
If I reproduce these conditions in a successful test, the field is actually filled on the fly with the contents of consoletest_finish-160.txt when loading the details page (OpenQA::Schema::Result::JobModules::results()). But apparently there are conditions where that code is not executed.
Updated by gpuliti about 1 month ago
Second mitigation applied: https://github.com/os-autoinst/openQA/pull/6281
Updated by livdywan about 1 month ago
- Subject changed from Various job pages do not load due JS error to Various job pages do not load due JS error size:S
- Description updated (diff)
- Status changed from New to In Progress
Updated by tinita about 1 month ago
- Assignee changed from gpuliti to tinita
https://github.com/os-autoinst/openQA/pull/6281 merged.
Taking over to continue searching for the cause.
Updated by tinita about 1 month ago · Edited
I looked into all testresults 04912* (roughly from March 10 14:00 - March 11 08:00)
cd /var/lib/openqa/testresults/04912
find . -name "details-*.json" | tee ~/textdata/detail-files
cat ~/textdata/detail-files | sort > ~/textdata/detail-files.sorted
for i in $(cat ~/textdata/detail-files.sorted); do echo $i >&2; cat $i | jq '.details | map(select(.text != null) | select(.text_data == null)) | length' | grep -v 0 && echo "missing: $i"; done | tee ~/textdata/missing
cat ~/textdata/missing | perl -nlwE'print "https://openqa.opensuse.org/t$1" if m{/0(\d+)-}'
It found 27 (out of 999) tests with the same symptom:
- https://openqa.opensuse.org/t4912025
- https://openqa.opensuse.org/t4912178
- https://openqa.opensuse.org/t4912179
- https://openqa.opensuse.org/t4912180
- https://openqa.opensuse.org/t4912213
- https://openqa.opensuse.org/t4912215
- https://openqa.opensuse.org/t4912216
- https://openqa.opensuse.org/t4912248
- https://openqa.opensuse.org/t4912249
- https://openqa.opensuse.org/t4912286
- https://openqa.opensuse.org/t4912289
- https://openqa.opensuse.org/t4912290
- https://openqa.opensuse.org/t4912330
- https://openqa.opensuse.org/t4912536
- https://openqa.opensuse.org/t4912545
- https://openqa.opensuse.org/t4912561
- https://openqa.opensuse.org/t4912601
- https://openqa.opensuse.org/t4912617
- https://openqa.opensuse.org/t4912695
- https://openqa.opensuse.org/t4912832
- https://openqa.opensuse.org/t4912849
- https://openqa.opensuse.org/t4912896
- https://openqa.opensuse.org/t4912898
- https://openqa.opensuse.org/t4912901
- https://openqa.opensuse.org/t4912937
- https://openqa.opensuse.org/t4912940
- https://openqa.opensuse.org/t4912960
Now looking at older tests to check if this happened earlier already.
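For reference, the jq filter used in the scan can be sketched in JavaScript as follows. The first entry is the one from details-consoletest_finish.json; the other two entries are made-up examples for contrast.

```javascript
// Count entries that reference a text file but carry no text_data, mirroring
// map(select(.text != null) | select(.text_data == null)) | length
function countMissingTextData(details) {
  // loose comparison with null also matches keys that are absent (undefined)
  return details.filter(d => d.text != null && d.text_data == null).length;
}

const details = [
  {result: 'ok', text: 'consoletest_finish-160.txt', title: 'wait_serial'},
  {result: 'ok', text: 'other.txt', text_data: 'some output', title: 'wait_serial'}, // illustrative
  {result: 'ok', screenshot: 'shot.png'}, // illustrative
];
console.log(countMissingTextData(details));
```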
Updated by tinita about 1 month ago
I looked into testresults 04905 (March 6 20:00 - March 7 01:00) to see if the eval -> try/catch PR could be related:
https://github.com/os-autoinst/openQA/pull/6251
But I see it happening there as well (16 out of 994 tests).
Another PR looks related, but this is already from April 2024: https://github.com/os-autoinst/openQA/pull/5588
Looking into the actual files that should be in text_data: some of them are actually missing, some of them are present. It's about 50/50.
And if the file is present, it should actually be retrieved on the fly when viewing the job details. But in certain circumstances this is not happening.
I think I need to get the o3 database and the testresults directory to check which code is executed.
I don't know why the file is missing sometimes, and I also don't know why it is there sometimes but not written into the details-*.json (I think this should be happening in finalize_results). Maybe we should write at least a debug message into the log.
And I can try to fix the case when text_data is filled on the fly.
For the database copy I first need to update my postgres instance though.
Updated by tinita about 1 month ago
- Related to action #159447: logreport o3: Can't open file ".../libssh-18.txt": No such file or directory at OpenQA/Schema/Result/JobModules.pm line 113 size:M added
Updated by tinita about 1 month ago · Edited
I was able to reproduce it locally.
The decode('UTF-8', $text_file->slurp) simply returns undef without an error, probably because of the control characters.
"# Result:\n\224\200udisks2.service\n"
If I try to do the same in a commandline, I get a result, but using FB_CROAK dies:
LC_ALL= LC_CTYPE="C.utf8" LANG=C.utf8 perl -wE'
use Devel::Peek;
use Data::Dumper;
use Encode;
use Mojo::File qw/ path /;
my $text_file = path(q{consoletest_finish-160.txt});
my $c = $text_file->slurp;
Dump $c;
my $txtdata = decode("UTF-8", $c);
Dump $txtdata;
$txtdata = decode("UTF-8", $c, Encode::FB_CROAK);'
SV = PV(0x55886f1d7ef0) at 0x55886f1f58a0
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK)
PV = 0x55886f77ab50 "# Result:\n\224\200udisks2.service\n"\0
CUR = 28
LEN = 32
COW_REFCNT = 0
SV = PV(0x55886f1d8050) at 0x55886f5b00e8
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK,UTF8)
PV = 0x55887019cc30 "# Result:\n\357\277\275\357\277\275udisks2.service\n"\0 [UTF8 "# Result:\n\x{fffd}\x{fffd}udisks2.service\n"]
CUR = 32
LEN = 40
COW_REFCNT = 0
utf8 "\x94" does not map to Unicode at /usr/lib/perl5/5.26.1/x86_64-linux-thread-multi/Encode.pm line 212.
However, using FB_CROAK in the openQA code does not change it to die, it still just returns undef.
But to make it behave like before should at least be easy.
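The difference between lenient and strict decoding seen in the dump can be reproduced outside Perl as well. The following is a JavaScript analogue using the standard TextDecoder, not the openQA code: the same 28 bytes decode to two U+FFFD replacement characters in the default mode and raise an error in fatal mode.

```javascript
// The 28 bytes from the dump: "# Result:\n", 0x94 0x80, "udisks2.service\n"
const encoder = new TextEncoder();
const bytes = Uint8Array.from([
  ...encoder.encode('# Result:\n'),
  0x94, 0x80, // octal \224\200, continuation bytes without a lead byte
  ...encoder.encode('udisks2.service\n'),
]);

// Lenient mode substitutes U+FFFD for each invalid byte.
const lenient = new TextDecoder('utf-8').decode(bytes);

// Strict mode throws a TypeError instead of substituting.
let strictError = null;
try {
  new TextDecoder('utf-8', {fatal: true}).decode(bytes);
} catch (e) {
  strictError = e;
}

console.log(JSON.stringify(lenient));
console.log(strictError instanceof TypeError);
```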
Updated by tinita about 1 month ago
Ooohhh... now I can see it:
# OpenQA::Schema::Result::JobModules
use Mojo::Util 'decode';
while I was trying it out with Encode::decode.
Mojo::Util::decode docs say:
Decode bytes to characters with Encode, or return "undef" if decoding failed.
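To illustrate why a "decode or return undef" contract hides the failure, here is a JavaScript analogue. decodeOrNull and readTextData are hypothetical names, and the fallback message is only a placeholder; this is not how Mojo::Util or openQA implement it.

```javascript
// Hypothetical analogue of a "decode or return undef" helper:
// failures are swallowed and reported only through the return value.
function decodeOrNull(bytes) {
  try {
    return new TextDecoder('utf-8', {fatal: true}).decode(bytes);
  } catch (e) {
    return null;
  }
}

// The caller has to check for null explicitly, otherwise the null
// travels on unnoticed (for example into a text_data field).
function readTextData(fileName, bytes) {
  const decoded = decodeOrNull(bytes);
  return decoded === null ? `Unable to decode ${fileName}.` : decoded;
}

console.log(readTextData('consoletest_finish-160.txt', Uint8Array.of(0x94, 0x80)));
```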
Updated by tinita about 1 month ago
https://github.com/os-autoinst/openQA/pull/6285 Improve reading of text_data in module results
Updated by tinita about 1 month ago
- Status changed from In Progress to Feedback
Updated by tinita about 1 month ago
- Status changed from Feedback to Resolved
https://github.com/os-autoinst/openQA/pull/6285 merged and deployed.
https://openqa.opensuse.org/tests/4912178#step/consoletest_finish/158 now shows: "Unable to decode consoletest_finish-160.txt."
The output in the file seems to be cut off, leading to broken unicode, so we cannot decode it.
Updated by favogt about 1 month ago
- Status changed from Resolved to Feedback
https://openqa.opensuse.org/tests/4912178#step/consoletest_finish/158 now shows: "Unable to decode consoletest_finish-160.txt."
The output in the file seems to be cut off, leading to broken unicode, so we cannot decode it.
The reason for that is the 4096 B ring buffer in serial_screen.pm's read_until. By default, record_output => 0, which means that on a match only 4096 B are returned, which might sever a UTF-8 sequence.
IMO either:
- Call wait_serial with record_output => 1 in read_until
- Try to keep UTF-8 intact (can happen on both ends of the buffer)
- Handle UTF-8 decode errors gracefully by using replacement characters instead of failing to decode everything
The current state means that the output is not visible without digging through the corresponding full serial log.
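A sketch of the "keep UTF-8 intact" option, assuming the buffer is available as raw bytes. trimPartialUtf8 is a hypothetical helper and not part of serial_screen.pm; the sample string is illustrative. It uses U+2500 (bytes E2 94 80), whose tail 94 80 matches the two stray bytes at the start of the affected file.

```javascript
// Hypothetical helper: trim a byte window so that it neither starts in the
// middle of a UTF-8 sequence nor ends with an incomplete one.
function trimPartialUtf8(bytes) {
  const isContinuation = b => (b & 0xc0) === 0x80;

  // Drop up to three orphaned continuation bytes at the start.
  let start = 0;
  while (start < bytes.length && start < 3 && isContinuation(bytes[start])) start++;

  // Walk back from the end to the lead byte of the last sequence.
  let end = bytes.length;
  let i = end - 1;
  while (i > start && end - i < 4 && isContinuation(bytes[i])) i--;

  if (i >= start) {
    const lead = bytes[i];
    let expected = 0; // stays 0 for bytes that cannot start a sequence
    if (lead < 0x80) expected = 1;
    else if ((lead & 0xe0) === 0xc0) expected = 2;
    else if ((lead & 0xf0) === 0xe0) expected = 3;
    else if ((lead & 0xf8) === 0xf0) expected = 4;
    if (expected > end - i) end = i; // last sequence is cut off
  }
  return bytes.subarray(start, end);
}

const full = new TextEncoder().encode('─udisks2.service ├');
const windowed = full.slice(1, full.length - 1); // simulate a cut on both ends
const text = new TextDecoder('utf-8', {fatal: true}).decode(trimPartialUtf8(windowed));
console.log(JSON.stringify(text));
```

This only repairs the edges of the window. Invalid bytes in the middle would still need the replacement-character approach from the third option.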
Updated by tinita about 1 month ago
- Subject changed from Various job pages do not load due JS error size:S to Various job pages do not load due JS error
- Status changed from Feedback to New
- Assignee deleted (tinita)
Updated by tinita about 1 month ago
- Category changed from Regressions/Crashes to Feature requests
Updated by tinita about 1 month ago
- Copied to action #178855: Handle broken unicode in files from wait_serial results added
Updated by tinita about 1 month ago
- Category changed from Feature requests to Regressions/Crashes
- Status changed from New to Resolved
- Assignee set to tinita
I actually copied it into a new ticket #178855, as the JavaScript error was a regression, but the content of such files was never shown before.