action #70972
closed
failed minion jobs with "malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before \"(end of string)\") at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/JSON.pm line 31.\n"
Description
Observation
https://openqa.suse.de/minion/jobs?state=failed shows multiple occurrences of failed "finalize_job_results" jobs with details like:
{
  "args" => [
    4636692
  ],
  "attempts" => 1,
  "children" => [],
  "created" => "2020-09-03T09:08:39.81704Z",
  "delayed" => "2020-09-03T09:08:39.81704Z",
  "expires" => undef,
  "finished" => "2020-09-03T09:08:48.72674Z",
  "id" => 551815,
  "lax" => 0,
  "notes" => {
    "failed_modules" => {
      "setpriority02" => "malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before \"(end of string)\") at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/JSON.pm line 31.\n",
      "setregid02" => "malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before \"(end of string)\") at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/JSON.pm line 31.\n",
      "setregid03" => "malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before \"(end of string)\") at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/JSON.pm line 31.\n",
      "setresgid02" => "malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before \"(end of string)\") at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/JSON.pm line 31.\n"
    },
    "gru_id" => 27665779
  },
  "parents" => [],
  "priority" => 0,
  "queue" => "default",
  "result" => "Finalizing results of 4 modules failed",
  "retried" => undef,
  "retries" => 0,
  "started" => "2020-09-03T09:08:39.81939Z",
  "state" => "failed",
  "task" => "finalize_job_results",
  "time" => "2020-09-04T11:30:21.14717Z",
  "worker" => 344
}
Expected result
No minion job should fail with a "malformed JSON string" error that instance admins then have to handle.
Problem
Could this be a recent regression?
Further notes
I have deleted all "finalize_job_results" except for the first one on https://openqa.suse.de/minion/jobs?state=failed&limit=100
Updated by okurz about 4 years ago
- Related to action #70975: [alert] too many failed minion jobs added
Updated by mkittler about 4 years ago
- Status changed from New to In Progress
- Assignee set to mkittler
Could this be a recent regression?
These tasks are related to a relatively new optimization: https://github.com/os-autoinst/openQA/pull/3144
The failures are actually quite harmless. If it wasn't possible to finalize these modules, the optimization has no effect, but otherwise everything is as good as before.
The relevant job ID is the number in "args". I'll try to retrigger the job. Likely the error persists and there's something wrong with the JSON files for these modules.
Updated by mkittler about 4 years ago
As expected, the JSON files within the results dir are broken, in fact they are empty:
martchus@openqa:/var/lib/openqa/testresults/04636/04636692-sle-15-SP3-Online-x86_64-Build23.1-ltp_syscalls_baremetal@ipmi-coppi> cat details-setpriority02.json | json_pp
malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/bin/json_pp line 45.
martchus@openqa:/var/lib/openqa/testresults/04636/04636692-sle-15-SP3-Online-x86_64-Build23.1-ltp_syscalls_baremetal@ipmi-coppi> cat details-setregid02.json | json_pp
malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/bin/json_pp line 45.
martchus@openqa:/var/lib/openqa/testresults/04636/04636692-sle-15-SP3-Online-x86_64-Build23.1-ltp_syscalls_baremetal@ipmi-coppi> cat details-setregid03.json | json_pp
malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/bin/json_pp line 45.
martchus@openqa:/var/lib/openqa/testresults/04636/04636692-sle-15-SP3-Online-x86_64-Build23.1-ltp_syscalls_baremetal@ipmi-coppi> cat details-setresgid02.json | json_pp
malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/bin/json_pp line 45.
These modules are also missing within the "Details" tab. It also seems that some images and serial results are zero-size. The worker log shows multiple connection errors so the connection to the web UI wasn't stable during the upload.
We should likely avoid writing empty details JSON files in the first place. (Not sure under which circumstances these empty files have been created.) Otherwise we could also skip empty JSON files (in addition to non-existing ones) when finalizing the job.
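To check whether a results directory contains such zero-size details files, a small scan is enough. The following is a minimal sketch in Python, not openQA code; the function name `find_empty_details` is hypothetical, and the only assumption taken from this ticket is the `details-<module>.json` naming seen in the shell session above.

```python
from pathlib import Path


def find_empty_details(results_dir):
    """Return the names of zero-size details-*.json files, sorted.

    Hypothetical helper for illustration; it only looks at files
    directly inside results_dir and does not parse any JSON.
    """
    return sorted(
        path.name
        for path in Path(results_dir).glob("details-*.json")
        if path.is_file() and path.stat().st_size == 0
    )
```

Called with the job's results directory, it would list files such as `details-setpriority02.json` if they are empty.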
Updated by mkittler about 4 years ago
I don't know why these files would end up empty considering openQA's code. Maybe these files have been emptied for some other reason. So I would just go for skipping empty JSON files: https://github.com/os-autoinst/openQA/pull/3365
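The idea of skipping empty files (in addition to non-existing ones) can be sketched as follows. This is an illustration in Python under stated assumptions, not the change from the linked PR: `load_module_details` and its return shape are hypothetical, and only the `details-<module>.json` file naming comes from this ticket.

```python
import json
from pathlib import Path


def load_module_details(results_dir, module_names):
    """Load details-<module>.json for each module name.

    Returns (details, skipped, failed):
      details: module name -> parsed JSON
      skipped: modules whose file is missing or zero-size
      failed:  module name -> parse error message
    """
    details, skipped, failed = {}, [], {}
    for name in module_names:
        path = Path(results_dir) / f"details-{name}.json"
        # Treat a zero-size file like a missing one instead of parsing it.
        if not path.is_file() or path.stat().st_size == 0:
            skipped.append(name)
            continue
        try:
            details[name] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            # Non-empty but broken files are still reported per module.
            failed[name] = str(err)
    return details, skipped, failed
```

With this approach an empty file no longer counts as a failure, while a file that is non-empty but genuinely malformed is still reported.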
Updated by livdywan about 4 years ago
mkittler wrote:
I don't know why these files would end up empty considering openQA's code. Maybe these files have been emptied for some other reason. So I would just go for skipping empty JSON files: https://github.com/os-autoinst/openQA/pull/3365
It seems odd considering the new file/move dance. I would suspect the writing process was interrupted. Skipping empties seems like something we should do in general, though. So +1 from me.
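Assuming the "new file/move dance" refers to the common pattern of writing to a temporary file and then renaming it over the target, a minimal Python sketch of that pattern looks like this. It is not openQA's implementation; `write_json_atomically` is a hypothetical name.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Write data as JSON to path via a temporary file plus rename.

    The temporary file is created in the target directory so that the
    final os.replace() stays on one filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the content to disk before renaming
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        # Clean up the temporary file if writing or renaming failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this pattern an interrupted write would normally leave a stray temporary file rather than an empty target, which is why the empty details files look surprising.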
Updated by mkittler about 4 years ago
- Status changed from In Progress to Feedback
PR has been merged
Updated by okurz about 4 years ago
- Status changed from Feedback to Resolved
The change was deployed; https://openqa.suse.de/minion/jobs?state=failed&offset=0 looks good.