action #95830
closedErrors from hook scripts in openqa-gru service log ("parse error: Invalid numeric literal at...", JSON error) size:M
Description
Observation¶
journalctl -u openqa-gru
on osd shows:
-- Logs begin at Sun 2021-07-18 03:30:00 CEST, end at Tue 2021-07-20 16:23:49 CEST. --
Jul 20 12:59:01 openqa openqa-gru[22516]: parse error: Invalid numeric literal at line 1, column 10
Jul 20 14:31:27 openqa openqa-gru[22516]: unable to clone job 6490922: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:29 openqa openqa-gru[22516]: unable to clone job 6490922: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:30 openqa openqa-gru[22516]: unable to clone job 6400555: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:31 openqa openqa-gru[22516]: unable to clone job 6400555: it is part of a parallel or directly chained cluster (not supported)
Jul 20 15:40:43 openqa openqa-gru[22516]: parse error: Invalid numeric literal at line 1, column 10
Jul 20 15:40:43 openqa openqa-gru[22516]: /opt/os-autoinst-scripts/openqa-investigate: line 73: echo: write error: Broken pipe
Acceptance criteria¶
- AC1: No obvious ther error messages ("parse error: Invalid numeric literal at...") in openqa-gru logs
Suggestions¶
- Try to run
openqa-investigate
scripts manually from github.com/os-autoinst/scripts/ against jobs on osd and check for errors
Further details¶
openqa-gru runs multiple asynchronous background processes. The ones mentioned above are from "job_post_hooks".
Updated by tinita about 3 years ago
- Copied from action #94678: Multiple errors from hook scripts in openqa-gru service log (was: Missing openqa-investigate jobs on osd) size:M added
Updated by ilausuch about 3 years ago
- Subject changed from Errors from hook scripts in openqa-gru service log ("unable to clone job", JSON error) to Errors from hook scripts in openqa-gru service log ("parse error: Invalid numeric literal at...", JSON error) size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by tinita about 3 years ago
- Status changed from Workable to In Progress
- Assignee set to tinita
Updated by tinita about 3 years ago
Updated by okurz about 3 years ago
scripts are deployed near-continuous, maybe every minute or so. So it should be live already on both o3+osd
Updated by tinita about 3 years ago
Ok, so maybe printing the whole output was a bad idea.
The log now says:
Jul 22 21:01:56 openqa openqa-gru[30446]: jq (88 /opt/os-autoinst-scripts/openqa-investigate): parse error: Invalid numeric literal at line 1, column 10 (Input: >>><!DOCTYPE html>
Jul 22 21:01:56 openqa openqa-gru[30446]: <html lang="en">
Jul 22 21:01:56 openqa openqa-gru[30446]: <head>
Jul 22 21:01:56 openqa openqa-gru[30446]: <!-- Meta, title, CSS, favicons, etc. -->
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta charset="utf-8">
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta http-equiv="X-UA-Compatible" content="IE=edge">
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta name="viewport" content="width=device-width, initial-scale=1">
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta name="description" content="openQA is a testing framework mainly for distributions">
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta name="keywords" content="Testing, Linux, Qemu">
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta name="author" content="openQA contributors">
Jul 22 21:01:56 openqa openqa-gru[30446]: <title>openQA: Page not found</title>
...
That's the curl output from:
vars_last_good=$(curl -s "$host_url/tests/$last_good"/file/vars.json)
So I guess we're requesting vars.json
for a test which doesn't have a vars.json
.
What should we do in such a case? Silently skip it?
Updated by okurz about 3 years ago
Hm, actually interesting case. I would say it's one of many cases when the "not found" Page that we deliver is enormous and should be much much smaller. So maybe we tweak the not found template from openQA, maybe differ if we access as human user over the webui or other ways like here?
Updated by tinita about 3 years ago
I would say we should edit the 404 page in any case. It can simply link to a special page that shows all the routes, and the 404 page should be kept small.
But still, I think it would be better to test for a 404 status before and output a message in this case.
Additionally the jq wrapper should probably just output the first x characters.
Updated by okurz about 3 years ago
yes to all three :)
tinita wrote:
I would say we should edit the 404 page in any case. It can simply link to a special page that shows all the routes, and the 404 page should be kept small.
we could kinda turn around https://openqa.opensuse.org/api_help which is small but links to https://openqa.opensuse.org/not_found to show all routes. I think I was trying to work on the 404 page ages ago to ensure we have an API help page. Maybe at that time I could not find a better solution to just link to the default mojolicious 404 page. Maybe with combined better knowledge now we could generate a dedicated "API route help page", put that on https://openqa.opensuse.org/api_help, and make the 404 tiny instead and link to api_help
Updated by tinita about 3 years ago
First aid to prevent filling our logs and making it hard to read: https://github.com/os-autoinst/scripts/pull/91
Updated by okurz about 3 years ago
- Due date changed from 2021-07-23 to 2021-08-06
seems like you copied the due-date as well. It's a bit too tight :) So extending to be our usual 14 days in the future
Updated by tinita about 3 years ago
PR for better error handling: https://github.com/os-autoinst/scripts/pull/92
Updated by tinita about 3 years ago
- Status changed from Feedback to In Progress
PR was merged.
We still see the error message (and others) in the log though:
Jul 27 09:22:15 openqa openqa-gru[2105]: unable to clone job 6578645: it is part of a parallel or directly chained cluster (not supported)
Jul 27 09:49:53 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:49:53 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:49:56 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:50:04 openqa openqa-gru[2105]: parse error: Invalid numeric literal at line 1, column 10
Jul 27 09:50:09 openqa openqa-gru[2105]: parse error: Invalid numeric literal at line 1, column 10
Jul 27 09:50:10 openqa openqa-gru[2105]: parse error: Invalid numeric literal at line 1, column 10
Jul 27 09:50:10 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:50:17 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:50:21 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:51:50 openqa openqa-gru[2105]: grep: exceeded PCRE's backtracking limit
Jul 27 09:51:50 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:52:11 openqa openqa-gru[2105]: grep: exceeded PCRE's backtracking limit
Also we now see a reason why the curl command returned a 404:
Jul 27 10:04:24 openqa openqa-gru[2105]: curl: Error fetching url (-s https://openqa.suse.de/tests/null/file/vars.json): Got Status 40
Updated by okurz about 3 years ago
tinita wrote:
Jul 27 10:04:24 openqa openqa-gru[2105]: curl: Error fetching url (-s https://openqa.suse.de/tests/null/file/vars.json): Got Status 40
the URL part "null" is likely already coming from jq
Updated by tinita about 3 years ago
okurz wrote:
the URL part "null" is likely already coming from jq
That's pretty obvious, yeah
Updated by tinita about 3 years ago
looks like this line:
https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Schema/Result/Jobs.pm#L2019
sub investigate {
...
$inv{last_good} //= 'not found';
OTOH this is already checked: if [[ "$last_good_exists" = "not found" ]]; then ...
Updated by tinita about 3 years ago
PR for the null URL https://openqa.suse.de/tests/null/file/vars.json
: https://github.com/os-autoinst/scripts/pull/94
Updated by tinita about 3 years ago
Regarding the jq error: Do we have a list of the scripts where the output ends up in this log? All scripts in the os-autoinst/scripts repo?
I would suggest to share the functions runcurl
and runjq
between them.
Updated by tinita about 3 years ago
PR 94 merged, waiting for more informative output in the logs.
Updated by tinita about 3 years ago
Jul 27 17:03:10 openqa openqa-gru[2105]: .last_good.text not found: investigation for test 6584829 returned '{"error":"No previous job in this scenario, cannot provide hints"}'
Updated by tinita about 3 years ago
Updated by tinita about 3 years ago
- Copied to action #96122: Errors from hook scripts in openqa-gru service log ("grep: range out of order in character class") size:M added
Updated by ilausuch about 3 years ago
During the daily we commented that we'll should wait to see several days of logs to check it
Updated by tinita about 3 years ago
- Status changed from In Progress to Feedback
Another PR for the remaining jq errors: https://github.com/os-autoinst/scripts/pull/98
(although I don't see any such errors in the logs from the last days)
It also contains general improvements.
Updated by tinita about 3 years ago
- Status changed from Feedback to Resolved
The PR was merged. Currently I don't see such error messages in the log.
It seems to be rare, so I guess I will resolve the ticket, and if it happens again we will either see a jq error with more information or a message that curl didn't get a 200 back.