action #95830
closed
Errors from hook scripts in openqa-gru service log ("parse error: Invalid numeric literal at...", JSON error) size:M
Added by tinita over 3 years ago.
Updated over 3 years ago.
Category:
Regressions/Crashes
Description
Observation¶
journalctl -u openqa-gru
on osd shows:
-- Logs begin at Sun 2021-07-18 03:30:00 CEST, end at Tue 2021-07-20 16:23:49 CEST. --
Jul 20 12:59:01 openqa openqa-gru[22516]: parse error: Invalid numeric literal at line 1, column 10
Jul 20 14:31:27 openqa openqa-gru[22516]: unable to clone job 6490922: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:29 openqa openqa-gru[22516]: unable to clone job 6490922: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:30 openqa openqa-gru[22516]: unable to clone job 6400555: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:31 openqa openqa-gru[22516]: unable to clone job 6400555: it is part of a parallel or directly chained cluster (not supported)
Jul 20 15:40:43 openqa openqa-gru[22516]: parse error: Invalid numeric literal at line 1, column 10
Jul 20 15:40:43 openqa openqa-gru[22516]: /opt/os-autoinst-scripts/openqa-investigate: line 73: echo: write error: Broken pipe
Acceptance criteria¶
- AC1: No obvious ther error messages ("parse error: Invalid numeric literal at...") in openqa-gru logs
Suggestions¶
- Try to run
openqa-investigate
scripts manually from github.com/os-autoinst/scripts/ against jobs on osd and check for errors
Further details¶
openqa-gru runs multiple asynchronous background processes. The ones mentioned above are from "job_post_hooks".
- Copied from action #94678: Multiple errors from hook scripts in openqa-gru service log (was: Missing openqa-investigate jobs on osd) size:M added
- Subject changed from Errors from hook scripts in openqa-gru service log ("unable to clone job", JSON error) to Errors from hook scripts in openqa-gru service log ("parse error: Invalid numeric literal at...", JSON error) size:M
- Description updated (diff)
- Status changed from New to Workable
- Status changed from Workable to In Progress
- Assignee set to tinita
- Status changed from In Progress to Feedback
PR merged, waiting for being deployed
scripts are deployed near-continuous, maybe every minute or so. So it should be live already on both o3+osd
Ok, so maybe printing the whole output was a bad idea.
The log now says:
Jul 22 21:01:56 openqa openqa-gru[30446]: jq (88 /opt/os-autoinst-scripts/openqa-investigate): parse error: Invalid numeric literal at line 1, column 10 (Input: >>><!DOCTYPE html>
Jul 22 21:01:56 openqa openqa-gru[30446]: <html lang="en">
Jul 22 21:01:56 openqa openqa-gru[30446]: <head>
Jul 22 21:01:56 openqa openqa-gru[30446]: <!-- Meta, title, CSS, favicons, etc. -->
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta charset="utf-8">
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta http-equiv="X-UA-Compatible" content="IE=edge">
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta name="viewport" content="width=device-width, initial-scale=1">
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta name="description" content="openQA is a testing framework mainly for distributions">
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta name="keywords" content="Testing, Linux, Qemu">
Jul 22 21:01:56 openqa openqa-gru[30446]: <meta name="author" content="openQA contributors">
Jul 22 21:01:56 openqa openqa-gru[30446]: <title>openQA: Page not found</title>
...
That's the curl output from:
vars_last_good=$(curl -s "$host_url/tests/$last_good"/file/vars.json)
So I guess we're requesting vars.json
for a test which doesn't have a vars.json
.
What should we do in such a case? Silently skip it?
Hm, actually interesting case. I would say it's one of many cases when the "not found" Page that we deliver is enormous and should be much much smaller. So maybe we tweak the not found template from openQA, maybe differ if we access as human user over the webui or other ways like here?
I would say we should edit the 404 page in any case. It can simply link to a special page that shows all the routes, and the 404 page should be kept small.
But still, I think it would be better to test for a 404 status before and output a message in this case.
Additionally the jq wrapper should probably just output the first x characters.
yes to all three :)
tinita wrote:
I would say we should edit the 404 page in any case. It can simply link to a special page that shows all the routes, and the 404 page should be kept small.
we could kinda turn around https://openqa.opensuse.org/api_help which is small but links to https://openqa.opensuse.org/not_found to show all routes. I think I was trying to work on the 404 page ages ago to ensure we have an API help page. Maybe at that time I could not find a better solution to just link to the default mojolicious 404 page. Maybe with combined better knowledge now we could generate a dedicated "API route help page", put that on https://openqa.opensuse.org/api_help, and make the 404 tiny instead and link to api_help
- Due date changed from 2021-07-23 to 2021-08-06
seems like you copied the due-date as well. It's a bit too tight :) So extending to be our usual 14 days in the future
- Status changed from Feedback to In Progress
PR was merged.
We still see the error message (and others) in the log though:
Jul 27 09:22:15 openqa openqa-gru[2105]: unable to clone job 6578645: it is part of a parallel or directly chained cluster (not supported)
Jul 27 09:49:53 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:49:53 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:49:56 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:50:04 openqa openqa-gru[2105]: parse error: Invalid numeric literal at line 1, column 10
Jul 27 09:50:09 openqa openqa-gru[2105]: parse error: Invalid numeric literal at line 1, column 10
Jul 27 09:50:10 openqa openqa-gru[2105]: parse error: Invalid numeric literal at line 1, column 10
Jul 27 09:50:10 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:50:17 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:50:21 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:51:50 openqa openqa-gru[2105]: grep: exceeded PCRE's backtracking limit
Jul 27 09:51:50 openqa openqa-gru[2105]: grep: range out of order in character class
Jul 27 09:52:11 openqa openqa-gru[2105]: grep: exceeded PCRE's backtracking limit
Also we now see a reason why the curl command returned a 404:
Jul 27 10:04:24 openqa openqa-gru[2105]: curl: Error fetching url (-s https://openqa.suse.de/tests/null/file/vars.json): Got Status 40
okurz wrote:
the URL part "null" is likely already coming from jq
That's pretty obvious, yeah
Regarding the jq error: Do we have a list of the scripts where the output ends up in this log? All scripts in the os-autoinst/scripts repo?
I would suggest to share the functions runcurl
and runjq
between them.
PR 94 merged, waiting for more informative output in the logs.
Jul 27 17:03:10 openqa openqa-gru[2105]: .last_good.text not found: investigation for test 6584829 returned '{"error":"No previous job in this scenario, cannot provide hints"}'
- Copied to action #96122: Errors from hook scripts in openqa-gru service log ("grep: range out of order in character class") size:M added
During the daily we commented that we'll should wait to see several days of logs to check it
- Status changed from In Progress to Feedback
- Status changed from Feedback to Resolved
The PR was merged. Currently I don't see such error messages in the log.
It seems to be rare, so I guess I will resolve the ticket, and if it happens again we will either see a jq error with more information or a message that curl didn't get a 200 back.
Also available in: Atom
PDF