action #94678
closedMultiple errors from hook scripts in openqa-gru service log (was: Missing openqa-investigate jobs on osd) size:M
0%
Description
Observation¶
Seems like since some days there are no openqa-investigate jobs triggered anymore, e.g. see on openqa.suse.de/tests/ where normally we should see many now I don't see any. o3 works fine, compare https://openqa.suse.de/tests/?match=:investigate: and https://openqa.opensuse.org/tests/?match=:investigate: to look for jobs with ":investigate:" in the name.
journalctl -u openqa-gru
on osd shows:
Jun 24 15:57:01 openqa openqa-gru[843]: [2021-06-24T15:56:58.411 CEST] [debug] git fetch: remote: Total 0 (delta 0), reused 0 (delta 0), pack-r>
Jun 24 15:57:01 openqa openqa-gru[843]:
Jun 24 15:57:01 openqa openqa-gru[843]: Could not find '07a51bb2ffb67f4fc5bafb243e0bdcf8d2b50bed' in complete history at /usr/lib/os-autoinst/O>
Jun 24 15:57:01 openqa openqa-gru[843]: ---
Jun 24 15:57:01 openqa openqa-gru[843]: 1 unknown issues to be reviewed:
Jun 24 15:57:01 openqa openqa-gru[843]: - https://openqa.suse.de/tests/6325089 isotovideo died: Could not find '07a51bb2ffb67f4fc
Jun 24 15:57:03 openqa openqa-gru[843]: https://openqa.suse.de/tests/6325091 : Unknown issue, to be reviewed -> https://openqa.suse.de/tests/63>
Jun 24 15:57:03 openqa openqa-gru[843]: Likely the error is within this log excerpt, last lines before shutdown:
Jun 24 15:57:03 openqa openqa-gru[843]: ---
Jun 24 15:57:03 openqa openqa-gru[843]: [2.0K blob data]
Jun 24 15:57:03 openqa openqa-gru[843]: [1.2K blob data]
Jun 24 15:57:03 openqa openqa-gru[843]:
Jun 24 15:57:03 openqa openqa-gru[843]: [2021-06-24T15:56:59.337 CEST] [debug] git fetch: remote: Total 0 (delta 0), reused 0 (delta 0), pack-r>
Jun 24 15:57:03 openqa openqa-gru[843]:
Jun 24 15:57:03 openqa openqa-gru[843]: Could not find '07a51bb2ffb67f4fc5bafb243e0bdcf8d2b50bed' in complete history at /usr/lib/os-autoinst/O>
Jun 24 15:57:03 openqa openqa-gru[843]: ---
Jun 24 15:57:03 openqa openqa-gru[843]: 1 unknown issues to be reviewed:
Jun 24 15:57:03 openqa openqa-gru[843]: - https://openqa.suse.de/tests/6325091 isotovideo died: Could not find '07a51bb2ffb67f4fc
Jun 24 16:32:15 openqa openqa-gru[843]: grep: exceeded PCRE's backtracking limit
Jun 24 18:59:10 openqa openqa-gru[843]: unable to clone job 6324478: it is part of a parallel or directly chained cluster (not supported)
Jun 24 18:59:11 openqa openqa-gru[843]: unable to clone job 6324478: it is part of a parallel or directly chained cluster (not supported)
Acceptance criteria¶
- AC1: No obvious error messages in openqa-gru logs
AC2: openqa-investigate jobs are triggered again on osd
Suggestions¶
- Check for failed minion jobs on https://openqa.suse.de/minion/jobs?state=failed
- Try to run
openqa-investigate
scripts manually from github.com/os-autoinst/scripts/ against jobs on osd and check for errors
Further details¶
openqa-gru runs multiple asynchronous background processes. The ones mentioned above are from "job_post_hooks".
Updated by okurz over 3 years ago
- Subject changed from Missing openqa-investigate jobs on osd to Multiple errors from hook scripts in openqa-gru service log (was: Missing openqa-investigate jobs on osd)
Interesting enough I found "openqa-investigate" jobs by magic again today on osd but there are still errors
Updated by livdywan over 3 years ago
- Subject changed from Multiple errors from hook scripts in openqa-gru service log (was: Missing openqa-investigate jobs on osd) to Multiple errors from hook scripts in openqa-gru service log (was: Missing openqa-investigate jobs on osd) size:M
Updated by okurz over 3 years ago
We found out that the error
Could not find '07a51bb2ffb67f4fc5bafb243e0bdcf8d2b50bed' in complete history at /usr/lib/os-autoinst/O>
Jun 24 15:57:01 openqa openqa-gru[843]: ---
Jun 24 15:57:01 openqa openqa-gru[843]: 1 unknown issues to be reviewed:
Jun 24 15:57:01 openqa openqa-gru[843]: - https://openqa.suse.de/tests/6325089
is also visible in https://openqa.suse.de/tests/6325089 as a reason. That's good so far. The git hash 07a51bb2ffb67f4fc5bafb243e0bdcf8d2b50bed only exists in git://github.com/dzedro/os-autoinst-distri-opensuse.git . Likely openqa-investigate looks only in the main repo git@github.com:os-autoinst/os-autoinst-distri-opensuse.git . We can improve the error message to at least tell us from which repo we start.
Updated by tinita over 3 years ago
- Status changed from Workable to In Progress
- Assignee set to tinita
Updated by tinita over 3 years ago
I created a PR to add the git url to the error message.
What I'm wondering about is that some of the git operation output should land in the log, but we don't see anything like that in the mentioned test.
Updated by openqa_review over 3 years ago
- Due date set to 2021-07-23
Setting due date based on mean cycle time of SUSE QE Tools
Updated by tinita over 3 years ago
- Status changed from In Progress to Feedback
PR https://github.com/os-autoinst/os-autoinst/pull/1718 was merged
Updated by tinita over 3 years ago
How can I check the openqa-gru logs from the last days? No matter what date I provide, it always starts today:
% journalctl --since=2021-07-10 -u openqa-gru
-- Logs begin at Tue 2021-07-13 11:08:07 CEST, end at Tue 2021-07-13 14:31:21 CEST. --
...
Updated by okurz over 3 years ago
Likely the journal size is configured too low and there are too many messages in various units to preserve logs for more than one day
Updated by tinita over 3 years ago
- Status changed from Feedback to Workable
- Assignee deleted (
tinita) - Priority changed from High to Normal
I don't know how to check reliably that entries like this aren't in the log anymore because we don't have enough log lines available.
Maybe someone else has an idea.
Updated by okurz over 3 years ago
- Priority changed from Normal to Urgent
With the command mentioned in the original description ssh osd 'sudo journalctl -u openqa-gru'
I can find:
-- Logs begin at Sun 2021-07-18 03:30:00 CEST, end at Tue 2021-07-20 16:23:49 CEST. --
Jul 20 12:59:01 openqa openqa-gru[22516]: parse error: Invalid numeric literal at line 1, column 10
Jul 20 14:31:27 openqa openqa-gru[22516]: unable to clone job 6490922: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:29 openqa openqa-gru[22516]: unable to clone job 6490922: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:30 openqa openqa-gru[22516]: unable to clone job 6400555: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:31 openqa openqa-gru[22516]: unable to clone job 6400555: it is part of a parallel or directly chained cluster (not supported)
Jul 20 15:40:43 openqa openqa-gru[22516]: parse error: Invalid numeric literal at line 1, column 10
Jul 20 15:40:43 openqa openqa-gru[22516]: /opt/os-autoinst-scripts/openqa-investigate: line 73: echo: write error: Broken pipe
so logs go back three days but only since this day there are useful entries. I consider parse error: Invalid numeric literal at line 1, column 10
as well as /opt/os-autoinst-scripts/openqa-investigate: line 73: echo: write error: Broken pipe
"obvious error messages".
With the due-date approaching we need to handle this with priority
Updated by tinita over 3 years ago
So since there are new errors not appearing in the original description, this is an ongoing task?
Then it should not be labeled as M.
Each error should be handled individually in a ticket, I guess
Updated by okurz over 3 years ago
Sure, I don't mind either way. You can split the ticket to have a new one for the errors that I mentioned and assign back to this one and resolve.
Updated by mkittler over 3 years ago
69 if echo "$test_log" | grep -q "No.*changes recorded"; then
70 echo "$test_log. Skipping test regression investigation job."
71 last_good_tests=''
72 else
73 vars_last_good=$(curl -s "$host_url/tests/$last_good"/file/vars.json)
74 last_good_tests=$(echo "$vars_last_good" | jq -r '.TEST_GIT_HASH')
So we cannot get curl
's output? The error message logged before looks more like the jq
command in line 74 would be failing.
Updated by tinita over 3 years ago
- Copied to action #95830: Errors from hook scripts in openqa-gru service log ("parse error: Invalid numeric literal at...", JSON error) size:M added
Updated by tinita over 3 years ago
- Status changed from Workable to Resolved
The error Could not find '07a51bb2ffb67f4fc5bafb243e0bdcf8d2b50bed' in complete history
was handled in a PR. Other error messages are handled in other tickets (https://progress.opensuse.org/issues/95830)
Updated by tinita over 3 years ago
I did find this error today:
Jul 23 13:38:21 openqa openqa-gru[32521]: Could not find '424979a5d88e8d43150de513e4838a55c64b8423' in complete history in cloned Git repository 'https://github.com/dzedro/os-autoinst-distri-opensuse.git#424979a5d88e8d43150de513e4838a55c64b8423' at /usr/lib/os-autoinst/OpenQA/Isotovideo/Utils.pm line 96.
Jul 23 13:38:21 openqa openqa-gru[32521]: ---
Jul 23 13:38:21 openqa openqa-gru[32521]: 1 unknown issues to be reviewed:
Jul 23 13:38:21 openqa openqa-gru[32521]: - https://openqa.suse.de/tests/6536334 isotovideo died: Could not find '424979a5d88e8d431
Looking at the repo, this commit does indeed not exist in the fork, only in the origin so far.
https://github.com/dzedro/os-autoinst-distri-opensuse/commit/424979a5d88e8d43150de513e4838a55c64b8423
"This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."