Project

General

Profile

Actions

action #94678

closed

Multiple errors from hook scripts in openqa-gru service log (was: Missing openqa-investigate jobs on osd) size:M

Added by okurz over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
Regressions/Crashes
Target version:
Start date:
2021-06-24
Due date:
2021-07-23
% Done:

0%

Estimated time:

Description

Observation

Seems like since some days there are no openqa-investigate jobs triggered anymore, e.g. see on openqa.suse.de/tests/ where normally we should see many now I don't see any. o3 works fine, compare https://openqa.suse.de/tests/?match=:investigate: and https://openqa.opensuse.org/tests/?match=:investigate: to look for jobs with ":investigate:" in the name.

journalctl -u openqa-gru on osd shows:

Jun 24 15:57:01 openqa openqa-gru[843]: [2021-06-24T15:56:58.411 CEST] [debug] git fetch: remote: Total 0 (delta 0), reused 0 (delta 0), pack-r>
Jun 24 15:57:01 openqa openqa-gru[843]:   
Jun 24 15:57:01 openqa openqa-gru[843]: Could not find '07a51bb2ffb67f4fc5bafb243e0bdcf8d2b50bed' in complete history at /usr/lib/os-autoinst/O>
Jun 24 15:57:01 openqa openqa-gru[843]: ---
Jun 24 15:57:01 openqa openqa-gru[843]: 1 unknown issues to be reviewed:
Jun 24 15:57:01 openqa openqa-gru[843]:  - https://openqa.suse.de/tests/6325089 isotovideo died: Could not find '07a51bb2ffb67f4fc
Jun 24 15:57:03 openqa openqa-gru[843]: https://openqa.suse.de/tests/6325091 : Unknown issue, to be reviewed -> https://openqa.suse.de/tests/63>
Jun 24 15:57:03 openqa openqa-gru[843]: Likely the error is within this log excerpt, last lines before shutdown:
Jun 24 15:57:03 openqa openqa-gru[843]: ---
Jun 24 15:57:03 openqa openqa-gru[843]: [2.0K blob data]
Jun 24 15:57:03 openqa openqa-gru[843]: [1.2K blob data]
Jun 24 15:57:03 openqa openqa-gru[843]:   
Jun 24 15:57:03 openqa openqa-gru[843]: [2021-06-24T15:56:59.337 CEST] [debug] git fetch: remote: Total 0 (delta 0), reused 0 (delta 0), pack-r>
Jun 24 15:57:03 openqa openqa-gru[843]:   
Jun 24 15:57:03 openqa openqa-gru[843]: Could not find '07a51bb2ffb67f4fc5bafb243e0bdcf8d2b50bed' in complete history at /usr/lib/os-autoinst/O>
Jun 24 15:57:03 openqa openqa-gru[843]: ---
Jun 24 15:57:03 openqa openqa-gru[843]: 1 unknown issues to be reviewed:
Jun 24 15:57:03 openqa openqa-gru[843]:  - https://openqa.suse.de/tests/6325091 isotovideo died: Could not find '07a51bb2ffb67f4fc
Jun 24 16:32:15 openqa openqa-gru[843]: grep: exceeded PCRE's backtracking limit
Jun 24 18:59:10 openqa openqa-gru[843]: unable to clone job 6324478: it is part of a parallel or directly chained cluster (not supported)
Jun 24 18:59:11 openqa openqa-gru[843]: unable to clone job 6324478: it is part of a parallel or directly chained cluster (not supported)

Acceptance criteria

  • AC1: No obvious error messages in openqa-gru logs
  • AC2: openqa-investigate jobs are triggered again on osd

Suggestions

Further details

openqa-gru runs multiple asynchronous background processes. The ones mentioned above are from "job_post_hooks".


Related issues 1 (0 open1 closed)

Copied to openQA Project (public) - action #95830: Errors from hook scripts in openqa-gru service log ("parse error: Invalid numeric literal at...", JSON error) size:MResolvedtinita2021-06-242021-08-06

Actions
Actions #1

Updated by okurz over 3 years ago

  • Subject changed from Missing openqa-investigate jobs on osd to Multiple errors from hook scripts in openqa-gru service log (was: Missing openqa-investigate jobs on osd)

Interesting enough I found "openqa-investigate" jobs by magic again today on osd but there are still errors

Actions #2

Updated by okurz over 3 years ago

  • Description updated (diff)
Actions #3

Updated by okurz over 3 years ago

  • Description updated (diff)
Actions #4

Updated by okurz over 3 years ago

  • Description updated (diff)
Actions #5

Updated by livdywan over 3 years ago

  • Subject changed from Multiple errors from hook scripts in openqa-gru service log (was: Missing openqa-investigate jobs on osd) to Multiple errors from hook scripts in openqa-gru service log (was: Missing openqa-investigate jobs on osd) size:M
Actions #6

Updated by okurz over 3 years ago

We found out that the error

Could not find '07a51bb2ffb67f4fc5bafb243e0bdcf8d2b50bed' in complete history at /usr/lib/os-autoinst/O>
Jun 24 15:57:01 openqa openqa-gru[843]: ---
Jun 24 15:57:01 openqa openqa-gru[843]: 1 unknown issues to be reviewed:
Jun 24 15:57:01 openqa openqa-gru[843]:  - https://openqa.suse.de/tests/6325089

is also visible in https://openqa.suse.de/tests/6325089 as a reason. That's good so far. The git hash 07a51bb2ffb67f4fc5bafb243e0bdcf8d2b50bed only exists in git://github.com/dzedro/os-autoinst-distri-opensuse.git . Likely openqa-investigate looks only in the main repo git@github.com:os-autoinst/os-autoinst-distri-opensuse.git . We can improve the error message to at least tell us from which repo we start.

Actions #7

Updated by tinita over 3 years ago

  • Status changed from Workable to In Progress
  • Assignee set to tinita
Actions #8

Updated by tinita over 3 years ago

I created a PR to add the git url to the error message.

What I'm wondering about is that some of the git operation output should land in the log, but we don't see anything like that in the mentioned test.

Actions #9

Updated by openqa_review over 3 years ago

  • Due date set to 2021-07-23

Setting due date based on mean cycle time of SUSE QE Tools

Actions #10

Updated by tinita over 3 years ago

  • Status changed from In Progress to Feedback
Actions #11

Updated by tinita over 3 years ago

How can I check the openqa-gru logs from the last days? No matter what date I provide, it always starts today:

% journalctl  --since=2021-07-10 -u openqa-gru
-- Logs begin at Tue 2021-07-13 11:08:07 CEST, end at Tue 2021-07-13 14:31:21 CEST. --
...
Actions #12

Updated by okurz over 3 years ago

Likely the journal size is configured too low and there are too many messages in various units to preserve logs for more than one day

Actions #13

Updated by tinita over 3 years ago

  • Status changed from Feedback to Workable
  • Assignee deleted (tinita)
  • Priority changed from High to Normal

I don't know how to check reliably that entries like this aren't in the log anymore because we don't have enough log lines available.
Maybe someone else has an idea.

Actions #14

Updated by okurz over 3 years ago

  • Priority changed from Normal to Urgent

With the command mentioned in the original description ssh osd 'sudo journalctl -u openqa-gru' I can find:

-- Logs begin at Sun 2021-07-18 03:30:00 CEST, end at Tue 2021-07-20 16:23:49 CEST. --
Jul 20 12:59:01 openqa openqa-gru[22516]: parse error: Invalid numeric literal at line 1, column 10
Jul 20 14:31:27 openqa openqa-gru[22516]: unable to clone job 6490922: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:29 openqa openqa-gru[22516]: unable to clone job 6490922: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:30 openqa openqa-gru[22516]: unable to clone job 6400555: it is part of a parallel or directly chained cluster (not supported)
Jul 20 14:31:31 openqa openqa-gru[22516]: unable to clone job 6400555: it is part of a parallel or directly chained cluster (not supported)
Jul 20 15:40:43 openqa openqa-gru[22516]: parse error: Invalid numeric literal at line 1, column 10
Jul 20 15:40:43 openqa openqa-gru[22516]: /opt/os-autoinst-scripts/openqa-investigate: line 73: echo: write error: Broken pipe

so logs go back three days but only since this day there are useful entries. I consider parse error: Invalid numeric literal at line 1, column 10 as well as /opt/os-autoinst-scripts/openqa-investigate: line 73: echo: write error: Broken pipe "obvious error messages".

With the due-date approaching we need to handle this with priority

Actions #15

Updated by tinita over 3 years ago

So since there are new errors not appearing in the original description, this is an ongoing task?
Then it should not be labeled as M.
Each error should be handled individually in a ticket, I guess

Actions #16

Updated by okurz over 3 years ago

Sure, I don't mind either way. You can split the ticket to have a new one for the errors that I mentioned and assign back to this one and resolve.

Actions #17

Updated by mkittler over 3 years ago

 69     if echo "$test_log" | grep -q "No.*changes recorded"; then
 70         echo "$test_log. Skipping test regression investigation job."
 71         last_good_tests=''
 72     else
 73         vars_last_good=$(curl -s "$host_url/tests/$last_good"/file/vars.json)
 74         last_good_tests=$(echo "$vars_last_good" | jq -r '.TEST_GIT_HASH')

So we cannot get curl's output? The error message logged before looks more like the jq command in line 74 would be failing.

Actions #18

Updated by tinita over 3 years ago

  • Copied to action #95830: Errors from hook scripts in openqa-gru service log ("parse error: Invalid numeric literal at...", JSON error) size:M added
Actions #19

Updated by tinita over 3 years ago

  • Status changed from Workable to Resolved

The error Could not find '07a51bb2ffb67f4fc5bafb243e0bdcf8d2b50bed' in complete history was handled in a PR. Other error messages are handled in other tickets (https://progress.opensuse.org/issues/95830)

Actions #20

Updated by tinita over 3 years ago

I did find this error today:

Jul 23 13:38:21 openqa openqa-gru[32521]: Could not find '424979a5d88e8d43150de513e4838a55c64b8423' in complete history in cloned Git repository 'https://github.com/dzedro/os-autoinst-distri-opensuse.git#424979a5d88e8d43150de513e4838a55c64b8423' at /usr/lib/os-autoinst/OpenQA/Isotovideo/Utils.pm line 96.
Jul 23 13:38:21 openqa openqa-gru[32521]: ---
Jul 23 13:38:21 openqa openqa-gru[32521]: 1 unknown issues to be reviewed:
Jul 23 13:38:21 openqa openqa-gru[32521]:  - https://openqa.suse.de/tests/6536334 isotovideo died: Could not find '424979a5d88e8d431

Looking at the repo, this commit does indeed not exist in the fork, only in the origin so far.
https://github.com/dzedro/os-autoinst-distri-opensuse/commit/424979a5d88e8d43150de513e4838a55c64b8423
"This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."

Actions

Also available in: Atom PDF