Project

General

Profile

action #64857

coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues

coordination #62420: [epic] Distinguish all types of incompletes

Put single-line error messages into incomplete reason for "died"

Added by okurz about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
-
Start date:
2020-03-26
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Motivation

See #62420#note-17 . In case of incompletes with reason "died" we already put the message "terminated prematurely, see log output for details". Also we output the logs from autoinst-log.txt directly on the "Details" tab when there are no module results. For single simple error messages, e.g. especially single-line error strings from the backend, we should put that string into the reason instead of "see log output for details" to make the error information more easily accessible to both the web UI users as well as readable over API.

Acceptance criteria

  • AC1: Single line error messages from the backend are put in reason and accessible over job API as well as web UI
  • AC2: In case the error is not obvious or for multi-line error messages the reason is still "died: terminated prematurely, see log output for details"

Suggestions

  • Research how error messages are created by os-autoinst backends, e.g. qemu
  • If necessary put error message not only in autoinst-log.txt but accessible to openQA, e.g. in one of the json files read by openQA
  • in openQA where we write "died: terminated prematurely, see log output for details" evaluate the length of error messages: if nr_of_lines == 1 then put in reason, else still use "died: terminated prematurely, see log output for details"

Related issues

Blocks openQA Project - action #64884: Distinguish test contributor errors from unexpected backend crashesResolved2020-03-26

History

#1 Updated by cdywan about 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to cdywan
  • Target version set to Current Sprint

#2 Updated by mkittler about 1 year ago

A simple approach would be: When ever isotovideo or its sub processes encounter a fatal error condition where a single-line error message is available it is written into a file in the pool directory. Subsequent errors are likely only symptoms from tearing down the whole stack and should not override the file once it is there. In the end the worker can simply check for that file and incorporate its contents into the reason.

#3 Updated by okurz about 1 year ago

  • Status changed from In Progress to Workable
  • Assignee deleted (cdywan)
  • Target version deleted (Current Sprint)

Don't we already have a json file with the overall job result that could then be extended by one key-value with the single error line?

#4 Updated by mkittler about 1 year ago

Was it a mistake to unassign cdywan?

To answer your question: Yes, we could make use of the same file. The current file is updated at a single place and it might be updated from time to time. The "error file" might be created from any place and once it is there it should not be further updated to stay as close at the source of the problem as possible. Due to these differences it might still be the simplest approach to just use a new file and avoid interference between different features/aspects.

#5 Updated by okurz about 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to cdywan
  • Target version set to Current Sprint

oh, yes. Honest mistake. I am sorry. Probably had the ticket open in my browser for too long without refreshing and somehow did not realize that it updated in this way. I am ok with new file if you think this is best. I trust you have better experience in this field.

#7 Updated by cdywan about 1 year ago

https://github.com/os-autoinst/openQA/pull/2924 OpenQA part here

I do wonder where we want to enforce this being a single line. Do we want to check the number of lines when reading the reason? And how does this relate to what's proposed in #25814#note-15

#8 Updated by okurz about 1 year ago

cdywan wrote:

[…] And how does this relate to what's proposed in #25814#note-15

Well, what I mentioned there: "We can extend that concept to every not-passed job as long as we limit the size that we store in the database, e.g. cut reason string at a reasonable, human-readable length, e.g. 120 characters. This way we can also load the reason synchronously."

So we can keep as much data, e.g. multiple, long lines when we are talking about storing in files and job results but as soon as we parse the reason into the database, e.g. the API call from the worker, we should trim.

#9 Updated by mkittler about 1 year ago

The PR for the openQA part has been merged.

cdywan Have you already prepared changed for os-autoinst? I've just picked up the ticket #64884 but now I'm realizing that you're working on the same area so it would be better to implement this ticket first.

#10 Updated by mkittler about 1 year ago

  • Blocks action #64884: Distinguish test contributor errors from unexpected backend crashes added

#11 Updated by cdywan about 1 year ago

mkittler wrote:

The PR for the openQA part has been merged.

cdywan Have you already prepared changed for os-autoinst? I've just picked up the ticket #64884 but now I'm realizing that you're working on the same area so it would be better to implement this ticket first.

Yes. Apparently there's a failure in CI which I need to sort out (https://github.com/os-autoinst/os-autoinst/pull/1383), will look into that.

#12 Updated by mkittler about 1 year ago

Your PR has been merged. If that's all what was left to do you can mark the ticket as resolved.

#13 Updated by cdywan about 1 year ago

  • Status changed from In Progress to Feedback

#14 Updated by mkittler about 1 year ago

It seems to work. As mentioned in the chat I'll need to do some refactoring for #64884. So if anything else comes up I'd like to handle it as part of my changes for #64884 so we don't interfere with each other.

#15 Updated by mkittler about 1 year ago

  • Status changed from Feedback to Resolved
  • Target version deleted (Current Sprint)

As noted in #64884 (which is built on top of this feature) it works in production.

Also available in: Atom PDF