action #64857
closed
coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues
coordination #62420: [epic] Distinguish all types of incompletes
Put single-line error messages into incomplete reason for "died"
Added by okurz almost 5 years ago.
Updated over 4 years ago.
Category:
Feature requests
Description
Motivation¶
See #62420#note-17 . In case of incompletes with reason "died" we already put the message "terminated prematurely, see log output for details". Also we output the logs from autoinst-log.txt directly on the "Details" tab when there are no module results. For single simple error messages, e.g. especially single-line error strings from the backend, we should put that string into the reason instead of "see log output for details" to make the error information more easily accessible to both the web UI users as well as readable over API.
Acceptance criteria¶
- AC1: Single line error messages from the backend are put in reason and accessible over job API as well as web UI
- AC2: In case the error is not obvious or for multi-line error messages the reason is still "died: terminated prematurely, see log output for details"
Suggestions¶
- Research how error messages are created by os-autoinst backends, e.g. qemu
- If necessary put error message not only in autoinst-log.txt but accessible to openQA, e.g. in one of the json files read by openQA
- in openQA where we write "died: terminated prematurely, see log output for details" evaluate the length of error messages: if nr_of_lines == 1 then put in reason, else still use "died: terminated prematurely, see log output for details"
- Status changed from Workable to In Progress
- Assignee set to livdywan
- Target version set to Current Sprint
A simple approach would be: When ever isotovideo or its sub processes encounter a fatal error condition where a single-line error message is available it is written into a file in the pool directory. Subsequent errors are likely only symptoms from tearing down the whole stack and should not override the file once it is there. In the end the worker can simply check for that file and incorporate its contents into the reason.
- Status changed from In Progress to Workable
- Assignee deleted (
livdywan)
- Target version deleted (
Current Sprint)
Don't we already have a json file with the overall job result that could then be extended by one key-value with the single error line?
Was it a mistake to unassign @cdywan?
To answer your question: Yes, we could make use of the same file. The current file is updated at a single place and it might be updated from time to time. The "error file" might be created from any place and once it is there it should not be further updated to stay as close at the source of the problem as possible. Due to these differences it might still be the simplest approach to just use a new file and avoid interference between different features/aspects.
- Status changed from Workable to In Progress
- Assignee set to livdywan
- Target version set to Current Sprint
oh, yes. Honest mistake. I am sorry. Probably had the ticket open in my browser for too long without refreshing and somehow did not realize that it updated in this way. I am ok with new file if you think this is best. I trust you have better experience in this field.
cdywan wrote:
[…] And how does this relate to what's proposed in #25814#note-15
Well, what I mentioned there: "We can extend that concept to every not-passed job as long as we limit the size that we store in the database, e.g. cut reason string at a reasonable, human-readable length, e.g. 120 characters. This way we can also load the reason synchronously."
So we can keep as much data, e.g. multiple, long lines when we are talking about storing in files and job results but as soon as we parse the reason into the database, e.g. the API call from the worker, we should trim.
The PR for the openQA part has been merged.
@cdywan Have you already prepared changed for os-autoinst? I've just picked up the ticket #64884 but now I'm realizing that you're working on the same area so it would be better to implement this ticket first.
- Blocks action #64884: Distinguish test contributor errors from unexpected backend crashes added
mkittler wrote:
The PR for the openQA part has been merged.
@cdywan Have you already prepared changed for os-autoinst? I've just picked up the ticket #64884 but now I'm realizing that you're working on the same area so it would be better to implement this ticket first.
Yes. Apparently there's a failure in CI which I need to sort out (https://github.com/os-autoinst/os-autoinst/pull/1383), will look into that.
Your PR has been merged. If that's all what was left to do you can mark the ticket as resolved.
- Status changed from In Progress to Feedback
It seems to work. As mentioned in the chat I'll need to do some refactoring for #64884. So if anything else comes up I'd like to handle it as part of my changes for #64884 so we don't interfere with each other.
- Status changed from Feedback to Resolved
- Target version deleted (
Current Sprint)
As noted in #64884 (which is built on top of this feature) it works in production.
Also available in: Atom
PDF