action #64884
closed
coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues
coordination #62420: [epic] Distinguish all types of incompletes
Distinguish test contributor errors from unexpected backend crashes
Added by okurz almost 5 years ago.
Updated over 4 years ago.
Category:
Feature requests
Description
Motivation
See #62420#note-17. Jobs can end up incomplete either because of test contributor errors, e.g. invalid settings or simple syntax mistakes in test code, or because of unexpected backend crashes, which are more likely of interest to instance admins. We should separate the two by the "incomplete reason".
Acceptance criteria
- AC1: Test contributor errors, e.g. syntax mistakes in test code, produce a distinct incomplete reason
- AC2: Unexpected backend crashes still yield "died: …"
Suggestions
- Research how syntax or compilation errors in test code are treated by isotovideo
- If necessary, put the error message not only in autoinst-log.txt but also somewhere accessible to openQA, e.g. in one of the json files read by openQA
- In openQA, where we write "died: terminated prematurely, see log output for details", check whether there is an error string about a known test contributor error and output a corresponding reason; otherwise fall back to "died: …"
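The fallback logic in the last suggestion could look roughly like the following. This is only an illustrative Python sketch, not openQA or os-autoinst code: the function name `incomplete_reason` and the pattern list are hypothetical, and which error strings actually count as "known test contributor errors" is an assumption.

```python
import re

# Hypothetical patterns for errors caused by test contributors.
# These are assumptions for illustration, not taken from os-autoinst.
TEST_ERROR_PATTERNS = [
    re.compile(r"unable to load main\.pm"),
    re.compile(r"syntax error"),
]

# Generic reason quoted in the ticket, used when no error string is available.
FALLBACK_REASON = "died: terminated prematurely, see log output for details"


def incomplete_reason(error_message=None):
    """Return an incomplete reason for a job (illustrative only)."""
    text = (error_message or "").strip()
    if not text:
        return FALLBACK_REASON
    # Keep the reason to a single line.
    first_line = text.splitlines()[0]
    if any(pattern.search(text) for pattern in TEST_ERROR_PATTERNS):
        return f"tests died: {first_line}"
    # Unknown cause: keep the generic "died: ..." prefix.
    return f"died: {first_line}"
```

With these assumed patterns, a message starting with `syntax error at foo.pm line 3` would yield a `tests died: …` reason, while anything unrecognized keeps the plain `died: …` prefix.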
- Assignee set to mkittler
- Target version set to Current Sprint
- Blocked by action #64857: Put single-line error messages into incomplete reason for "died" added
- Status changed from Workable to Blocked
It makes sense to first implement forwarding the reason why isotovideo stopped early. Then we can differentiate between different causes.
- Status changed from Blocked to In Progress
- Status changed from In Progress to Feedback
With both PRs merged we should now see `tests died: ...` and `backend died: ...`. There might still be just `died: ...` in case something else within os-autoinst failed.
Note that the PR only affects errors when loading `main.pm` and the test modules. So it is mainly about compilation errors. Unhandled exceptions within the test execution were already distinguished before: The test result is set to failed and there's a failed step result with the exception message. I don't think it makes sense to duplicate that information into the "reason".
So when this works in production I consider both ACs done.
It seems to work on OSD and o3 (see `select jobs.id, t_started, t_finished, workers.host as worker_host, workers.instance as worker_instance, reason from jobs join workers on assigned_worker_id=workers.id where result = 'incomplete' order by id desc limit 100;`).
The error `tests died: unable to load main.pm, check the log for the cause (e.g. syntax error)` can now be observed on both instances.
On OSD there are several instances of `backend died: No map for 'Ã\u0083' at /usr/lib/os-autoinst/consoles/VNC.pm line 741.`. Maybe we should also look into that issue but it could also be caused by network issues or problems of the VNC server.
On o3 I've only seen several instances of `backend died: Migrate to file failed, it has been running for more than 240 at /usr/lib/os-autoinst/backend/qemu.pm line 258.`. I've created a PR to include the unit here. Not sure whether this error needs actual fixing.
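To get an overview of how the reasons returned by such a query are distributed, they could be tallied by prefix. The following Python sketch assumes the reason strings have already been fetched as a plain list; the helper names `reason_category` and `tally` are made up for illustration and are not part of openQA.

```python
from collections import Counter

# Prefixes mentioned in this ticket; the more specific ones are checked first.
PREFIXES = ("tests died:", "backend died:", "died:")


def reason_category(reason):
    """Map an incomplete reason to its prefix, or "other" if none matches."""
    for prefix in PREFIXES:
        if reason and reason.startswith(prefix):
            return prefix.rstrip(":")
    return "other"


def tally(reasons):
    """Count reasons per category."""
    return Counter(reason_category(reason) for reason in reasons)
```

A large share of plain `died` or `other` entries would hint at causes within os-autoinst that are not yet attributed to either the tests or the backend.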
- Status changed from Feedback to Resolved
- Target version deleted (Current Sprint)