action #75388
open
coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA
coordination #102909: [epic] Prevent more incompletes already within os-autoinst or openQA
Explicit error feedback to test reviewers on wrong test API usage
Added by okurz about 4 years ago.
Updated about 3 years ago.
Category:
Feature requests
Description
Motivation¶
In cases like #73231 there is a "die" message pointing to what unexpected situation was encountered but it is not clear to unexperienced test reviewers who needs to act on this, if it is a test maintainer or an instance admin or openQA developer. We should provide clear error feedback on wrong test API usage and distinguish from other error cases
Acceptance criteria¶
- AC1: The "reason" field in openQA jobs makes it clear to readers if the test API was used in a wrong way, e.g. as in the case of #73231
- AC2: os-autoinst contribution documentation explains how wrong test API usage can be distinguished from other error cases
Suggestions¶
- Research existing "die" messages in os-autoinst
- Explore how these die messages are displayed in openQA
- Find a way to distinguish, e.g. a clear prefix in os-autoinst or exception vs. generic "die", custom die handler, etc.
- Extend os-autoinst contribution documentation with hints or guidelines what to use in which case
- Copied from action #73231: [microos]job incompletes with auto_review:"backend died: Virtio terminal and svirt serial terminal do not support send_key. Use" added
- Parent task set to #62420
- Status changed from Workable to In Progress
- Assignee set to Xiaojing_liu
I researched the existing 'die' and 'croak' in os-autoinst. I only find 2 places are about the test API is used in the wrong way:
- in
consoles/serial_screen.pm
, sub send_key
, sub hold_key
and so on.
- in
testapi.pm
, sub validate_script_output
checks the input parameters and say croak "Invalid use of validate_script_output(), second arg must be a coderef or regexp";
And 1 is about #73231, the job will be treated as incomplete
, the reason is backend die
.
And about 2, the job will be shown as failed
, so the reason is shown in the job details. # Test died: Invalid use of validate_script_output(), second ...
I am not sure if I miss something or I am in the right direction.
About Suggestions 3, and 4:
a. Adding prefix in die message seems is the easiest way, is it enough?
b. Do we need to deal with the job result? for example if the reason matches the prefix, the job status is still incomplete? or give more details? such as a link to documentation?
Xiaojing_liu wrote:
About Suggestions 3, and 4:
a. Adding prefix in die message seems is the easiest way, is it enough?
b. Do we need to deal with the job result? for example if the reason matches the prefix, the job status is still incomplete? or give more details? such as a link to documentation?
Do you think we can have a consistent prefix and job state? For example always make it Invalid API usage: validate_script_output
and mark the job as Incomplete?
cdywan wrote:
Xiaojing_liu wrote:
About Suggestions 3, and 4:
a. Adding prefix in die message seems is the easiest way, is it enough?
b. Do we need to deal with the job result? for example if the reason matches the prefix, the job status is still incomplete? or give more details? such as a link to documentation?
Do you think we can have a consistent prefix and job state? For example always make it Invalid API usage: validate_script_output
and mark the job as Incomplete?
Yes, I think we can have a prefix in scenario #73231. And the job is already Incomplete. Do we need to show backend died
in the reason? For now, the reason is backend died: Virtio terminal and svirt serial terminal do not support send_key.
@okurz WDYT? do I miss something according to your suggestions on this ticket?
- Due date set to 2020-11-13
- Description updated (diff)
Xiaojing_liu wrote:
Yes, I think we can have a prefix in scenario #73231. And the job is already Incomplete. Do we need to show backend died
in the reason? For now, the reason is backend died: Virtio terminal and svirt serial terminal do not support send_key.
We should not show "backend died" in the reason in case of wrong test API usage as I can imagine that "backend died" is understood as "something crashed". But it should be clear that the test writers need to act. We do not need to care or change the job state or result at this point. But if you think it is easier to follow the approach as in sub validate_script_output
then try that.
@okurz WDYT? do I miss something according to your suggestions on this ticket?
Well, only the "contribution documentation" but that can also be treated separately.
- Due date changed from 2020-11-13 to 2020-11-20
Hi, I suggest to update the due-date always with a comment. Our target should be to not exceed the mean cycle time which is currently at 14 days. Hence a due date of tomorrow would make sense. Considering that this often involves feedback from production and hence needs a merged PR that is deployed this is unlikely to be doable. Do you see a possibility to split this ticket? For learning purposes I think it would be also ok, maybe even beneficial, to unassign again and start a different ticket.
- Due date deleted (
2020-11-20)
- Status changed from In Progress to Workable
- Assignee deleted (
Xiaojing_liu)
After getting the feedback from @cdywan and @okurz, I did not work on this ticket since last Wednesday.
I just had a thought about it:
- could we define a hash or array to store this special error message?
- Following 1, then we could modify the code
die_handler
in https://github.com/os-autoinst/os-autoinst/blob/master/backend/baseclass.pm#L92 to show these message as test died
or something else not backend die
.
Not sure if I am in the right way. Since this ticket's due date, I unassign it.
- Target version changed from Ready to future
- Parent task changed from #62420 to #102909
Also available in: Atom
PDF