Project

General

Profile

action #75388

coordination #39719: [saga][epic] Detect "known failures" and mark jobs as such to make tests more stable, reviewing test results and tracking known issues easier

coordination #62420: [epic] Distinguish all types of incompletes

Explicit error feedback to test reviewers on wrong test API usage

Added by okurz 6 months ago. Updated about 2 months ago.

Status:
Workable
Priority:
Low
Assignee:
-
Category:
Feature requests
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Motivation

In cases like #73231 there is a "die" message pointing to what unexpected situation was encountered but it is not clear to unexperienced test reviewers who needs to act on this, if it is a test maintainer or an instance admin or openQA developer. We should provide clear error feedback on wrong test API usage and distinguish from other error cases

Acceptance criteria

  • AC1: The "reason" field in openQA jobs makes it clear to readers if the test API was used in a wrong way, e.g. as in the case of #73231
  • AC2: os-autoinst contribution documentation explains how wrong test API usage can be distinguished from other error cases

Suggestions

  • Research existing "die" messages in os-autoinst
  • Explore how these die messages are displayed in openQA
  • Find a way to distinguish, e.g. a clear prefix in os-autoinst or exception vs. generic "die", custom die handler, etc.
  • Extend os-autoinst contribution documentation with hints or guidelines what to use in which case

Related issues

Copied from openQA Project - action #73231: [microos]job incompletes with auto_review:"backend died: Virtio terminal and svirt serial terminal do not support send_key. Use"Resolved2020-10-12

History

#1 Updated by okurz 6 months ago

  • Copied from action #73231: [microos]job incompletes with auto_review:"backend died: Virtio terminal and svirt serial terminal do not support send_key. Use" added

#2 Updated by okurz 6 months ago

  • Parent task set to #62420

#3 Updated by Xiaojing_liu 5 months ago

  • Status changed from Workable to In Progress
  • Assignee set to Xiaojing_liu

#4 Updated by Xiaojing_liu 5 months ago

I researched the existing 'die' and 'croak' in os-autoinst. I only find 2 places are about the test API is used in the wrong way:

  1. in consoles/serial_screen.pm, sub send_key, sub hold_key and so on.
  2. in testapi.pm, sub validate_script_output checks the input parameters and say croak "Invalid use of validate_script_output(), second arg must be a coderef or regexp"; And 1 is about #73231, the job will be treated as incomplete, the reason is backend die. And about 2, the job will be shown as failed, so the reason is shown in the job details. # Test died: Invalid use of validate_script_output(), second ...

I am not sure if I miss something or I am in the right direction.

About Suggestions 3, and 4:
a. Adding prefix in die message seems is the easiest way, is it enough?
b. Do we need to deal with the job result? for example if the reason matches the prefix, the job status is still incomplete? or give more details? such as a link to documentation?

#5 Updated by cdywan 5 months ago

Xiaojing_liu wrote:

About Suggestions 3, and 4:
a. Adding prefix in die message seems is the easiest way, is it enough?
b. Do we need to deal with the job result? for example if the reason matches the prefix, the job status is still incomplete? or give more details? such as a link to documentation?

Do you think we can have a consistent prefix and job state? For example always make it Invalid API usage: validate_script_output and mark the job as Incomplete?

#6 Updated by Xiaojing_liu 5 months ago

cdywan wrote:

Xiaojing_liu wrote:

About Suggestions 3, and 4:
a. Adding prefix in die message seems is the easiest way, is it enough?
b. Do we need to deal with the job result? for example if the reason matches the prefix, the job status is still incomplete? or give more details? such as a link to documentation?

Do you think we can have a consistent prefix and job state? For example always make it Invalid API usage: validate_script_output and mark the job as Incomplete?

Yes, I think we can have a prefix in scenario #73231. And the job is already Incomplete. Do we need to show backend died in the reason? For now, the reason is backend died: Virtio terminal and svirt serial terminal do not support send_key.

okurz WDYT? do I miss something according to your suggestions on this ticket?

#7 Updated by Xiaojing_liu 5 months ago

  • Due date set to 2020-11-13

#8 Updated by okurz 5 months ago

  • Description updated (diff)

Xiaojing_liu wrote:

Yes, I think we can have a prefix in scenario #73231. And the job is already Incomplete. Do we need to show backend died in the reason? For now, the reason is backend died: Virtio terminal and svirt serial terminal do not support send_key.

We should not show "backend died" in the reason in case of wrong test API usage as I can imagine that "backend died" is understood as "something crashed". But it should be clear that the test writers need to act. We do not need to care or change the job state or result at this point. But if you think it is easier to follow the approach as in sub validate_script_output then try that.

okurz WDYT? do I miss something according to your suggestions on this ticket?

Well, only the "contribution documentation" but that can also be treated separately.

#10 Updated by Xiaojing_liu 5 months ago

  • Due date changed from 2020-11-13 to 2020-11-20

#11 Updated by okurz 5 months ago

Hi, I suggest to update the due-date always with a comment. Our target should be to not exceed the mean cycle time which is currently at 14 days. Hence a due date of tomorrow would make sense. Considering that this often involves feedback from production and hence needs a merged PR that is deployed this is unlikely to be doable. Do you see a possibility to split this ticket? For learning purposes I think it would be also ok, maybe even beneficial, to unassign again and start a different ticket.

#12 Updated by Xiaojing_liu 5 months ago

  • Due date deleted (2020-11-20)
  • Status changed from In Progress to Workable
  • Assignee deleted (Xiaojing_liu)

After getting the feedback from cdywan and okurz, I did not work on this ticket since last Wednesday.
I just had a thought about it:

  1. could we define a hash or array to store this special error message?
  2. Following 1, then we could modify the code die_handler in https://github.com/os-autoinst/os-autoinst/blob/master/backend/baseclass.pm#L92 to show these message as test died or something else not backend die. Not sure if I am in the right way. Since this ticket's due date, I unassign it.

#13 Updated by okurz about 2 months ago

  • Target version changed from Ready to future

Also available in: Atom PDF