action #96191

closed

Provide "fail-rate" of tests, especially multi-machine, in grafana size:M

Added by okurz almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2021-07-28
Due date: 2021-09-29
% Done: 0%
Estimated time:

Description

Motivation

The hypothesis was raised that "multimachine jobs have decreased reliability since ~2 weeks (2 nodes). More nodes are even worse." Maybe true, maybe not. We should be able to calculate a fail-ratio for different categories of openQA tests, e.g. in grafana based on SQL queries. With this we would be able to support/reject the hypothesis.

Suggestion

  • See what grafana data we have, or SQL queries, extend as needed
  • Consider mm versus "normal" tests
  • Focus on failed jobs to start with; we already deal with incompletes
  • Exclude retried jobs, since retries don't run for mm
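
A minimal sketch of the ratio the suggestions above describe, written in Python rather than as an SQL query. The job records and the field names `result`, `is_multimachine` and `is_retry` are hypothetical and do not reflect the actual openQA database schema. It also assumes that incompletes stay in the denominator and only `failed` results count towards the numerator; a real query may need a different choice.

```python
from collections import Counter


def fail_ratios(jobs):
    """Return {category: failed / total} for "mm" and "normal" jobs.

    Each job is a dict with hypothetical keys:
      result          -- e.g. "passed", "failed", "incomplete"
      is_multimachine -- True for mm jobs
      is_retry        -- True if the job is a retry of an earlier job
    """
    totals, failed = Counter(), Counter()
    for job in jobs:
        # Exclude retried jobs, as suggested in the ticket.
        if job.get("is_retry"):
            continue
        category = "mm" if job["is_multimachine"] else "normal"
        totals[category] += 1
        # Only "failed" counts as a failure; incompletes are handled elsewhere.
        if job["result"] == "failed":
            failed[category] += 1
    return {category: failed[category] / totals[category] for category in totals}


# Made-up sample data, for illustration only.
sample_jobs = [
    {"result": "failed", "is_multimachine": True, "is_retry": False},
    {"result": "passed", "is_multimachine": True, "is_retry": False},
    {"result": "failed", "is_multimachine": True, "is_retry": True},
    {"result": "passed", "is_multimachine": False, "is_retry": False},
    {"result": "passed", "is_multimachine": False, "is_retry": False},
    {"result": "failed", "is_multimachine": False, "is_retry": False},
    {"result": "incomplete", "is_multimachine": False, "is_retry": False},
]

ratios = fail_ratios(sample_jobs)
```

Comparing `ratios["mm"]` against `ratios["normal"]` over the same time window is what would support or reject the hypothesis from the motivation.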

Related issues 4 (0 open, 4 closed)

Related to openQA Project - coordination #96185: [epic] Multimachine failure rate increased (Resolved, okurz, 2021-07-29)

Related to openQA Project - action #98604: Provide data about ratio of automatically approved SLE Maintenance incidents size:M (Resolved, VANASTASIADIS, 2021-09-14, 2021-10-12)

Copied to openQA Project - action #99135: Provide ratio of tests by result in monitoring - by worker (Resolved, okurz)

Copied to openQA Project - action #102428: Provide "fail-rate" alerting with ratio_mm_failed 5.360 size:M (Resolved, kraih, 2021-07-28, 2021-12-07)
