action #162323
closed
no alert about multi-machine test failures 2024-06-14 size:S
Added by okurz 6 months ago.
Updated 3 months ago.
Category: Regressions/Crashes
- Copied from action #162320: multi-machine test failures 2024-06-14+, auto_review:"ping with packet size 100 failed.*can be GRE tunnel setup issue":retry added
- Status changed from New to Rejected
- Assignee set to okurz
- Priority changed from High to Normal
- Tags set to infra, monitoring, alert, multi-machine
- Status changed from Rejected to New
- Assignee deleted (okurz)
No, scripts-ci tests cannot uncover all problems because they might not run certain worker combinations. I think an alert in Grafana would be helpful and workable for us.
- Target version changed from Ready to Tools - Next
- Subject changed from no alert about multi-machine test failures 2024-06-14+ to no alert about multi-machine test failures 2024-06-14 size:S
- Description updated (diff)
- Status changed from New to Workable
- Target version changed from Tools - Next to Ready
- Status changed from Workable to In Progress
- Assignee set to mkittler
The ratio of failed MM tests was 26 % and our alert threshold is 30 %, so the alert did not fire. I could lower the threshold to e.g. 20 %. Otherwise I don't think there's anything wrong with the queries we use for alerting, and they are in line with the panel queries.
Note that there might be some confusion, though: on the graph linked in the ticket description, the ratio of failed MM jobs looks much higher when also considering jobs with the result parallel_failed. It doesn't make much sense to consider those jobs, but it is something one might do accidentally because parallel_failed is also shown in red, only in a slightly different shade than failed. If the ticket was only created due to this confusion, we might not want to change the alert threshold but instead use a different color for parallel_failed in the panel (e.g. some gray).
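To illustrate the difference between the two readings of the graph, here is a minimal sketch. It is not the actual Grafana alert query: the job counts are made up for illustration (only the 26 % ratio and the 30 % threshold come from the comment above), and `failed_ratio` is a hypothetical helper.

```python
from collections import Counter


def failed_ratio(results, failed_states=("failed",)):
    """Return the share of jobs whose result is in failed_states (0.0 if no jobs)."""
    if not results:
        return 0.0
    counts = Counter(results)
    return sum(counts[state] for state in failed_states) / len(results)


# Hypothetical sample of 100 MM jobs; only the 26 % failed share mirrors the ticket.
results = ["passed"] * 54 + ["failed"] * 26 + ["parallel_failed"] * 20

ALERT_THRESHOLD = 0.30  # threshold mentioned in the comment above

# What the alert evaluates: only jobs with the result "failed".
strict_ratio = failed_ratio(results)

# What a reader may perceive on the panel when both red shades are lumped together.
perceived_ratio = failed_ratio(results, ("failed", "parallel_failed"))

alert_fires = strict_ratio > ALERT_THRESHOLD
looks_alarming = perceived_ratio > ALERT_THRESHOLD
```

With these assumed numbers the strict ratio stays below the threshold while the perceived ratio is well above it, which is the kind of mismatch a distinct color for parallel_failed would avoid.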
- Status changed from In Progress to Feedback
Yes, I think we should improve the colors. Give it a try.
- Status changed from Feedback to Resolved