action #103425
closed
coordination #103962: [saga][epic] Easy multi-machine handling: MM-tests as first-class citizens
Ratio of multi-machine tests alerting with ratio_mm_failed 5.280 size:M
Added by livdywan over 3 years ago.
Updated over 3 years ago.
Category:
Regressions/Crashes
Description
Observation
The Ratio of multi-machine tests was alerting between 10:34 and 15:10(?) CET today:
ratio_mm_failed 5.280
Acceptance criteria
- AC1: Thresholds for ratio_mm_failed are tuned based on concrete data
- AC2: Investigation advice in alert contains concrete steps
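For AC1, one possible way to compare candidate thresholds against recorded data is to count how many separate alert episodes each candidate would have produced. The following is only a minimal sketch: the function name `count_alert_episodes`, the assumption that the alert fires when `ratio_mm_failed` rises above the threshold, and the sample series are all illustrative and not taken from the actual monitoring setup.

```python
def count_alert_episodes(samples, threshold):
    """Count how often the series rises above `threshold`.

    Assumption: the alert fires while the value is above the threshold.
    Consecutive values above the threshold count as a single episode.
    """
    episodes = 0
    alerting = False
    for value in samples:
        above = value > threshold
        if above and not alerting:
            episodes += 1
        alerting = above
    return episodes


# Placeholder series, NOT real monitoring data. Only 5.280, 5.910 and 5.080
# are values reported in this ticket; the rest is made up for illustration.
history = [1.0, 5.280, 5.300, 2.0, 5.910, 1.5, 5.080]

# Hypothetical candidate thresholds to compare against each other.
for candidate in (5.0, 5.5, 6.0):
    print(candidate, count_alert_episodes(history, candidate))
```

Running this over an export of the real metric history instead of the placeholder list would show how noisy each candidate threshold is before it is applied to the alert.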
Suggestion
- Copied from action #102428: Provide "fail-rate" alerting with ratio_mm_failed 5.360 size:M added
- Description updated (diff)
And alerting again right now:
ratio_mm_failed 5.910
cdywan wrote:
And alerting again right now:
ratio_mm_failed 5.910
It's OK again
- Subject changed from Provide "fail-rate" alerting with ratio_mm_failed 5.280 to Provide "fail-rate" alerting with ratio_mm_failed 5.280 size:M
- Description updated (diff)
- Status changed from New to Workable
- Due date deleted (2021-12-07)
- Start date deleted (2021-07-28)
I think you copied the due date from the ticket this one was cloned from, so I am removing it here and resetting the start date as well. By the way, what do you mean by the subject "Provide … alerting"?
- Subject changed from Provide "fail-rate" alerting with ratio_mm_failed 5.280 size:M to Ratio of multi-machine tests alerting with ratio_mm_failed 5.280 size:M
okurz wrote:
I think you copied the due date from the ticket this one was cloned from, so I am removing it here and resetting the start date as well. By the way, what do you mean by the subject "Provide … alerting"?
Monitor "fail-ratio" of tests
became Provide "fail-rate" of tests
became this 😉️
- Priority changed from Normal to High
Alerting now:
ratio_mm_failed 5.080
I think it's safe to say we've reached alert fatigue. And we're not even clear on what caused or resolved the previous occurrences. Hence raising the priority.
- Related to action #95783: Provide support for multi-machine scenarios handled by openqa-investigate size:M added
- Related to action #71809: Enable multi-machine jobs trigger without "isos post" added
- Status changed from Workable to Feedback
- Parent task set to #103962
- Status changed from Feedback to Resolved