action #117172

closed

Flaky alert about infrastructure packet loss

Added by okurz about 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2022-09-25
Due date:
% Done: 0%
Estimated time:
Description

See http://monitor.qa.suse.de/d/EML0bpuGk/monitoring?tab=alert&viewPanel=4&orgId=1, in particular the alert on 25 September 2022 03:40:44 CEST, which turned back to green just 1m later. We should prevent such flaky alert reports.


Related issues 1 (1 open, 0 closed)

Copied to openQA Infrastructure - action #118375: Do not alert about "packet loss" if hosts are down (New)

Actions #1

Updated by okurz about 2 years ago

  • Status changed from New to In Progress
  • Assignee set to okurz

mkittler prepared https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/745 to bump the alerting time period.

Actions #2

Updated by okurz about 2 years ago

  • Due date set to 2022-10-11
  • Status changed from In Progress to Feedback

I was wondering whether we should increase the alerting time threshold even further, but maybe 15m is OK for now.
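To illustrate why a longer alerting period suppresses this kind of flakiness, here is a minimal Python sketch of a "condition must hold for N minutes" rule. It is not the actual alert definition from salt-states-openqa; the function name `firing_intervals`, the 10% loss threshold, and the sample values are hypothetical and only loosely modeled on the 03:40 incident described above.

```python
from datetime import datetime, timedelta


def firing_intervals(samples, threshold, hold):
    """Return (start, end) pairs during which an alert would be firing.

    samples: list of (timestamp, value) tuples sorted by time.
    The alert fires only once value > threshold has held continuously
    for at least `hold`; it clears on the first sample at or below threshold.
    """
    intervals = []
    breach_start = None  # when the current run of bad samples began
    fire_start = None    # when the alert actually started firing
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts
            if fire_start is None and ts - breach_start >= hold:
                fire_start = ts
        else:
            if fire_start is not None:
                intervals.append((fire_start, ts))
            breach_start = None
            fire_start = None
    if fire_start is not None:
        # Still firing at the end of the data
        intervals.append((fire_start, samples[-1][0]))
    return intervals


# Hypothetical packet loss percentages, one sample per minute from 03:38
t0 = datetime(2022, 9, 25, 3, 38)
losses = [0, 0, 40, 0, 0]
samples = [(t0 + timedelta(minutes=i), loss) for i, loss in enumerate(losses)]

# Without a hold period the one-minute blip raises an alert
immediate = firing_intervals(samples, threshold=10, hold=timedelta(0))
# With a 15m hold period the same blip is ignored
debounced = firing_intervals(samples, threshold=10, hold=timedelta(minutes=15))
```

In this sketch `immediate` contains a single one-minute interval while `debounced` is empty. The trade-off is that a genuine outage is also reported up to 15 minutes later.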

Actions #3

Updated by mkittler about 2 years ago

  • Assignee changed from okurz to mkittler

I've seen the additional alert from today, but it occurred before my SR was merged. I'd wait a few days to see whether it helped.

Actions #4

Updated by mkittler about 2 years ago

  • Status changed from Feedback to Resolved

It hasn't happened again so I'm resolving the ticket for now.

Actions #5

Updated by livdywan about 2 years ago

  • Status changed from Resolved to Feedback

The alert was just live for 3 minutes. Did we actually increase it to 15 minutes as per #117172#note-2 or is there another alert involved here? Something doesn't seem to work correctly here.

Actions #7

Updated by okurz about 2 years ago

  • Copied to action #118375: Do not alert about "packet loss" if hosts are down added

Actions #8

Updated by okurz about 2 years ago

  • Due date deleted (2022-10-11)
  • Status changed from Feedback to Resolved

MR merged, rolled out and effective. Further improvements are tracked in #118375.
