action #112736

closed

coordination #109846: [epic] Ensure all our database tables accommodate enough data, e.g. bigint for ids

Better alert based on 2022-06-18 incident size:M

Added by okurz almost 2 years ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2022-06-20
Due date:
% Done: 0%
Estimated time:

Description

Motivation

On 2022-06-18 we had issue #112718 with OSD showing significantly downgraded performance.
The panels https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&editPanel=84&from=1655503200000&to=1655675999000&tab=alert and https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1655516807123&to=1655574941005&viewPanel=89 showed a significant increase in the number of rows returned by PostgreSQL queries and increased Apache response times, but no specific alert was raised. A dedicated alert for this would have helped.

Acceptance criteria

  • AC1: A provisioned alert exists for database rows returned

Suggestions

  • We have a monitoring panel for database rows returned, create a sensible alert for it
  • Consider sporadic spikes which we should not alert on, e.g. with a grace period such as "every 1m for 10m" or similar
  • Add the alert to the provisioning data. If you have questions about how to properly provision an alert from salt, ask mkittler or nsinger (the README of the salt states repo has also been updated recently)
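To illustrate the grace-period suggestion, here is a minimal Python sketch of "evaluate every 1m, fire only once the condition has held for 10m" logic. It assumes one evaluation per minute; the class name `PendingPeriodAlert`, the threshold and the sample values are hypothetical, and this only approximates the idea of a pending period rather than reproducing Grafana's actual alert state machine.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class PendingPeriodAlert:
    """Hypothetical rows-returned alert with a pending (grace) period.

    The alert only reaches "alerting" once the value has stayed above
    `threshold` continuously for at least `for_minutes`.
    """

    threshold: float
    for_minutes: float
    pending_since: Optional[float] = None  # time of the first breach in the current run

    def evaluate(self, t_minutes: float, value: float) -> str:
        if value <= self.threshold:
            # Spike is over (or never started): reset the pending period.
            self.pending_since = None
            return "normal"
        if self.pending_since is None:
            # First breaching evaluation: start the grace period.
            self.pending_since = t_minutes
        if t_minutes - self.pending_since >= self.for_minutes:
            return "alerting"
        return "pending"


def simulate(
    samples: Sequence[Tuple[float, float]], threshold: float, for_minutes: float
) -> List[str]:
    """Evaluate (minute, rows_returned) samples in order and return each state."""
    alert = PendingPeriodAlert(threshold, for_minutes)
    return [alert.evaluate(t, v) for t, v in samples]


if __name__ == "__main__":
    # A short spike resets before the grace period ends, so it never alerts.
    print(simulate([(0, 5000), (1, 5000), (2, 100)], threshold=1000, for_minutes=10))
    # A sustained increase alerts after 10 minutes above the threshold.
    print(simulate([(t, 5000) for t in range(12)], threshold=1000, for_minutes=10))
```

The design choice is that any single evaluation below the threshold resets the pending period, so short, sporadic spikes never alert, while a sustained increase like the one on 2022-06-18 would.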

Out of scope

  • Replicating the database content for people to play with

Related issues: 1 (0 open, 1 closed)

Related to openQA Infrastructure - coordination #112718: [alert][osd] openqa.suse.de is not reachable anymore, response times > 30s, multiple alerts over the weekend (Resolved, okurz, 2022-06-22)

#1

Updated by okurz almost 2 years ago

  • Related to coordination #112718: [alert][osd] openqa.suse.de is not reachable anymore, response times > 30s, multiple alerts over the weekend added
#2

Updated by okurz almost 2 years ago

  • Parent task set to #109846
#3

Updated by okurz almost 2 years ago

  • Target version changed from Ready to future
#4

Updated by okurz 12 months ago

  • Target version changed from future to Ready

I think we have a panel for rows returned but maybe no alert yet? This should be cross-checked.

#5

Updated by mkittler 11 months ago

  • Subject changed from Better alert based on 2022-06-18 incident to Better alert based on 2022-06-18 incident size:M
  • Description updated (diff)
  • Status changed from New to Workable
#6

Updated by mkittler 11 months ago

  • Assignee set to mkittler
#7

Updated by mkittler 11 months ago

  • Status changed from Workable to Feedback