action #106904

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #102951: [epic] Better network performance monitoring

Monitoring for worker specific bandwidth size:M

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: Feature requests
Target version:
Start date: 2022-02-16
Due date:
% Done: 0%
Estimated time:

Description

User story

As an openQA instance administrator, I want to be alerted about unusual upload/download rates to and from individual workers so that I can identify problems in our infrastructure.

Acceptance criteria

  • AC1: monitor.qa has alerts for each worker depending on cache service bandwidth

Suggestions

Actions #1

Updated by okurz over 2 years ago

  • Priority changed from Normal to Low
Actions #2

Updated by kraih over 2 years ago

  • Assignee set to kraih

I'd like to learn a bit more about Grafana.

Actions #3

Updated by kraih over 2 years ago

  • Status changed from Workable to In Progress
Actions #6

Updated by kraih over 2 years ago

I've looked through the data we've collected over the last few days, and for most workers the low point is around 19 MiB/s so far. Only openqaworker2 has dipped as low as 9.52 MiB/s, at 2022-03-01 12:35:00. For an alert, a threshold between 1 and 5 MiB/s (over an hour) might make sense.

Actions #7

Updated by okurz over 2 years ago

As I suggested over chat: use the max value and alert if it's below a threshold. We don't care about single dips of low transfer; we care about limited bandwidth affecting all downloads.
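
A minimal sketch of that rule, for illustration only. It is not the Grafana alert from the PR. The function name `bandwidth_alert`, the sample format, and the 5 MiB/s default are assumptions; the default is just the upper end of the 1 to 5 MiB/s range floated above, and the one-hour window comes from the same comment.

```python
from datetime import datetime, timedelta

MIB = 1024 * 1024  # bytes per MiB


def bandwidth_alert(samples, now, threshold_mib_s=5.0, window=timedelta(hours=1)):
    """Return True if the peak rate within the window is below the threshold.

    samples: iterable of (timestamp, bytes_per_second) tuples for one worker
    (a hypothetical format, not the real monitoring schema).
    """
    recent = [rate for ts, rate in samples if now - window <= ts <= now]
    if not recent:
        # No data in the window: assumed to be handled by a separate alert.
        return False
    # Using max() means a single slow transfer does not trigger the alert;
    # only a window in which every transfer was slow does.
    return max(recent) / MIB < threshold_mib_s


now = datetime(2022, 3, 1, 13, 0, 0)

# One dip to 9.52 MiB/s, but another transfer reached 19 MiB/s.
single_dip = [
    (datetime(2022, 3, 1, 12, 35), 9.52 * MIB),
    (datetime(2022, 3, 1, 12, 50), 19.0 * MIB),
]
print(bandwidth_alert(single_dip, now))  # False

# Every transfer in the last hour stayed below the threshold.
all_slow = [
    (datetime(2022, 3, 1, 12, 10), 2.0 * MIB),
    (datetime(2022, 3, 1, 12, 40), 3.5 * MIB),
]
print(bandwidth_alert(all_slow, now))  # True
```

Taking the maximum rather than the minimum or mean is what keeps isolated dips, such as the openqaworker2 one, from raising an alert.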

Actions #9

Updated by kraih over 2 years ago

Updated PR with requested changes.

Actions #10

Updated by kraih over 2 years ago

  • Status changed from In Progress to Feedback
Actions #11

Updated by kraih over 2 years ago

  • Status changed from Feedback to Resolved