coordination #102951

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

[epic] Better network performance monitoring

Added by okurz over 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2021-11-24
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)

Description

Motivation

See #102882

Acceptance criteria

  • AC1: The up-/download rates are available to admins for investigation

Suggestions

  • Record the up-/download rates of cache service downloads and expose them, e.g. via influxdb API routes or in a log message.
  • Alternatively, run the corresponding iperf3 commands periodically as part of our monitoring. A few seconds every hour should probably provide enough data, which we can then smooth in Grafana. An open telegraf issue contains a simple exec example that should work for our use case: https://github.com/influxdata/telegraf/issues/3866#issuecomment-694429507. We need to make sure not to run the measurements for all workers at the same time, because that could easily saturate the whole OSD link.
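The iperf3 suggestion could look roughly like the following Python sketch of a script for telegraf's exec input. It is not the implementation from the subtasks below. It assumes that `iperf3 -J` reports TCP results under `end.sum_sent` and `end.sum_received`, and it prints InfluxDB line protocol. The measurement name `iperf3`, the tags `worker`/`direction`, the fields `sent_bps`/`received_bps` and the hash-based per-worker offset are all hypothetical choices made for illustration.

```python
import hashlib
import json
import socket
import subprocess
import sys

# Hypothetical measurement name; adjust to whatever the Grafana dashboards expect.
MEASUREMENT = "iperf3"


def stagger_offset(worker: str, interval_s: int = 3600) -> int:
    """Hypothetical deterministic per-worker offset in [0, interval_s) seconds.

    Spreading the workers' runs over the interval avoids running all
    measurements at once and saturating the link to the server.
    """
    digest = hashlib.sha256(worker.encode()).digest()
    return int.from_bytes(digest[:4], "big") % interval_s


def escape_tag(value: str) -> str:
    """Escape commas, equals signs and spaces, which are special in line-protocol tag values."""
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def iperf3_to_line_protocol(report_json: str, worker: str, direction: str) -> str:
    """Turn the JSON report of a TCP `iperf3 -J` run into one line-protocol record."""
    end = json.loads(report_json)["end"]
    sent = end["sum_sent"]["bits_per_second"]
    received = end["sum_received"]["bits_per_second"]
    tags = f"worker={escape_tag(worker)},direction={escape_tag(direction)}"
    # No timestamp is written; the ingesting side is assumed to use the collection time.
    return f"{MEASUREMENT},{tags} sent_bps={sent:.0f},received_bps={received:.0f}"


def measure(server: str, seconds: int = 5) -> list[str]:
    """Run a short upload and a short download test against `server`."""
    worker = socket.gethostname()
    lines = []
    # -R reverses the test: the server sends and this worker receives (download).
    for direction, extra in (("upload", []), ("download", ["-R"])):
        result = subprocess.run(
            ["iperf3", "-c", server, "-t", str(seconds), "-J", *extra],
            capture_output=True, text=True, check=True,
        )
        lines.append(iperf3_to_line_protocol(result.stdout, worker, direction))
    return lines


# Only run the measurement when a server host is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    print("\n".join(measure(sys.argv[1])))
```

With the default 5 second test per direction, the exec input's `timeout` (5s by default) would have to be raised above the combined duration, and `data_format` set to `influx`. How `stagger_offset` is applied (as a delay before the run or in the scheduling itself) is left open here; the point is only that each worker gets a stable, distinct slot within the hour.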


Subtasks 3 (0 open, 3 closed)

  • action #102957: Better network performance monitoring - up-/download speed from cache service, e.g. in log file size:M (Resolved, kraih, 2021-11-24)
  • action #106901: Expose bandwidth data for worker cache via influxdb size:M (Resolved, kraih, 2022-02-16)
  • action #106904: Monitoring for worker specific bandwidth size:M (Resolved, kraih, 2022-02-16)

Related issues 2 (0 open, 2 closed)

  • Related to openQA Project - action #110497: Minion influxdb data causing unusual download rates size:M (Resolved, mkittler, 2022-05-01)
  • Copied from openQA Infrastructure - coordination #102882: [epic] All OSD PPC64LE workers except malbec appear to have horribly broken cache service (Resolved, kraih, 2022-02-10)