action #102957

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #102951: [epic] Better network performance monitoring

Better network performance monitoring - up-/download speed from cache service, e.g. in log file size:M

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: Feature requests
Target version:
Start date: 2021-11-24
Due date:
% Done: 0%
Estimated time:

Description

Motivation

See #102882

Acceptance criteria

  • AC1: The up-/download rates are available to admins for investigation somewhere, e.g. in the worker log files

Suggestions

  • Measure the average rate over the whole download
  • Record the up-/download rates of cache service transfers in a simple log message
  • Verify the output with a realistic test in production, e.g. rates shown in MB/s with one or two decimals
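The first two suggestions could look roughly like the following. This is a minimal Python sketch for illustration only (openQA itself is written in Perl); the names `format_rate`, `download_with_rate_log` and the logger name `cache_service` are hypothetical. It assumes the download callable returns the number of bytes transferred and uses decimal megabytes (10^6 bytes).

```python
import logging
import time
from typing import Callable

log = logging.getLogger("cache_service")  # hypothetical logger name


def format_rate(num_bytes: int, seconds: float) -> str:
    """Average rate over the whole transfer, in MB/s (10**6 bytes) with two decimals."""
    if seconds <= 0:
        return "n/a"  # transfer too fast to time meaningfully
    return f"{num_bytes / seconds / 1_000_000:.2f} MB/s"


def download_with_rate_log(asset: str, download: Callable[[], int]) -> int:
    """Run `download` (hypothetical, returns bytes transferred), then log size, duration and rate."""
    start = time.monotonic()
    num_bytes = download()
    elapsed = time.monotonic() - start
    log.info("Download of %s finished: %d bytes in %.2f s (%s)",
             asset, num_bytes, elapsed, format_rate(num_bytes, elapsed))
    return num_bytes
```

Timing the transfer once from start to end gives exactly the "average over the whole download" from the first suggestion; for example, 50,000,000 bytes in 4 s would be logged as `12.50 MB/s`.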

Alternatives if you consider this easier:

  • Expose the measurement via the influxdb API routes (and show it on a Grafana dashboard)
  • Run the corresponding iperf3 commands periodically as part of our monitoring? I guess running them for a few seconds every hour should provide enough data, and we can smooth the results in Grafana. There is an open Telegraf request with a simple exec example: https://github.com/influxdata/telegraf/issues/3866#issuecomment-694429507, which should work for our use case. We just need to make sure not to run the tests against all workers at the same time, because that could easily saturate the whole OSD link. Besides, the iperf3 service only supports one client at a time.
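A possible shape for the iperf3 alternative, as a minimal Python sketch under several assumptions: the measurement name `iperf3`, the field names `sent_mbps`/`received_mbps`, the helper names and the 600 s stagger window are all hypothetical, and hostnames are assumed to need no line-protocol escaping. Each worker derives a fixed offset from its hostname so the tests are spread out instead of hitting OSD at once, and a failed run (e.g. because the iperf3 server is busy with another client) produces no data point.

```python
import json
import socket
import subprocess
import time
import zlib
from typing import Optional


def stagger_seconds(host: str, window: int) -> int:
    """Deterministic per-host offset in [0, window) so workers spread their tests out."""
    return zlib.crc32(host.encode("utf-8")) % window


def iperf3_to_line_protocol(result: dict, host: str, server: str) -> str:
    """Turn parsed `iperf3 --json` output of a TCP test into one InfluxDB line-protocol line.

    Assumes host and server names contain no spaces, commas or '=' (no tag escaping done).
    """
    end = result["end"]
    sent = end["sum_sent"]["bits_per_second"]
    received = end["sum_received"]["bits_per_second"]
    return (f"iperf3,host={host},server={server} "
            f"sent_mbps={sent / 1e6:.2f},received_mbps={received / 1e6:.2f}")


def measure(server: str, window: int = 600, duration: int = 5) -> Optional[str]:
    """Wait for this host's slot, run a short iperf3 client test, return a line or None."""
    host = socket.gethostname()
    time.sleep(stagger_seconds(host, window))
    proc = subprocess.run(
        ["iperf3", "--client", server, "--time", str(duration), "--json"],
        capture_output=True, text=True, check=False,
    )
    if proc.returncode != 0:
        # e.g. the iperf3 server is busy with another client (it serves one at a time)
        return None
    return iperf3_to_line_protocol(json.loads(proc.stdout), host, server)
```

A small wrapper printing the result of `measure(...)` could then be run by Telegraf's `inputs.exec` plugin with `data_format = "influx"`; the plugin's `timeout` would have to cover the stagger window plus the test duration. Alternatively, a cron job or systemd timer could apply the offset instead of sleeping inside the script.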