action #102957

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #102951: [epic] Better network performance monitoring

Better network performance monitoring - up-/download speed from cache service, e.g. in log file size:M

Added by okurz almost 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Feature requests
Target version:
Start date:
2021-11-24
Due date:
% Done:

0%

Estimated time:

Description

Motivation

See #102882

Acceptance criteria

  • AC1: The up-/download rates are available to admins for investigation in some place, e.g. in worker log files

Suggestions

  • Measure average over the whole download
  • Record the up-/download rates from cache service up-/downloads in a simple log message
  • Conduct a realistic test in production, e.g. reporting MB/s with one or two decimals
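The first two suggestions can be sketched as follows: divide the total number of bytes by the total elapsed time and put the result into a single log message. This is only an illustrative Python sketch and not the code used in openQA; the function names `format_rate` and `download_message` and the unit-scaling rule are assumptions.

```python
def format_rate(num_bytes: int, seconds: float) -> str:
    """Average transfer rate over the whole download as a short string."""
    if seconds <= 0:
        # Too fast to measure, or a clock problem; avoid dividing by zero.
        return "unknown rate"
    rate = num_bytes / seconds  # bytes per second, averaged over the transfer
    for unit in ("Byte/s", "KiB/s", "MiB/s"):
        if rate < 1024:
            return f"{rate:.2f} {unit}"
        rate /= 1024
    return f"{rate:.2f} GiB/s"


def download_message(path: str, num_bytes: int, seconds: float) -> str:
    """Build one simple log line (hypothetical wording) for a finished download."""
    return f'Download of "{path}" successful ({format_rate(num_bytes, seconds)})'


if __name__ == "__main__":
    # Example: 150 MiB transferred in 2 seconds.
    print(download_message("/tmp/example.iso", 150 * 1024 * 1024, 2.0))
```

Averaging over the whole transfer keeps the measurement cheap, at the cost of hiding short stalls within a download.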

Alternatives if you consider this easier:

  • Put the measurement into influxdb API routes (and a Grafana dashboard)
  • Run the corresponding iperf3 commands periodically in our monitoring? I guess just a few seconds every hour should provide enough data, and we can smooth the results in Grafana. There is an open request with a simple exec example: https://github.com/influxdata/telegraf/issues/3866#issuecomment-694429507 (this should work for our use case). We just need to make sure not to run all requests to all workers at the same time, because that would quite easily saturate the whole link of OSD. Besides, the iperf3 service only supports one client at a time.
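One way to keep the iperf3 runs from overlapping is to give every worker its own time slot within the hour. The sketch below is an assumption about how such staggering could look, not an existing monitoring setup; the function names, the 60-second slot length and the host name are hypothetical. Only the iperf3 options `-c` (server to connect to), `-t` (duration in seconds) and `-J` (JSON output) are taken from iperf3 itself.

```python
from typing import Dict, List


def stagger_schedule(workers: List[str], period_s: int = 3600,
                     slot_s: int = 60) -> Dict[str, int]:
    """Assign each worker a distinct start offset (seconds) within the period."""
    slots = period_s // slot_s
    if len(workers) > slots:
        raise ValueError("more workers than available time slots")
    # Sorting makes the assignment stable regardless of input order.
    return {name: index * slot_s for index, name in enumerate(sorted(workers))}


def iperf3_command(host: str, duration_s: int = 5) -> List[str]:
    """Build (but do not run) an iperf3 client command with JSON output."""
    return ["iperf3", "-c", host, "-t", str(duration_s), "-J"]


if __name__ == "__main__":
    for worker, offset in stagger_schedule(["worker2", "worker1"]).items():
        print(offset, " ".join(iperf3_command(worker)))
```

Because each measurement only lasts a few seconds and the slots are a minute apart, no two runs use the link or the single-client iperf3 service at the same time.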
Actions #1

Updated by livdywan almost 3 years ago

  • Subject changed from Better network performance monitoring - up-/download speed from cache service, e.g. in log file to Better network performance monitoring - up-/download speed from cache service, e.g. in log file size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #2

Updated by kraih almost 3 years ago

  • Assignee set to kraih
Actions #3

Updated by kraih almost 3 years ago

  • Status changed from Workable to In Progress
Actions #4

Updated by kraih almost 3 years ago

Opened a PR that logs the overall download speed after a successful download: https://github.com/os-autoinst/openQA/pull/4385

Actions #5

Updated by kraih almost 3 years ago

  • Status changed from In Progress to Feedback
Actions #6

Updated by kraih almost 3 years ago

  • Status changed from Feedback to Resolved

Samples from O3 in production:

[info] [#2771] Download of "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Kubic-DVD-x86_64-Snapshot20211207-Media.iso" successful (75 MiB/s), new cache size is 48 GiB
[info] [#2770] Download of "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Kubic-DVD-x86_64-Snapshot20211207-Media.iso.sha256" successful (2188.79 Byte/s), new cache size is 48 GiB
[info] [#1079] Download of "/var/lib/openqa/cache/openqa1-opensuse/centOS-8.2.2004-x86_64-minimal.qcow2" successful (27 MiB/s), new cache size is 197 GiB
[info] [#494895] Download of "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Leap-15.3-WSL.x86_64-153.1.110.0-Build1.110.appx" successful (22 MiB/s), new cache size is 47 GiB
[info] [#494896] Download of "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Leap-15.3-WSL.x86_64-153.1.110.0-Build1.110.appx.sha256" successful (1669.22 Byte/s), new cache size is 47 GiB
[info] [#494897] Download of "/var/lib/openqa/cache/openqa1-opensuse/windows-10-x86_64-21H2@64bit_win.qcow2" successful (43 MiB/s), new cache size is 48 GiB
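For investigation, messages like the samples above could be turned into numbers with a small parser. This is a hedged sketch based only on the wording visible in these samples; the helper name `parse_rate` is hypothetical, and the units `KiB/s` and `GiB/s` are assumed, since only `Byte/s` and `MiB/s` appear here.

```python
import re
from typing import Optional, Tuple

# Matches the message wording seen in the samples above.
RATE_RE = re.compile(
    r'Download of "(?P<path>[^"]+)" successful '
    r'\((?P<value>[\d.]+) (?P<unit>Byte/s|KiB/s|MiB/s|GiB/s)\)'
)

# KiB/s and GiB/s are assumptions; the samples only show Byte/s and MiB/s.
FACTORS = {"Byte/s": 1, "KiB/s": 1024, "MiB/s": 1024 ** 2, "GiB/s": 1024 ** 3}


def parse_rate(line: str) -> Optional[Tuple[str, float]]:
    """Return (path, bytes per second) for a download message, else None."""
    match = RATE_RE.search(line)
    if match is None:
        return None
    rate = float(match.group("value")) * FACTORS[match.group("unit")]
    return match.group("path"), rate


if __name__ == "__main__":
    sample = ('Download of "/var/lib/openqa/cache/openqa1-opensuse/'
              'windows-10-x86_64-21H2@64bit_win.qcow2" successful (43 MiB/s), '
              'new cache size is 48 GiB')
    print(parse_rate(sample))
```

The very low `Byte/s` values for the `.sha256` files are expected with a whole-download average: the files are so small that request overhead dominates the elapsed time.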