coordination #102951

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

[epic] Better network performance monitoring

Added by okurz over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2021-11-24
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Difficulty:

Description

Motivation

See #102882

Acceptance criteria

  • AC1: The up-/download rates are available to admins for investigation

Suggestions

  • Record the up-/download rates from cache service downloads and maybe expose them via influxdb API routes or in a log message.
  • Alternatively: can we run the corresponding iperf3 commands periodically in our monitoring? A few seconds every hour should provide enough data, and we can smooth the results in grafana. This open request includes a simple exec example that should work for our use case: https://github.com/influxdata/telegraf/issues/3866#issuecomment-694429507
  • We need to make sure not to run all requests to all workers at the same time, because that could quite easily saturate the whole link of OSD.
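A minimal sketch of the periodic iperf3 idea, not the approach that was eventually implemented in the subtasks. It assumes an iperf3 server is reachable on the target host and that the script's output is consumed by something that accepts InfluxDB line protocol (for example a telegraf exec input with data_format = "influx"). The measurement name, field names, and the random start delay are illustrative choices, not taken from this ticket. The delay is meant to keep workers from measuring at the same moment; any collector timeout would have to allow for it.

```python
import json
import random
import subprocess
import sys
import time


def to_line_protocol(report, host, measurement="iperf3"):
    """Turn a parsed `iperf3 -J` report into one InfluxDB line-protocol line.

    The measurement and field names are hypothetical, chosen for this sketch.
    The host is used as a tag value and is assumed to be a plain hostname
    (no spaces or commas, which would need escaping).
    """
    end = report["end"]
    sent = float(end["sum_sent"]["bits_per_second"])
    received = float(end["sum_received"]["bits_per_second"])
    return (f"{measurement},target={host} "
            f"sent_bps={sent},received_bps={received}")


def measure(host, seconds=5, reverse=False):
    """Run a short iperf3 client test against `host` and return the JSON report."""
    cmd = ["iperf3", "-c", host, "-t", str(seconds), "-J"]
    if reverse:
        cmd.append("-R")  # server sends, client receives (download direction)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def main(argv):
    host = argv[1]
    # Assumed staggering strategy: wait a random time before measuring so
    # that not all workers hit the link at once.
    max_delay = float(argv[2]) if len(argv) > 2 else 0.0
    time.sleep(random.uniform(0, max_delay))
    print(to_line_protocol(measure(host), host))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Called as `python3 iperf3_probe.py <host> 300` (the script name is hypothetical), it would sleep up to five minutes, run a five-second test, and print a single line that a collector could forward to influxdb.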


Subtasks

action #102957: Better network performance monitoring - up-/download speed from cache service, e.g. in log file size:M (Resolved, kraih)

action #106901: Expose bandwidth data for worker cache via influxdb size:M (Resolved, kraih)

action #106904: Monitoring for worker specific bandwidth size:M (Resolved, kraih)


Related issues

Related to openQA Project - action #110497: Minion influxdb data causing unusual download rates size:M (Resolved, 2022-05-01)

Copied from openQA Infrastructure - coordination #102882: [epic] All OSD PPC64LE workers except malbec appear to have horribly broken cache service (Resolved, 2022-02-10)

History

#1 Updated by okurz over 1 year ago

  • Copied from coordination #102882: [epic] All OSD PPC64LE workers except malbec appear to have horribly broken cache service added

#2 Updated by okurz over 1 year ago

  • Project changed from openQA Infrastructure to openQA Project
  • Category set to Feature requests

#3 Updated by okurz over 1 year ago

  • Tracker changed from action to coordination
  • Subject changed from Better network performance monitoring to [epic] Better network performance monitoring
  • Status changed from New to Blocked
  • Assignee set to okurz
  • Parent task set to #80142

#4 Updated by okurz over 1 year ago

  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

One subtask resolved. Based on that, we can come up with more subtasks.

#5 Updated by cdywan over 1 year ago

okurz wrote:

One subtask resolved. Based on that, we can come up with more subtasks.

Let's talk about it in the Unblock tomorrow

#6 Updated by okurz over 1 year ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz

#7 Updated by okurz about 1 year ago

  • Status changed from Blocked to Resolved

All subtasks completed, AC1 fulfilled.

#8 Updated by okurz about 1 year ago

  • Related to action #110497: Minion influxdb data causing unusual download rates size:M added
