coordination #102951

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

[epic] Better network performance monitoring

Added by okurz about 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2021-11-24
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

Motivation

See #102882

Acceptance criteria

  • AC1: The up-/download rates are available to admins for investigation

Suggestions

  • Record the up-/download rates from cache service downloads and maybe put them into InfluxDB API routes or a log message
  • OR: Can we run the corresponding iperf3 commands periodically in our monitoring? I guess just a few seconds every hour should provide enough data, and we can smooth it in Grafana. There is an open request with a simple exec example: https://github.com/influxdata/telegraf/issues/3866#issuecomment-694429507 - this should work for our use case. We just need to make sure not to run the requests against all workers at the same time, because that would quite easily saturate the whole link of OSD.
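The periodic iperf3 idea could be sketched roughly as follows. This is only an illustration under assumptions, not what was implemented in the subtasks: the measurement name `iperf3`, the field names `sent_bps` and `received_bps`, and the command-line arguments are hypothetical. It assumes an iperf3 server is reachable and that `iperf3 --json` reports `end.sum_sent.bits_per_second` and `end.sum_received.bits_per_second`, as it does for TCP tests. Each worker derives a fixed delay from its own hostname so that the tests are spread over the hour instead of hitting the OSD link at the same time.

```python
import hashlib
import json
import subprocess
import sys
import time


def stagger_offset(hostname: str, window_seconds: int = 3600) -> int:
    """Deterministic per-host delay in [0, window_seconds) to spread out tests."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


def to_line_protocol(host: str, report: dict, measurement: str = "iperf3") -> str:
    """Turn an iperf3 JSON report into one InfluxDB line protocol line.

    Measurement and field names are hypothetical choices for this sketch.
    """
    end = report["end"]
    sent = float(end["sum_sent"]["bits_per_second"])
    received = float(end["sum_received"]["bits_per_second"])
    return f"{measurement},host={host} sent_bps={sent},received_bps={received}"


def run_iperf3(server: str, seconds: int = 5) -> dict:
    """Run a short iperf3 client test and return the parsed JSON report."""
    result = subprocess.run(
        ["iperf3", "--client", server, "--time", str(seconds), "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Hypothetical usage: script.py <this-worker-hostname> <iperf3-server>
    worker, server = sys.argv[1], sys.argv[2]
    time.sleep(stagger_offset(worker))
    print(to_line_protocol(worker, run_iperf3(server)))
```

Started once per hour on each worker, this would print a single line that an exec-style collector accepting InfluxDB line protocol could ingest. Because the delay can be close to a full hour, any timeout of the calling collector would have to allow for it, or the window would have to be shortened.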


Subtasks 3 (0 open, 3 closed)

action #102957: Better network performance monitoring - up-/download speed from cache service, e.g. in log file size:M (Resolved, kraih, 2021-11-24)

action #106901: Expose bandwidth data for worker cache via influxdb size:M (Resolved, kraih, 2022-02-16)

action #106904: Monitoring for worker specific bandwidth size:M (Resolved, kraih, 2022-02-16)

Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #110497: Minion influxdb data causing unusual download rates size:M (Resolved, mkittler, 2022-05-01)

Copied from openQA Infrastructure - coordination #102882: [epic] All OSD PPC64LE workers except malbec appear to have horribly broken cache service (Resolved, kraih, 2022-02-10)

Actions #1

Updated by okurz about 2 years ago

  • Copied from coordination #102882: [epic] All OSD PPC64LE workers except malbec appear to have horribly broken cache service added
Actions #2

Updated by okurz about 2 years ago

  • Project changed from openQA Infrastructure to openQA Project
  • Category set to Feature requests
Actions #3

Updated by okurz about 2 years ago

  • Tracker changed from action to coordination
  • Subject changed from Better network performance monitoring to [epic] Better network performance monitoring
  • Status changed from New to Blocked
  • Assignee set to okurz
  • Parent task set to #80142
Actions #4

Updated by okurz about 2 years ago

  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

One subtask resolved. Based on that, we can come up with more subtasks.

Actions #5

Updated by livdywan about 2 years ago

okurz wrote:

One subtask resolved. Based on that, we can come up with more subtasks.

Let's talk about it in the Unblock tomorrow

Actions #6

Updated by okurz about 2 years ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz
Actions #7

Updated by okurz almost 2 years ago

  • Status changed from Blocked to Resolved

All subtasks completed, AC1 fulfilled.

Actions #8

Updated by okurz almost 2 years ago

  • Related to action #110497: Minion influxdb data causing unusual download rates size:M added