action #94399

open

No alert when arm workers are offline, alert if telegraf throws errors size:M

Added by Xiaojing_liu almost 3 years ago. Updated about 1 year ago.

Status: Workable
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2021-06-22
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

On 2021-06-22, none of the arm workers (arm-1, arm-2, arm-3) could be reached via ssh or ping.
However, https://stats.openqa-monitor.qa.suse.de/d/4KkGdvvZk/osd-status-overview?orgId=1 showed all of them as Online.

Acceptance criteria

Suggestions

  1. We should look into feeding data into InfluxDB whenever the telegraf service, especially on OSD, logs errors, or into log error monitoring in general.
  2. Then one could add a dashboard/graph with an alert in Grafana using the data from 1.
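
One possible way to approach suggestion 1 is sketched below. It is only an illustration under several assumptions: that telegraf marks error lines with `E!` in its log output, that the log lines are collected by some other means (for example from the journal), and that the measurement name `telegraf_log_errors` and the host tag value are hypothetical names chosen here, not anything that exists on OSD. The sketch counts error lines and formats the result as an InfluxDB line protocol point.

```python
import time


def count_error_lines(log_lines):
    """Count lines carrying the assumed telegraf error marker ' E! '."""
    return sum(1 for line in log_lines if " E! " in line)


def escape_tag(value):
    """Escape commas, equals signs and spaces in a line protocol tag value."""
    for char in (",", "=", " "):
        value = value.replace(char, "\\" + char)
    return value


def to_line_protocol(host, error_count, timestamp_ns=None):
    """Format the error count as one InfluxDB line protocol point.

    'telegraf_log_errors' is a hypothetical measurement name.
    """
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    return (
        f"telegraf_log_errors,host={escape_tag(host)} "
        f"count={error_count}i {timestamp_ns}"
    )


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "2021-06-22T08:00:00Z E! [inputs.ping] Error in plugin: host unreachable",
        "2021-06-22T08:00:10Z I! [agent] Config loaded",
    ]
    print(to_line_protocol("osd", count_error_lines(sample)))
```

If something like this were run periodically and its output written to InfluxDB, for instance through telegraf's exec input with the influx data format, the Grafana alert from suggestion 2 could be defined on the `count` field.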

Files

Screenshot_20210622_102648.png (322 KB), Xiaojing_liu, 2021-06-22 03:30

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #94438: OSD deployment fails at 2021-06-21 because ' openqaworker (arm-3 and arm-2) Minion did not return' (Resolved, okurz, 2021-06-22)
