action #12912

closed

[tools]monitoring of o3/osd

Added by okurz almost 8 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: Feature requests
Target version: -
Start date: 2016-07-28
Due date:
% Done: 0%
Estimated time:

Description

Disk space was depleted and coolo cleaned up. Why was there no monitoring alert? I asked on #infra and was pointed to https://nagios-devel.suse.de/icinga/cgi-bin/status.cgi?host=openqa.suse.de, which needs special permissions, but
https://nagios-devel.suse.de/pnp4nagios/index.php/graph?host=openqa.suse.de&srv=fs_/var/lib/openqa&view=4 works.

I asked again on #infra: [16/06/2016 10:23:34] "Hello, who can help with monitoring of openqa.suse.de? It is monitored by Nagios, but it seems there are no notifications by email or similar. Can this be enabled? Can the infra-bot also report on that? Or can we have a Nagios notification in another IRC channel?"
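As an illustration of the kind of check that would flag a filling disk before it is depleted, here is a minimal sketch in the style of a Nagios plugin. It is not the check that nagios-devel actually runs: the function names and the 80%/90% thresholds are assumptions chosen for the example, and the path is taken from the graph URL above. Only the exit code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) is standard Nagios plugin behaviour.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a Nagios-style (exit code, label) pair.

    The default thresholds are assumptions for this sketch.
    """
    if percent_used >= crit:
        return CRITICAL, "CRITICAL"
    if percent_used >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path, warn=80.0, crit=90.0):
    """Check the filesystem holding `path`; report UNKNOWN if it cannot be read."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as error:
        return UNKNOWN, f"UNKNOWN - cannot read {path}: {error}"
    percent_used = 100.0 * usage.used / usage.total
    code, label = classify(percent_used, warn, crit)
    return code, f"{label} - {path} is {percent_used:.1f}% full"


# Path taken from the pnp4nagios graph URL; on other machines this reports UNKNOWN.
code, message = check_disk("/var/lib/openqa")
print(message)
```

A check like this only helps if its WARNING and CRITICAL results are actually delivered to someone, which is the open question in this ticket.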

On 2016-07-23 20:20, okurz wrote:

=> no output from the bot in the qa-review channel. I guess we need
to coordinate what should be in the context sent to the bot. Can you
send me some example output?

Well, the bot does not yet support any of these features. So far I
just started a listening netcat myself. Is this IRC notification custom
built or part of nagios?

Nagios will just send anything you like to anything you can configure.

The bot we use at #infra is a Supybot.
https://www.dragonsreach.it/2012/06/30/nagios-irc-notifications/

So who can expect to see which notifications under which circumstances?
I would like to inform others, but I don't know about what yet.

service_notification_options w,u,c,r
host_notification_options d,r

Any service that is:

  • warning
  • unknown
  • critical
  • recovered

and any host that is:

  • down
  • recovered

will trigger a notification.
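The option letters quoted above can be decoded mechanically. The following is a small sketch, not part of any Nagios tooling: the helper `decode_options` and the two lookup tables are hypothetical names introduced here, while the letter meanings themselves follow the Nagios object definition documentation.

```python
# Letter codes used in Nagios notification option lists.
# Only the letters relevant to this ticket (plus host "unreachable") are listed.
SERVICE_STATES = {
    "w": "warning",
    "u": "unknown",
    "c": "critical",
    "r": "recovery",
}
HOST_STATES = {
    "d": "down",
    "u": "unreachable",
    "r": "recovery",
}


def decode_options(options, table):
    """Translate a comma-separated option string into state names."""
    return [table[letter.strip()] for letter in options.split(",")]


print("services:", decode_options("w,u,c,r", SERVICE_STATES))
print("hosts:", decode_options("d,r", HOST_STATES))
```

Note that `u` means "unknown" for services but "unreachable" for hosts, which is why the sketch keeps two separate tables.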

"Common admins" should be subscribed by email to the more important notifications, but Lars did not answer my question any further. I can look into IRC notifications at some point.

For o3, the people who should receive a monitoring alert are: lnussel, rbrown, coolo, okurz, dleuenberger, mlin

further details

literature


Related issues: 3 (0 open, 3 closed)

Related to openQA Tests - action #16512: timeout while uploading the logs - test fails in install_and_reboot (Resolved, okurz, 2017-02-06)
Related to openQA Tests - action #17548: osd out of space (Resolved, okurz, 2017-03-06)
Related to openQA Infrastructure - action #18164: [devops][tools] monitoring of openqa worker instances (Resolved, nicksinger, 2018-04-25)