action #54803

monitoring for disk space does not show graph details of used space

Added by okurz about 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
Start date: 2019-07-29
Due date:
% Done: 0%
Estimated time:

Description

I received monitoring alerts for https://thruk.suse.de/thruk/cgi-bin/extinfo.cgi?type=2&host=openqa.suse.de&service=fs_%2Fvar%2Flib%2Fopenqa#pnp_th5/1534368689/1564435889/0 but the graph does not seem to have shown the actual usage since 2019-04.

History

#1 Updated by okurz about 2 years ago

The link https://thruk.suse.de/thruk/cgi-bin/extinfo.cgi?type=2&host=openqa.suse.de&service=fs_%2Fvar%2Flib%2Fopenqa#pnp_th2/1564346019/1564436019/0 at the bottom of the page states: echo "ERROR - you did an active check on this service - please disable active checks" && exit 1. I don't see how "active checks" could be disabled; they are reported as being disabled already. https://thruk.suse.de/thruk/cgi-bin/config.cgi?type=services&jump2=openqa.suse.de&jump3=fs_%2Fvar%2Flib%2Fopenqa tells me I am missing permissions. Should this become an EngInfra ticket?

#2 Updated by nicksinger about 2 years ago

#3 Updated by okurz about 2 years ago

Thanks, good workaround for the time that is covered there. I still think the many additional checks that nagios/check_mk/thruk provide are useful to us for now. I asked in https://chat.suse.de/channel/infra-discuss if anyone knows. We could try to ping lrupp if there is no response.

#4 Updated by okurz about 2 years ago

  • Status changed from New to Feedback
  • Assignee set to okurz
Lars Vogdt @lrupp 3:22 PM okurz: jip. looks to me like the graph stuff does not work since a while now for any machine
Oliver Kurz @okurz 3:25 PM lrupp: I see. So it's a common problem, not just the specific query and not just the specific machine? Who could help best with this?
Lars Vogdt @lrupp 3:26 PM okurz: as I don't know which machine/service you are talking about, I can't be sure. But in general it's either a openSUSE heroes problem or EngInfra.
Oliver Kurz @okurz 3:52 PM I meant if you are aware of other checks against other machines sharing this problem. as in: the whole instance of thruk.suse.de and all supplying machines are affected 

So I created https://infra.nue.suse.com/SelfService/Display.html?id=144874

#5 Updated by rwawrig about 2 years ago

The monitoring graphs stopped on Apr 11, which is a Thursday, which points to a maintenance window. Possibly the updates broke something.
What I found is that rrdcached.service was stuck in a loop, failing every minute or so and restarting itself. I couldn't find out what the issue was; the logs were no help. In the end I changed the vendor from SUSE to obs and installed a newer version from our NPI repo. The graphs are getting data now.
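
For anyone retracing this diagnosis, here is a minimal sketch and not a record of the commands actually run. It assumes the year is 2019 (taken from the ticket dates), GNU date for the `-d` option, and a systemd recent enough to expose the `NRestarts` property. The helper name `inspect_restart_loop` is hypothetical.

```shell
#!/bin/sh
# Weekday of the day the graphs stopped; the year 2019 is an assumption
# based on the ticket dates. Requires GNU date for the -d option.
LC_ALL=C date -d 2019-04-11 +%A

# Hypothetical helper: show how often systemd restarted a unit and print
# its recent journal entries. Defined only; it is not called here.
inspect_restart_loop() {
    unit="$1"
    systemctl show "$unit" --property=NRestarts
    journalctl --unit="$unit" --since "1 hour ago" --no-pager
}
# Example call on the monitoring server: inspect_restart_loop rrdcached.service
```

A steadily growing restart counter together with an uninformative journal would match the behaviour described above.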

About the "you did an active check on this service - please disable active checks" message: it seems all hosts that use passive instead of active checks have that message, probably from when they were added for the first time. I think it is safe to ignore. As you can now see from the graphs, the client successfully sends the data to the monitoring server, i.e. as passive checks.
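
As a rough illustration of how such a message can arise, the sketch below assumes the passive service is configured with a dummy check command like the one quoted in #1. The real Icinga configuration is not shown in this ticket, and the function name `passive_only_check` is hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for the check command of a passive-only service.
# It would only run if the server executed the check actively, which is
# not expected while the client submits the results itself.
passive_only_check() {
    echo "ERROR - you did an active check on this service - please disable active checks"
    return 1   # non-zero exit status: the result is reported as not OK
}

# Simulate one active execution and report the resulting exit status.
passive_only_check || echo "exit status: $?"
```

Under that assumption, a single active execution, for example when the service was first added, would be enough to leave this text behind as the last active check output.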

As for the link where you're missing permissions, that's the Icinga config section for this host, which is managed by Infra and where I don't think you should have permissions. Please let me know if you think otherwise (via the infra ticket).

Cheers
Robert

#6 Updated by okurz about 2 years ago

  • Status changed from Feedback to Resolved

Hi Robert, thanks a lot for the very detailed update and also the hint about the "active/passive check" notes. Appreciated. As I can see that the monitoring graphs show data again, I consider the original problem resolved. I also don't see a problem with not having permissions to change anything in the Icinga config. That's OK and expected. I can't find a way to comment on the ticket https://infra.nue.suse.com/SelfService/Display.html?id=144874 though.
