action #54803
Closed: monitoring for disk space does not show graph details of used space
Description
I received monitoring alerts for https://thruk.suse.de/thruk/cgi-bin/extinfo.cgi?type=2&host=openqa.suse.de&service=fs_%2Fvar%2Flib%2Fopenqa#pnp_th5/1534368689/1564435889/0, but the graph does not seem to have shown the actual usage since 2019-04.
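As far as we can tell, the graph behind the pnp link is drawn from the performance data (perfdata) that a check prints after a "|" in its status line. The following is a minimal sketch of what such a filesystem check could emit. It is a hypothetical POSIX shell function built on "df -P". The name check_fs_usage and the 80/90 % default thresholds are assumptions, and the real fs_/var/lib/openqa check (probably provided by check_mk) is not shown on this page.

```shell
#!/bin/sh
# Hypothetical sketch of a Nagios-style filesystem usage check.
# Prints one status line with perfdata after "|" and returns the
# plugin exit code: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
check_fs_usage() {
    path=$1 warn=${2:-80} crit=${3:-90}
    # POSIX df columns: Filesystem 1024-blocks Used Available Capacity Mounted-on
    line=$(df -P "$path" 2>/dev/null | awk 'NR==2')
    if [ -z "$line" ]; then
        echo "UNKNOWN - cannot stat $path"
        return 3
    fi
    size_kb=$(echo "$line" | awk '{print $2}')
    used_kb=$(echo "$line" | awk '{print $3}')
    pct=$(echo "$line" | awk '{sub(/%/, "", $5); print $5}')
    # Perfdata format: label=value[UOM];warn;crit;min;max
    perf="used_pct=${pct}%;${warn};${crit};0;100 used=${used_kb}KB;;;0;${size_kb}"
    if [ "$pct" -ge "$crit" ]; then
        echo "CRITICAL - $path ${pct}% used | $perf"
        return 2
    elif [ "$pct" -ge "$warn" ]; then
        echo "WARNING - $path ${pct}% used | $perf"
        return 1
    fi
    echo "OK - $path ${pct}% used | $perf"
    return 0
}
```

For example, running check_fs_usage /var/lib/openqa 80 90 and then echo $? prints a single status line followed by 0, 1 or 2, depending on the current usage. A graphing add-on can only plot the values after the "|". The human-readable text before it is not graphed.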
Updated by okurz over 5 years ago
The link https://thruk.suse.de/thruk/cgi-bin/extinfo.cgi?type=2&host=openqa.suse.de&service=fs_%2Fvar%2Flib%2Fopenqa#pnp_th2/1564346019/1564436019/0 at the bottom of the page states:
echo "ERROR - you did an active check on this service - please disable active checks" && exit 1
I don't see how "active checks" could be disabled, because they are already reported as disabled. https://thruk.suse.de/thruk/cgi-bin/config.cgi?type=services&jump2=openqa.suse.de&jump3=fs_%2Fvar%2Flib%2Fopenqa tells me I am missing permissions. EngInfra ticket?
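The quoted message looks like the check command configured for this service. It is presumably a guard that only runs when someone triggers an active check, and it fails on purpose. Under the Nagios plugin conventions, the exit code selects the state, so "exit 1" would be shown as WARNING. The sketch below assumes that reading. state_name and guard_check are hypothetical helper names.

```shell
#!/bin/sh
# Map a plugin exit code to the service state a Nagios-compatible core shows.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "out of range ($1)" ;;
    esac
}

# Stand-in for the guard command quoted above (assumed to be the configured
# check command). The ( ) body runs in a subshell, so "exit 1" only ends it.
guard_check() (
    echo "ERROR - you did an active check on this service - please disable active checks" && exit 1
)

# Run the guard the way an active check would and report the resulting state.
rc=0
out=$(guard_check) || rc=$?
echo "$(state_name "$rc"): $out"
```

Running this prints "WARNING: ERROR - you did an active check on this service - please disable active checks". In other words, the text is only visible because something executed the guard, not because the disk check itself failed.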
Updated by nicksinger over 5 years ago
You can use the graph available at http://stats.openqa-monitor.qa.suse.de/d/4KkGdvvZk/osd-status-overview?orgId=1 if you like :)
Updated by okurz over 5 years ago
Thanks, that is a good workaround for the time range covered there. I still think the many additional checks that nagios/check_mk/thruk currently provide are useful. I asked in https://chat.suse.de/channel/infra-discuss whether anyone knows. If there is no response, we could try pinging lrupp.
Updated by okurz over 5 years ago
- Status changed from New to Feedback
- Assignee set to okurz
Lars Vogdt @lrupp 3:22 PM okurz: yep, looks to me like the graph stuff has not been working for a while now, for any machine
Oliver Kurz @okurz 3:25 PM lrupp: I see. So it's a common problem, not just the specific query and not just the specific machine? Who could help best with this?
Lars Vogdt @lrupp 3:26 PM okurz: as I don't know which machine/service you are talking about, I can't be sure. But in general it's either an openSUSE Heroes problem or an EngInfra one.
Oliver Kurz @okurz 3:52 PM I meant whether you are aware of other checks against other machines that share this problem, as in: the whole thruk.suse.de instance and all supplying machines are affected
So I created https://infra.nue.suse.com/SelfService/Display.html?id=144874
Updated by rwawrig over 5 years ago
The monitoring graphs stopped on Apr 11, a Thursday, which points to the maintenance window; possibly the updates broke something.
I found that rrdcached.service was stuck in a loop, failing every minute or so and restarting itself. I couldn't find the cause, and the logs were no help. In the end I changed the vendor from SUSE to OBS and installed a newer version from our NPI repo. The graphs are getting data now.
About the "you did an active check on this service - please disable active checks" message: it seems all hosts that use passive instead of active checks show that message, probably from when they were first added. I think it is safe to ignore. As you can now see from the graphs, the client successfully sends its data to the monitoring server, i.e. via passive checks.
The link where you're missing permissions is the Icinga config section for this host. It is managed by Infra, and I don't think you should have permissions there. Please let me know (via the infra ticket) if you think otherwise.
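In the passive model described here, the client runs the check itself and hands the result to the monitoring server, instead of the server running the check. One common way to feed such a result into a Nagios-compatible core is the PROCESS_SERVICE_CHECK_RESULT external command. This page does not say whether this setup uses that command, NSCA or a check_mk mechanism, so treat the sketch as an assumption. passive_result is a hypothetical helper, and the command-file path in the comment varies by installation.

```shell
#!/bin/sh
# Format a passive service check result as a Nagios external command:
#   [epoch] PROCESS_SERVICE_CHECK_RESULT;<host>;<service>;<return_code>;<output>
# <output> may carry perfdata after "|", which is what ends up in the graphs.
passive_result() {
    host=$1 service=$2 rc=$3 output=$4
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$host" "$service" "$rc" "$output"
}

# Hypothetical usage (path is an assumption, e.g. Icinga 2 with the
# ExternalCommandListener feature enabled):
# passive_result openqa.suse.de 'fs_/var/lib/openqa' 0 \
#     'OK - 42% used | used_pct=42%;80;90;0;100' >> /var/run/icinga2/cmd/icinga2.cmd
```

With this model, a server-side active check is never needed. That is consistent with the guard message only appearing when someone triggers one by hand.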
Cheers
Robert
Updated by okurz over 5 years ago
- Status changed from Feedback to Resolved
Hi Robert, thanks a lot for the very detailed update and also for the explanation of the "active/passive check" message. Appreciated. As the monitoring graphs show data again, I consider the original problem resolved. I also don't see a problem with not having permissions to change anything in the Icinga config; that's OK and expected. I can't find a way to comment on the ticket https://infra.nue.suse.com/SelfService/Display.html?id=144874, though.