action #62048

closed

monitor incompletes

Added by okurz over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2020-01-13
Due date: 2020-01-22
% Done: 0%
Estimated time:

Description

Last Wednesday, after the deployment, we had many incompletes (as has happened multiple times in the past), and we did not learn about it until our users told us. I think our "time to recovery" was good, but the "time to detection" is something I would like to improve. I am thinking of monitoring and alerting on incompletes, more specifically on the "incompletion rate", which is a little more complex.

We discussed having a mojo command that runs constantly and emits the monitoring data we are interested in. You can deploy this by salt, so it stays in sync with our telegraf config. The same command could also be run in a polling fashion by exec. It remains to be seen how expensive the startup and DB connect are if done every 5s, but it possibly does not need to run every 5s to be useful.
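
To make the "incompletion rate" idea more concrete, here is a minimal sketch in Python (not the mojo command itself) of what such a monitoring command might compute and print. It assumes the job results are already available as a list of strings and that the output is consumed as InfluxDB line protocol. The measurement name `openqa_jobs`, the field names, and both function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def incompletion_rate(results):
    """Return the fraction of jobs whose result is 'incomplete'.

    `results` is an iterable of result strings, e.g. "passed", "failed",
    "incomplete". An empty input yields 0.0 to avoid dividing by zero.
    """
    counts = Counter(results)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts["incomplete"] / total


def to_line_protocol(measurement, fields):
    """Format fields as one InfluxDB line protocol line (no tags, no timestamp).

    Integers get the "i" suffix, everything else is written as a float.
    """
    parts = []
    for key, value in sorted(fields.items()):
        if isinstance(value, int):
            parts.append(f"{key}={value}i")
        else:
            parts.append(f"{key}={float(value)}")
    return f"{measurement} " + ",".join(parts)


if __name__ == "__main__":
    # Hypothetical sample data; a real command would query the job database.
    sample = ["passed", "incomplete", "failed", "incomplete"]
    print(to_line_protocol("openqa_jobs", {
        "total": len(sample),
        "incomplete": sample.count("incomplete"),
        "rate": incompletion_rate(sample),
    }))
```

Emitting the raw counts next to the rate is a deliberate choice in this sketch: an alert could then ignore a high rate when the total number of jobs in the window is very small.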
