action #40196
[monitoring] monitor internal port 9526, port 80, external port 443 accessibility of o3 and response times size:M
Description
Motivation
Outcome of #39743#note-20: it would be nice to have better monitoring
Acceptance criteria
- AC1: There is active monitoring for at least 443 on o3
Suggestions
- The current monitoring solution applicable to o3 is https://zabbix.nue.suse.com. Ensure that it covers at least port 443 as reachable from outside
- As applicable, extend the monitoring to port 80 and/or 9526 as reachable from localhost
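As an illustration of the kind of check suggested above, here is a minimal sketch in Python that tests whether a TCP port accepts connections and how long the connect takes. It is not the zabbix configuration and not anything deployed for o3; the function names `check_tcp_port` and `check_targets` are hypothetical, and the connect time is only a rough stand-in for a full HTTP response time.

```python
import socket
import time
from typing import Dict, List, Optional, Tuple


def check_tcp_port(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the port is unreachable.

    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return None


def check_targets(
    targets: List[Tuple[str, int]], timeout: float = 5.0
) -> Dict[Tuple[str, int], Optional[float]]:
    """Check every (host, port) pair and map it to its connect time or None."""
    return {(host, port): check_tcp_port(host, port, timeout) for host, port in targets}
```

From outside one could call `check_targets([("openqa.opensuse.org", 443)])`, and on the o3 host itself `check_targets([("localhost", 80), ("localhost", 9526)])`; a `None` value would mean the port did not accept a connection within the timeout.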
Updated by okurz over 6 years ago
- Related to action #39743: [o3][tools] o3 unusable, often responds with 504 Gateway Time-out added
Updated by coolo about 6 years ago
- Project changed from openQA Project (public) to openQA Infrastructure (public)
- Category deleted (168)
Updated by nicksinger about 6 years ago
- Status changed from New to Blocked
- Assignee set to okurz
Two questions which need to be clarified first:
- which host should do these checks
- where should we store/save the metrics
I know that there is already some kind of grafana+influxdb for opensuse projects. But do they also run checks for other hosts? If so, what do they use to collect metrics? Telegraf?
@okurz: since you know many opensuse-people I'd kindly ask you to clarify these two points. I can then help with setting up the rest.
Updated by okurz about 6 years ago
- Status changed from Blocked to Feedback
nicksinger wrote:
@okurz: since you know many opensuse-people I'd kindly ask you to clarify these two points. I can then help with setting up the rest.
yes, but I do not have a better way than discussing on #opensuse-admin on freenode so I suggest you try that. You can ping me there as well :)
Also, I think you meant to set the ticket to "Blocked" because you are "waiting" for me, right? I do not know of any other ticket reference, "Feedback" therefore :)
Updated by okurz almost 6 years ago
- Assignee changed from okurz to nicksinger
@nicksinger back to you
Updated by okurz almost 5 years ago
- Status changed from Feedback to New
ok so back to "New" for the tasks to clarify which grafana instance to use for openqa.opensuse.org or where to run an instance.
Updated by okurz over 4 years ago
- Subject changed from [tools][monitoring] monitor internal port 9526, port 80, external port 443 accessibility of o3 and response times to [monitoring] monitor internal port 9526, port 80, external port 443 accessibility of o3 and response times
- Target version set to future
Updated by okurz 10 months ago
- Subject changed from [monitoring] monitor internal port 9526, port 80, external port 443 accessibility of o3 and response times to [monitoring] monitor internal port 9526, port 80, external port 443 accessibility of o3 and response times size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by jbaier_cz 10 months ago
- Blocked by action #156322: zabbix-proxy.dmz-prg2.suse.org not reachable from ariel.suse-dmz.opensuse.org added
Updated by okurz 14 days ago · Edited
- Status changed from Workable to Feedback
- Target version changed from Tools - Next to Ready
- Parent task set to #162146
With #156322 we have proper monitoring again for o3, but what about meta-monitoring to ensure that our monitoring is in place? I don't think we need to monitor the IT-maintained zabbix instance, but we can add secondary monitoring. I added monitoring with https://dashboard.uptimerobot.com/monitors, credentials in
https://gitlab.suse.de/openqa/password/-/merge_requests/22
But also I found in https://zabbix.nue.suse.com/zabbix.php?show=1&name=&inventory%5B0%5D%5Bfield%5D=type&inventory%5B0%5D%5Bvalue%5D=&evaltype=0&tags%5B0%5D%5Btag%5D=&tags%5B0%5D%5Boperator%5D=0&tags%5B0%5D%5Bvalue%5D=&show_tags=3&tag_name_format=0&tag_priority=&show_opdata=0&show_timeline=1&filter_name=&filter_show_counter=0&filter_custom_time=0&sort=clock&sortorder=DESC&age_state=0&show_suppressed=0&unacknowledged=0&compact_view=0&details=0&highlight_row=0&action=problem.view that there are two problems stated which I have not seen an email about.
Created two new tickets:
Updated by okurz 14 days ago
- Related to action #174316: [o3][zabbix][alert] warning about depleting storage space but no email? size:S added
Updated by okurz 14 days ago
- Related to action #174313: [o3][zabbix][alert] / and /var/tmp: "Disk space is low and might be full in 7d (used > 85%)" since 2024-12-11 06:50 size:S added
Updated by okurz 14 days ago
- Status changed from Feedback to Resolved
I merged https://gitlab.suse.de/openqa/password/-/merge_requests/22 myself now.