action #153418
closed
http based health check against proxy.scc.suse.de size:M
Description
Motivation
See #152857-18
Acceptance criteria
- AC1: There is an alert about proxy SCC http responsiveness
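As a rough illustration of what "http responsiveness" could mean for AC1, here is a minimal Python sketch of a single HTTP probe. It is not how telegraf implements its check; the function name `check_http` and the returned dictionary layout are hypothetical and only serve the example.

```python
import urllib.error
import urllib.request


def check_http(url: str, timeout: float = 5.0) -> dict:
    """Probe `url` once and report whether it answered successfully over HTTP.

    Hypothetical helper for illustration; not part of salt-states-openqa.
    """
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # The server answered with a success (or followed redirect) status.
            return {"url": url, "ok": True, "status": response.status}
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (4xx/5xx).
        return {"url": url, "ok": False, "status": error.code}
    except (urllib.error.URLError, OSError):
        # No HTTP answer at all: DNS failure, refused connection or timeout.
        return {"url": url, "ok": False, "status": None}
```

An alert would then fire when `ok` stays false over some evaluation window; the window length and threshold are a matter of the alert definition and are not specified in this ticket.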
Suggestions
- Implement it similarly to existing HTTP-based checks
- Allow defining a generic list in pillars like we have for hosts that we want to ping, see https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls#L14
- In monitoring/telegraf/telegraf-worker.conf copy the jinja for loop for "required_external_networks" to the equivalent for http_response checks
- Add the corresponding data in telegraf, have that merged, wait for data to appear, then crosscheck the data in grafana
- Add a corresponding panel, e.g. next to the "packet loss" panel, see https://monitor.qa.suse.de/d/EML0bpuGk/monitoring?orgId=1&editPanel=4
- Add a corresponding alert
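The idea of looping over a pillar list to generate one http_response check per entry can be sketched in Python as follows. This only mimics what a jinja for loop in monitoring/telegraf/telegraf-worker.conf might produce; the pillar key `required_external_http` is a hypothetical name chosen by analogy with "required_external_networks", and the URL scheme for proxy.scc.suse.de is an assumption. The options `urls`, `response_timeout` and `method` are settings of telegraf's `inputs.http_response` plugin.

```python
def render_http_checks(urls, timeout="5s"):
    """Return one [[inputs.http_response]] stanza per URL.

    Mimics a jinja for loop over a pillar list; illustrative only.
    """
    stanzas = []
    for url in urls:
        stanzas.append(
            "[[inputs.http_response]]\n"
            f'  urls = ["{url}"]\n'
            f'  response_timeout = "{timeout}"\n'
            '  method = "GET"\n'
        )
    return "\n".join(stanzas)


# Hypothetical pillar data; the real key name and URLs would live in
# salt-pillars-openqa and may differ from what is shown here.
pillar = {"required_external_http": ["http://proxy.scc.suse.de"]}
print(render_http_checks(pillar["required_external_http"]))
```

Keeping the targets in a generic list means adding another endpoint later only requires a pillar change, not a template change.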
Updated by okurz 10 months ago
- Copied from action #152857: [tools] alert ping between hosts timeout proxy.scc.suse.de added
Updated by nicksinger 10 months ago
- Status changed from Workable to In Progress
- Assignee set to nicksinger
Updated by openqa_review 10 months ago
- Due date set to 2024-01-30
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 10 months ago
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1088 to add monitoring panels, also done.
- I suggest changing "$tag_server - $tag_host" to "$tag_host - $tag_server" to be consistent with the ping check, where we also use "<source> - <target>"
- Please add a corresponding alert
Updated by okurz 10 months ago
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1090 merged. Now alert plz! :)
Updated by nicksinger 10 months ago
- Status changed from In Progress to Feedback
okurz wrote in #note-9:
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1090 merged. Now alert plz! :)
MR for alert: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1094
Updated by okurz 10 months ago
- Status changed from Feedback to In Progress
MR merged. https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2186107 shows that the deployment is done. I don't know why you chose to create an alert that is not linked to the dashboard, so I don't know what is expected to be different. I can see an alert "Salt http_response_codes" on https://monitor.qa.suse.de/alerting/list?search=http but it does not say "provisioned". Can you crosscheck whether the alert was actually deployed or whether that is still the one you created manually to prepare it?
EDIT: Nope, the problem is likely that your newly added file needs to be referenced in monitoring/grafana.sls
Updated by nicksinger 10 months ago
- Status changed from In Progress to Resolved
Good catch, thank you. I fixed that with https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1096 - it now shows up as "Provisioned" in grafana. As described in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1094 I couldn't find a way to link that alert to a panel, and for the sake of "getting it done" I just didn't do it, to not waste more time :)