action #80734
GitLab pipeline trigger via Grafana fails due to TLS errors (closed)
Description
Observation
The salt-states-openqa pipeline is not always triggered, even though "Automatic actions" on Grafana shows a host as offline. This happened today with openqaworker-arm-1
(meanwhile solved by triggering a reboot manually).
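For reference, the webhook calls GitLab's pipeline trigger API (POST /projects/:id/trigger/pipeline with token, ref and variables[KEY] parameters). The following minimal Python sketch rebuilds that request from the URL in the Grafana log excerpt. It is not Grafana's implementation. The helper name build_trigger_request is hypothetical, and because the trigger token is elided in the log, token is only a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

GITLAB_API = "https://gitlab.suse.de/api/v4"
PROJECT_ID = 4652  # project ID as it appears in the Grafana log excerpt


def build_trigger_request(token: str, machine: str, ref: str = "master") -> Request:
    """Build a POST request for GitLab's pipeline trigger endpoint (hypothetical helper).

    MACHINE is passed as variables[MACHINE], mirroring the logged webhook URL.
    urlencode percent-encodes the brackets as %5B/%5D, which is equivalent.
    `token` is a placeholder: the real trigger token is elided in the log.
    """
    query = urlencode({"token": token, "ref": ref, "variables[MACHINE]": machine})
    url = f"{GITLAB_API}/projects/{PROJECT_ID}/trigger/pipeline?{query}"
    # No body is needed: all parameters are carried in the query string.
    return Request(url, method="POST")
```

Sending it would then be a matter of urllib.request.urlopen(request, timeout=...), where the timeout bounds how long the connection and the TLS handshake may take.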
A look at the logfile on openqa-monitor.qa.suse.de
via sudo cat /var/log/grafana/grafana.log | grep arm-1 | less
reveals:
t=2020-12-04T10:20:29+0100 lvl=eror msg="Failed to send webhook" logger=alerting.notifier.webhook error="Post \"https://gitlab.suse.de/api/v4/projects/4652/trigger/pipeline?token=...&ref=master&variables[MACHINE]=openqaworker-arm-1\": net/http: TLS handshake timeout" webhook="Trigger reboot of openqaworker-arm-1"
t=2020-12-04T10:20:29+0100 lvl=eror msg="failed to send notification" logger=alerting.notifier uid=o5EYinpZk error="Post \"https://gitlab.suse.de/api/v4/projects/4652/trigger/pipeline?token=...&ref=master&variables[MACHINE]=openqaworker-arm-1\": net/http: TLS handshake timeout"
t=2020-12-04T10:20:29+0100 lvl=eror msg="failed to send notification" logger=alerting.notifier uid=o5EYinpZk error="Post \"https://gitlab.suse.de/api/v4/projects/4652/trigger/pipeline?token=...&ref=master&variables[MACHINE]=openqaworker-arm-1\": net/http: TLS handshake timeout"
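So it looks like Grafana tried to POST the API request, but the TLS handshake with gitlab.suse.de timed out. The connection therefore failed before the HTTP request itself was sent, and the failure was a timeout rather than, for example, a certificate error.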
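Nothing in this ticket says whether Grafana's webhook notifier retries after such a failure. As an illustration only, the following minimal Python sketch (not Grafana's code) shows how a client could treat a transient timeout like the logged TLS handshake timeout differently from other errors by retrying with exponential backoff. The function names, the attempt count and the delays are all assumptions.

```python
import socket
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")

# socket.timeout is an alias of TimeoutError since Python 3.10; list both for older versions.
_TIMEOUT_TYPES = (TimeoutError, socket.timeout)


def is_timeout(exc: BaseException) -> bool:
    """Return True for timeouts, including ones urllib wraps in URLError."""
    if isinstance(exc, _TIMEOUT_TYPES):
        return True
    return isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, _TIMEOUT_TYPES)


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff (hypothetical helper).

    Only timeouts are retried; any other error (e.g. an HTTP 4xx from GitLab)
    is re-raised immediately. After the last attempt the timeout is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            if not is_timeout(exc) or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # waits 2 s, 4 s, ... with the defaults
    raise AssertionError("unreachable")
```

A caller would wrap the actual request, e.g. call_with_retries(lambda: urllib.request.urlopen(req, timeout=10)), where req is a POST request for the trigger URL. Passing sleep explicitly keeps the helper testable without real delays.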
See also RT-ADM #181878
Updated by okurz almost 4 years ago
- Status changed from New to Blocked
- Assignee set to livdywan
- Target version set to Ready
@cdywan assigned to you with status "Blocked" as you can track the EngInfra ticket. Hint for next time: please add "osd-admins@suse.de" or "o3-admins@suse.de" (depending on which infrastructure is affected) to CC on the ticket.
Updated by livdywan over 3 years ago
Just a small update: this seems to be a symptom of "git-pack-objects: excessive cpu and mem usage", which stems from a lack of resource separation/limits; workarounds are being evaluated. Of course we might also just have been lucky so far, since the relevant pipelines don't run very frequently, or rather, they only run on an as-needed basis.
Updated by okurz over 3 years ago
- Priority changed from Normal to Low
I think you agree that this does not seem to have much impact, so lowering to "Low".
Updated by okurz over 3 years ago
- Status changed from Blocked to Resolved
https://infra.nue.suse.com/SelfService/Display.html?id=181878#txn-2748777 was resolved on 2021-01-18 08:49, and I am not aware of us having seen the problem since then.