action #80734

GitLab pipeline trigger via Grafana fails due to TLS errors

Added by cdywan 11 months ago. Updated 4 months ago.

Status: Resolved
Priority: Low
Assignee:
Target version:
Start date: 2020-12-04
Due date:
% Done: 0%
Estimated time:

Description

Observation

The salt-states-openqa pipeline is not always triggered even though "Automatic actions" on Grafana shows a host as offline. This happened today with openqaworker-arm-1, which has since been recovered by triggering a reboot manually.

Checking the log file on openqa-monitor.qa.suse.de via sudo cat /var/log/grafana/grafana.log | grep arm-1 | less reveals:

t=2020-12-04T10:20:29+0100 lvl=eror msg="Failed to send webhook" logger=alerting.notifier.webhook error="Post \"https://gitlab.suse.de/api/v4/projects/4652/trigger/pipeline?token=...&ref=master&variables[MACHINE]=openqaworker-arm-1\": net/http: TLS handshake timeout" webhook="Trigger reboot of openqaworker-arm-1"
t=2020-12-04T10:20:29+0100 lvl=eror msg="failed to send notification" logger=alerting.notifier uid=o5EYinpZk error="Post \"https://gitlab.suse.de/api/v4/projects/4652/trigger/pipeline?token=...&ref=master&variables[MACHINE]=openqaworker-arm-1\": net/http: TLS handshake timeout"
t=2020-12-04T10:20:29+0100 lvl=eror msg="failed to send notification" logger=alerting.notifier uid=o5EYinpZk error="Post \"https://gitlab.suse.de/api/v4/projects/4652/trigger/pipeline?token=...&ref=master&variables[MACHINE]=openqaworker-arm-1\": net/http: TLS handshake timeout"

So it looks like Grafana did try to post the API request, but the TLS handshake with gitlab.suse.de timed out before the request could be sent.
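To see how often this happens and for which hosts, the log lines above can be tallied with a short script. The following is only a minimal sketch: it assumes the key=value layout seen in the excerpt above (including the lvl=eror spelling and the variables[MACHINE]= query parameter), and the names parse_line and tls_timeouts are made up for illustration.

```python
import re
from collections import Counter

# Matches key=value and key="quoted value" pairs as seen in the excerpt above.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')
# Extracts the machine name from the webhook URL inside the error field.
MACHINE_RE = re.compile(r'variables\[MACHINE\]=([\w.-]+)')


def parse_line(line):
    """Return the key/value pairs of one log line as a dict."""
    fields = {}
    for key, value in PAIR_RE.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            # Strip the surrounding quotes and undo \" escapes.
            value = value[1:-1].replace('\\"', '"')
        fields[key] = value
    return fields


def tls_timeouts(lines):
    """Count error lines mentioning a TLS handshake timeout, per machine."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields.get("lvl") != "eror":
            continue
        error = fields.get("error", "")
        if "TLS handshake timeout" not in error:
            continue
        match = MACHINE_RE.search(error)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Calling tls_timeouts() on an open file handle for /var/log/grafana/grafana.log would give one count per machine. Note that each failed alert can produce several log lines (three in the excerpt above), so the counts are log lines, not distinct alerts.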

See also RT-ADM #181878

History

#1 Updated by cdywan 11 months ago

  • Description updated (diff)

#2 Updated by cdywan 11 months ago

  • Description updated (diff)

#3 Updated by okurz 10 months ago

  • Status changed from New to Blocked
  • Assignee set to cdywan
  • Target version set to Ready

cdywan, I assigned this to you as "Blocked" so that you can track the EngInfra ticket. Hint for next time: Please add "osd-admins@suse.de" or "o3-admins@suse.de" (depending on which infrastructure is affected) in CC for the ticket.

#4 Updated by cdywan 9 months ago

Just a small update: this seems to be a symptom of git-pack-objects: excessive cpu and mem usage, which has to do with a lack of resource separation/limits. Work-arounds are being evaluated. Of course we might just be lucky since the relevant pipelines don't run very frequently, or rather, they run on an as-needed basis.

#5 Updated by okurz 8 months ago

  • Priority changed from Normal to Low

I think you agree that this does not seem to have much of an impact, so lowering to "Low".

#6 Updated by okurz 4 months ago

  • Status changed from Blocked to Resolved

https://infra.nue.suse.com/SelfService/Display.html?id=181878#txn-2748777 was resolved 2021-01-18 08:49 and I am not aware of us having seen the problem since then.
