action #127985
closed[alert][grafana] Handle various problems with slow monitoring https requests and such (was: DatasourceNoData HTTP Response alert) size:M
Added by okurz over 1 year ago. Updated over 1 year ago.
0%
Description
Observation¶
http://stats.openqa-monitor.qa.suse.de/alerting/grafana/tm0h5mf4k/view?orgId=1
Suggestions¶
- Follow #127985#note-13 and try to prevent overly strict timeouts for the various HTTP monitoring requests triggered by telegraf
Updated by okurz over 1 year ago
- Copied from action #127982: [alert][grafana] DatasourceNoData e.g. openQA QA-Power8-5-kvm+openqaw5-xen Disk I/O time alert worker added
Updated by mkittler over 1 year ago
- Status changed from New to In Progress
The logs are generally dominated by messages from velociraptor.
Around the time this happened it also logged the loss of sequences (not sure whether that is relevant, though):
Apr 19 16:19:50 openqa openqa-websockets-daemon[16738]: [debug] [pid:16738] Updating seen of worker 1348 from worker_status (free)
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":839,\"MaxSize\":1073741874,\"AvailableBytes\":357,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":1004,\"MaxSize\":1073741874,\"AvailableBytes\":514,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":1169,\"MaxSize\":1073741874,\"AvailableBytes\":671,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":1367,\"MaxSize\":1073741874,\"AvailableBytes\":861,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":1533,\"MaxSize\":1073741874,\"AvailableBytes\":1019,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":1698,\"MaxSize\":1073741874,\"AvailableBytes\":1176,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 4294967294 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 4294967294 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 4294967294 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 4294967294 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 4294967294 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 4294967294 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":1873,\"MaxSize\":1073741874,\"AvailableBytes\":1343,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":2048,\"MaxSize\":1073741874,\"AvailableBytes\":1510,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":2223,\"MaxSize\":1073741874,\"AvailableBytes\":1677,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":2430,\"MaxSize\":1073741874,\"AvailableBytes\":1876,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":2606,\"MaxSize\":1073741874,\"AvailableBytes\":2044,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":2781,\"MaxSize\":1073741874,\"AvailableBytes\":2211,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":2946,\"MaxSize\":1073741874,\"AvailableBytes\":2368,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":3111,\"MaxSize\":1073741874,\"AvailableBytes\":2525,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 Detected the loss of 1 sequences.
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":3276,\"MaxSize\":1073741874,\"AvailableBytes\":2682,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":3474,\"MaxSize\":1073741874,\"AvailableBytes\":2872,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":3640,\"MaxSize\":1073741874,\"AvailableBytes\":3030,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":3805,\"MaxSize\":1073741874,\"AvailableBytes\":3187,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 read_file: /proc/3101/cmdline: lstat /proc/3101: no such file or directory
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":4013,\"MaxSize\":1073741874,\"AvailableBytes\":3387,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 read_file: /proc/3118/cmdline: lstat /proc/3118: no such file or directory
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":4221,\"MaxSize\":1073741874,\"AvailableBytes\":3587,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 read_file: /proc/3118/cmdline: lstat /proc/3118: no such file or directory
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":4429,\"MaxSize\":1073741874,\"AvailableBytes\":3787,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 read_file: /proc/3122/cmdline: lstat /proc/3122: no such file or directory
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":4637,\"MaxSize\":1073741874,\"AvailableBytes\":3987,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 read_file: /proc/3126/cmdline: lstat /proc/3126: no such file or directory
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":4845,\"MaxSize\":1073741874,\"AvailableBytes\":4187,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 read_file: /proc/3126/cmdline: lstat /proc/3126: no such file or directory
Apr 19 16:19:50 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:50+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":5053,\"MaxSize\":1073741874,\"AvailableBytes\":4387,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 read_file: /proc/3122/cmdline: lstat /proc/3122: no such file or directory
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":5261,\"MaxSize\":1073741874,\"AvailableBytes\":4587,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 read_file: /proc/3078/cmdline: lstat /proc/3078: no such file or directory
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":5469,\"MaxSize\":1073741874,\"AvailableBytes\":4787,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 read_file: /proc/3131/cmdline: lstat /proc/3131: no such file or directory
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":5677,\"MaxSize\":1073741874,\"AvailableBytes\":4987,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 read_file: /proc/3131/cmdline: lstat /proc/3131: no such file or directory
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":5885,\"MaxSize\":1073741874,\"AvailableBytes\":5187,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 read_file: /proc/3083/cmdline: lstat /proc/3083: no such file or directory
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 File Ring Buffer: Enqueue {"header":"{\"ReadPointer\":466,\"WritePointer\":6093,\"MaxSize\":1073741874,\"AvailableBytes\":5387,\"LeasedBytes\":0}","leased_pointer":466}
Apr 19 16:19:51 openqa openqa-websockets-daemon[16738]: [debug] [pid:16738] Updating seen of worker 2230 from worker_status (working)
Apr 19 16:19:51 openqa openqa-websockets-daemon[16738]: [debug] [pid:16738] Updating seen of worker 2197 from worker_status (working)
Apr 19 16:19:51 openqa openqa-websockets-daemon[16738]: [debug] [pid:16738] Updating seen of worker 1247 from worker_status (working)
Apr 19 16:19:51 openqa openqa-websockets-daemon[16738]: [debug] [pid:16738] Updating seen of worker 2282 from worker_status (working)
Apr 19 16:19:51 openqa openqa-websockets-daemon[16738]: [debug] [pid:16738] Updating seen of worker 2209 from worker_status (working)
Apr 19 16:19:51 openqa openqa-websockets-daemon[16738]: [debug] [pid:16738] Updating seen of worker 2288 from worker_status (working)
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 Sender: Connected to https://sec-velociraptor.prg.suse.com:8000/control
Apr 19 16:19:51 openqa velociraptor[718]: [DEBUG] 2023-04-19T16:19:51+02:00 Connection Info {"IdleTime":1001856570,"LocalAddr":{"IP":"149.44.176.58","Port":56958,"Zone":""},"Reused":true,"WasIdle":true}
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 Sender: sent 1747 bytes, response with status: 200 OK
Apr 19 16:19:51 openqa velociraptor[718]: [INFO] 2023-04-19T16:19:51+02:00 Sender: received 626 bytes
While this is technically a no-data alert, the data points just before and right after the gap show very long response times. Similar statistics likewise indicate that response times were simply quite long.
So this reminds me of #126212. In the case of #126212 we quickly restarted the web UI (and possibly other services) and the problem ceased before the alert was triggered. Here it took longer, so the alert fired. The long response times normalized on their own (without us having to restart anything).
Note that I've very recently updated the kernel version for #116722. However, I don't think this is that problem reproducing again, because that problem was much more severe.
Updated by openqa_review over 1 year ago
- Due date set to 2023-05-04
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler over 1 year ago
Yes, it apparently happened again: https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=78&orgId=1&from=1681977074056&to=1681981528038
At the same time the response time went up and the CPU load went down, the same as before.
Updated by mkittler over 1 year ago
Most of the traffic comes from our workers, e.g. the top-ten list of hosts consists almost exclusively of workers:
# bsdcat /var/log/apache2/openqa.access_log.1.xz | cut -d "-" -f 1 | sort | uniq -c | sort -g | tail -n 10
73924 10.160.0.245
75752 10.100.96.72
77262 2a07:de40:a203:12:ec4:7aff:fe7a:789e
79152 10.100.96.76
119733 2a07:de40:a203:12:56ab:3aff:fe16:dd73
139306 2620:113:80c0:80a0:10:162:2a:42f0
165088 2a07:de40:a203:12:56ab:3aff:fe24:358d
169252 10.160.0.85
199414 10.160.0.227
592536 2a07:de40:a203:12:2e60:cff:fe73:2ac
That's actually expected. The only exception is:
# host 2620:113:80c0:80a0:10:162:2a:42f0
Host 0.f.2.4.a.2.0.0.2.6.1.0.0.1.0.0.0.a.0.8.0.c.0.8.3.1.1.0.0.2.6.2.ip6.arpa not found: 3(NXDOMAIN)
Of course, these statistics cover the full log from 20/Apr/2023:03:00:07 to 20/Apr/2023:13:00:03.
Updated by mkittler over 1 year ago
We also see timeouts in the logs:
Apr 20 10:23:20 openqa telegraf[1270]: 2023-04-20T08:23:20Z E! [inputs.http] Error in plugin: [url=https://openqa.suse.de/admin/influxdb/minion]: Get "https://openqa.suse.de/admin/influxdb/minion": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 20 10:23:20 openqa telegraf[1270]: 2023-04-20T08:23:20Z E! [inputs.http] Error in plugin: [url=https://openqa.suse.de/admin/influxdb/jobs]: Get "https://openqa.suse.de/admin/influxdb/jobs": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
There are also many PostgreSQL-related messages around that time:
…
Apr 20 10:23:18 openqa su[25344]: (to postgres) root on none
Apr 20 10:23:18 openqa systemd[1]: Started Session c8735 of User postgres.
Apr 20 10:23:18 openqa su[25344]: pam_unix(su-l:session): session opened for user postgres by (uid=0)
Apr 20 10:23:18 openqa su[25344]: pam_unix(su-l:session): session closed for user postgres
Apr 20 10:23:19 openqa systemd[1]: session-c8735.scope: Deactivated successfully.
Apr 20 10:23:19 openqa su[25369]: (to postgres) root on none
Apr 20 10:23:19 openqa systemd[1]: Started Session c8736 of User postgres.
Apr 20 10:23:19 openqa su[25369]: pam_unix(su-l:session): session opened for user postgres by (uid=0)
Apr 20 10:23:19 openqa su[25369]: pam_unix(su-l:session): session closed for user postgres
Apr 20 10:23:19 openqa systemd[1]: session-c8736.scope: Deactivated successfully.
Apr 20 10:23:19 openqa su[25394]: (to postgres) root on none
Apr 20 10:23:19 openqa systemd[1]: Started Session c8737 of User postgres.
Apr 20 10:23:19 openqa su[25394]: pam_unix(su-l:session): session opened for user postgres by (uid=0)
Apr 20 10:23:19 openqa su[25394]: pam_unix(su-l:session): session closed for user postgres
Apr 20 10:23:19 openqa systemd[1]: session-c8737.scope: Deactivated successfully.
Apr 20 10:23:19 openqa su[25420]: (to postgres) root on none
Apr 20 10:23:19 openqa systemd[1]: Started Session c8738 of User postgres.
Apr 20 10:23:19 openqa su[25420]: pam_unix(su-l:session): session opened for user postgres by (uid=0)
Apr 20 10:23:19 openqa openqa-websockets-daemon[19716]: [debug] [pid:19716] Updating seen of worker 479 from worker_status (free)
Apr 20 10:23:19 openqa su[25420]: pam_unix(su-l:session): session closed for user postgres
Apr 20 10:23:19 openqa systemd[1]: session-c8738.scope: Deactivated successfully.
Apr 20 10:23:19 openqa su[25445]: (to postgres) root on none
Apr 20 10:23:19 openqa systemd[1]: Started Session c8739 of User postgres.
Apr 20 10:23:19 openqa su[25445]: pam_unix(su-l:session): session opened for user postgres by (uid=0)
…
velociraptor is also quite verbose; when scrolling through the logs there are sometimes whole pages of just its messages. However, that's "normal" and thus likely not the culprit.
The PostgreSQL log shows many permission denied errors around that time. So apparently some service tried to read data from the database without proper permissions. This always happens once per hour (always starting at minute 27), so it is also "normal" and not the culprit.
Not only the CPU usage but also the network usage (on eth0) went down during the time frame. The disk I/O usage was pretty normal overall. So this leaves one really wondering what was slowing down the system. Next time it happens I need to monitor more closely with htop.
Updated by okurz over 1 year ago
This sounds very much like the problems we already had with Apache connections clogging up into effective deadlocks. As always, the most important step is to reproduce the issue, even if that just means waiting for the one time per week it happens. Then we could try things like restarting Apache or increasing the number of processes and connections. Or replace Apache with nginx.
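The Apache tuning mentioned here could look roughly like the following sketch. The directives are standard Apache MPM settings, but the values are hypothetical and the actual MPM and limits on the host were not checked:

```
# server-tuning sketch, assuming the prefork MPM (hypothetical values)
<IfModule prefork.c>
    ServerLimit            400
    MaxRequestWorkers      400
    MaxConnectionsPerChild 10000
</IfModule>
```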
Updated by mkittler over 1 year ago
Note that the data comes from the following telegraf input (which runs on the web UI itself):
[[inputs.http_response]]
urls = [ "https://{{grains['fqdn']}}/tests" ]
interval = "1m"
Since we've got a NoDataAlert, one would assume that messages like
Apr 20 10:25:20 openqa telegraf[1270]: 2023-04-20T08:25:20Z E! [inputs.http] Error in plugin: [url=https://openqa.suse.de/admin/influxdb/jobs]: Get "https://openqa.suse.de/admin/influxdb/jobs": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
show up in the journal around that time. However, they don't show up at all (only for the /admin/influxdb/… routes). So there's a lack of data, but telegraf doesn't complain about it as it does for the other queries. Presumably that is because the other queries explicitly specify a timeout:
[[inputs.http]]
urls = [ "https://{{grains['fqdn']}}/admin/influxdb/jobs" ]
data_format = "influx"
timeout = "20s"
interval = "30s"
[[inputs.http]]
urls = [ "https://{{grains['fqdn']}}/admin/influxdb/minion" ]
data_format = "influx"
timeout = "20s"
interval = "1m"
Maybe we should do that here as well. Of course it wouldn't help with the underlying issue, but it might still prevent the no-data situation. At least there was no missing data during that timeframe for the /admin/influxdb/… routes (despite context deadline exceeded being logged a few times).
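A minimal sketch of what adding a timeout to this input could look like; `response_timeout` is the corresponding inputs.http_response parameter, but the 20s value merely mirrors the inputs.http entries above and is an assumption, not what was actually merged:

```
[[inputs.http_response]]
urls = [ "https://{{grains['fqdn']}}/tests" ]
response_timeout = "20s"
interval = "1m"
```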
Updated by okurz over 1 year ago
- Related to action #127052: [alert] Apache Response Time alert followed by DatasourceNoData for min_apache_response size:M added
Updated by okurz over 1 year ago
Please be aware of my MR https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/844 to handle NoData gracefully, which also affects the alert mentioned here.
Updated by mkittler over 1 year ago
- Description updated (diff)
Yes, this means we can focus on the actual problem here.
Updated by mkittler over 1 year ago
Let's see what @kraih can find out in #127052.
The timeout specified in inputs.http basically ends up in the Client struct. If no timeout is specified, the default is 5 seconds (see https://github.com/influxdata/telegraf/blob/35d1ee3d967c8328a6c718dfaac65a983f437ce1/plugins/common/http/config.go#L51). So this client is quite impatient, which explains why we get context deadline exceeded so soon/often.
Updated by mkittler over 1 year ago
- Status changed from In Progress to Feedback
Updated by okurz over 1 year ago
- Subject changed from [alert][grafana] DatasourceNoData HTTP Response alert to [alert][grafana] Handle various problems with slow monitoring https requests and such (was: DatasourceNoData HTTP Response alert) size:M
- Description updated (diff)
Updated by mkittler over 1 year ago
We're actually using inputs.http_response here (and not inputs.http). However, the conclusion from my previous comment still holds. This MR should avoid the empty graph sections: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/848
Updated by okurz over 1 year ago
Updated by mkittler over 1 year ago
- Status changed from Feedback to Resolved
The graph hasn't shown any gaps since the MR was merged. So I guess this is good enough. (And the alert itself is suppressed anyway via https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/844.)