action #138287
closed
petrol sometimes takes a long time to respond/render http://localhost:9530/influxdb/minion
% Done: 0%
Description
Observation
Sometimes pipelines (e.g. https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1915033) fail with:
2023-10-19T13:14:13Z E! [inputs.http] Error in plugin: [url=http://localhost:9530/influxdb/minion]: Get "http://localhost:9530/influxdb/minion": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
It seems like the endpoint on that host sometimes takes a long time to respond:
petrol:~ # time curl http://localhost:9530/influxdb/minion
openqa_minion_jobs,url=http://localhost:9530 active=0i,delayed=0i,failed=19i,inactive=0i
openqa_minion_workers,url=http://localhost:9530 active=0i,inactive=1i,registered=1i
openqa_download_count,url=http://localhost:9530 count=0i
openqa_download_rate,url=http://localhost:9530 bytes=28359186i
real 0m0.008s
user 0m0.006s
sys 0m0.000s
petrol:~ # time curl http://localhost:9530/influxdb/minion
openqa_minion_jobs,url=http://localhost:9530 active=0i,delayed=0i,failed=19i,inactive=0i
openqa_minion_workers,url=http://localhost:9530 active=0i,inactive=1i,registered=1i
openqa_download_count,url=http://localhost:9530 count=0i
openqa_download_rate,url=http://localhost:9530 bytes=28359186i
real 0m0.008s
user 0m0.006s
sys 0m0.000s
petrol:~ # time curl http://localhost:9530/influxdb/minion
openqa_minion_jobs,url=http://localhost:9530 active=0i,delayed=0i,failed=19i,inactive=1i
openqa_minion_workers,url=http://localhost:9530 active=0i,inactive=1i,registered=1i
openqa_download_count,url=http://localhost:9530 count=0i
openqa_download_rate,url=http://localhost:9530 bytes=28359186i
real 0m6.242s
user 0m0.003s
sys 0m0.003s
petrol:~ # time curl http://localhost:9530/influxdb/minion
openqa_minion_jobs,url=http://localhost:9530 active=1i,delayed=0i,failed=19i,inactive=0i
openqa_minion_workers,url=http://localhost:9530 active=1i,inactive=0i,registered=1i
openqa_download_count,url=http://localhost:9530 count=1i
openqa_download_rate,url=http://localhost:9530 bytes=28359186i
real 0m11.547s
user 0m0.006s
sys 0m0.000s
Reproducible
I am not sure what causes the long response times, but I could easily reproduce the problem by running time curl http://localhost:9530/influxdb/minion a couple of times in a row on petrol.
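The manual reproduction can be automated with a small loop. The following is only a minimal sketch: the helper names probe_once, is_slow and probe_many are hypothetical, and the 30 second cap on a single request is an arbitrary assumption. It relies on curl's --write-out variable time_total to report how long each request took.

```shell
# Hypothetical helpers for reproducing the slow responses; not part of openQA.

# Print the total time in seconds that curl needed for one request to URL ($1).
# The body is discarded; --max-time 30 is an assumed upper bound per request.
probe_once() {
    curl --silent --output /dev/null --max-time 30 \
         --write-out '%{time_total}\n' "$1"
}

# Succeed (exit 0) if the duration $1 in seconds exceeds the threshold $2.
# awk is used because POSIX shell arithmetic cannot compare fractions.
is_slow() {
    awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t > limit) }'
}

# Probe URL ($1) COUNT ($2) times, one second apart, and mark every
# request that took longer than LIMIT ($3) seconds.
probe_many() {
    i=1
    while [ "$i" -le "$2" ]; do
        t=$(probe_once "$1")
        if is_slow "$t" "$3"; then
            echo "request $i: ${t}s SLOW"
        else
            echo "request $i: ${t}s"
        fi
        i=$((i + 1))
        sleep 1
    done
}
```

After sourcing these functions on petrol, a call such as probe_many http://localhost:9530/influxdb/minion 20 5 would issue 20 requests and flag those slower than 5 seconds. The value 5 is an assumption here and should be replaced with the timeout that telegraf actually uses for this input.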
Expected result
The route should respond quickly and consistently, as it does in the fast runs above (a few milliseconds). At the very least, if we cannot understand or fix the underlying problem, our pipelines should not fail because of this.
Suggestions
- Understand why that API endpoint takes so long to respond on only that host
- Bump the timeout for this HTTP input in our telegraf config