action #138287

closed

petrol sometimes takes a long time to respond/render http://localhost:9530/influxdb/minion

Added by nicksinger 6 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Observation

Sometimes pipelines (e.g. https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1915033) fail with:

2023-10-19T13:14:13Z E! [inputs.http] Error in plugin: [url=http://localhost:9530/influxdb/minion]: Get "http://localhost:9530/influxdb/minion": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

It seems like the endpoint on that host sometimes takes a long time to respond:

petrol:~ # time curl http://localhost:9530/influxdb/minion
openqa_minion_jobs,url=http://localhost:9530 active=0i,delayed=0i,failed=19i,inactive=0i
openqa_minion_workers,url=http://localhost:9530 active=0i,inactive=1i,registered=1i
openqa_download_count,url=http://localhost:9530 count=0i
openqa_download_rate,url=http://localhost:9530 bytes=28359186i

real    0m0.008s
user    0m0.006s
sys 0m0.000s
petrol:~ # time curl http://localhost:9530/influxdb/minion
openqa_minion_jobs,url=http://localhost:9530 active=0i,delayed=0i,failed=19i,inactive=0i
openqa_minion_workers,url=http://localhost:9530 active=0i,inactive=1i,registered=1i
openqa_download_count,url=http://localhost:9530 count=0i
openqa_download_rate,url=http://localhost:9530 bytes=28359186i

real    0m0.008s
user    0m0.006s
sys 0m0.000s
petrol:~ # time curl http://localhost:9530/influxdb/minion
openqa_minion_jobs,url=http://localhost:9530 active=0i,delayed=0i,failed=19i,inactive=1i
openqa_minion_workers,url=http://localhost:9530 active=0i,inactive=1i,registered=1i
openqa_download_count,url=http://localhost:9530 count=0i
openqa_download_rate,url=http://localhost:9530 bytes=28359186i

real    0m6.242s
user    0m0.003s
sys 0m0.003s
petrol:~ # time curl http://localhost:9530/influxdb/minion
openqa_minion_jobs,url=http://localhost:9530 active=1i,delayed=0i,failed=19i,inactive=0i
openqa_minion_workers,url=http://localhost:9530 active=1i,inactive=0i,registered=1i
openqa_download_count,url=http://localhost:9530 count=1i
openqa_download_rate,url=http://localhost:9530 bytes=28359186i

real    0m11.547s
user    0m0.006s
sys 0m0.000s
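
For readers unfamiliar with the output above: each line follows the InfluxDB line protocol (measurement, comma-separated tags, a space, then comma-separated fields). A minimal sketch of a parser for exactly the shape shown here follows. The function name `parse_line` is hypothetical, and the parser deliberately ignores escaping, string fields and timestamps, none of which appear in the transcript.

```python
def parse_line(line):
    """Split one line-protocol line into (measurement, tags, fields).

    Simplified on purpose: no escaping, no string fields, no timestamp.
    """
    head, field_part = line.strip().split(" ", 1)
    measurement, *tag_items = head.split(",")
    tags = dict(item.split("=", 1) for item in tag_items)
    fields = {}
    for item in field_part.split(","):
        key, value = item.split("=", 1)
        # A trailing "i" marks an integer field in line protocol.
        fields[key] = int(value[:-1]) if value.endswith("i") else float(value)
    return measurement, tags, fields


# First line of the curl output quoted in the description.
sample = "openqa_minion_jobs,url=http://localhost:9530 active=0i,delayed=0i,failed=19i,inactive=0i"
measurement, tags, fields = parse_line(sample)
print(measurement, tags["url"], fields["failed"])
```

Read this way, the slow responses differ from the fast ones only in the job and worker counters (an inactive or active job, an active worker, a download in progress), not in the size of the response.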

Reproducible

Not sure what causes the long response times, but I could easily reproduce them by running time curl http://localhost:9530/influxdb/minion a couple of times.
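
The manual reproduction could also be scripted. The following is only a sketch: the helper names (`fetch`, `time_requests`, `slow_attempts`), the 30 s client timeout and the 1 s threshold for "slow" are all assumptions chosen for illustration, not values from this ticket.

```python
import time
import urllib.request

URL = "http://localhost:9530/influxdb/minion"  # endpoint from the ticket


def fetch(url=URL, timeout=30.0):
    """Fetch the endpoint once; the 30 s timeout is an arbitrary choice."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def time_requests(fetch_once, attempts=10):
    """Call fetch_once repeatedly and return each call's duration in seconds."""
    durations = []
    for _ in range(attempts):
        start = time.monotonic()
        fetch_once()
        durations.append(time.monotonic() - start)
    return durations


def slow_attempts(durations, threshold=1.0):
    """Return (attempt number, duration) for every call slower than threshold."""
    return [(i, d) for i, d in enumerate(durations, start=1) if d > threshold]
```

On the affected host, `slow_attempts(time_requests(fetch))` would list the attempts that took longer than one second. `time_requests` takes the fetch function as a parameter so that it can be tried with any callable, not only the HTTP request.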

Expected result

The route should be quite snappy and not that slow. At the very least, if we cannot understand or fix the underlying problem, our pipelines should not fail because of this.

Suggestions

  • Understand why that API endpoint takes so long to respond on only that host
  • Bump the timeout of the inputs.http plugin in our telegraf config
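
To get a feeling for how large a timeout would need to be, the four "real" times from the observation can be compared against candidate values. This is a small sketch: the two candidate timeouts (5 s and 15 s) are hypothetical numbers picked for illustration and are not taken from this ticket or from the telegraf config.

```python
# Wall-clock ("real") times of the four curl runs quoted in the description.
observed_seconds = [0.008, 0.008, 6.242, 11.547]


def timed_out(durations, timeout_seconds):
    """Return the durations that would have exceeded the given client timeout."""
    return [d for d in durations if d > timeout_seconds]


# Both candidate timeouts are hypothetical values chosen for illustration.
for candidate in (5.0, 15.0):
    failures = timed_out(observed_seconds, candidate)
    print(f"timeout {candidate:g}s: {len(failures)} of {len(observed_seconds)} requests too slow")
```

With these four samples, a 5 s limit would have rejected the two slow requests, while a 15 s limit would have accepted all of them. Four samples are of course too few to derive a safe value.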
Actions #1

Updated by okurz 6 months ago

  • Tags set to telegraf, minion
  • Project changed from openQA Tests to openQA Project
  • Due date deleted (2023-10-24)
  • Category set to Feature requests
  • Start date deleted (2023-09-20)
Actions #2

Updated by mkittler 5 months ago

  • Assignee set to mkittler
Actions #3

Updated by mkittler 5 months ago

  • Status changed from New to In Progress

I could reproduce a 4-second delay on the 3rd attempt. Of course, 4 seconds is not that much considering the system is seriously busy (all worker slots are utilized). I suspect that the SQLite database is busy (e.g. a write operation is blocking and/or the disk is generally busy).
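
To illustrate the suspected mechanism, here is a minimal sketch of how a blocking write can stall a read in SQLite. It uses a throwaway database in SQLite's default rollback-journal mode with a made-up `jobs` table. Whether the database on petrol is configured this way is an assumption (in WAL mode, for example, readers are not blocked by a writer), so this demonstrates the general effect only, not the actual cause.

```python
import os
import sqlite3
import tempfile


def reader_blocked_by_writer(busy_timeout_seconds=0.2):
    """Return (blocked, rows_after_commit) for a read attempted during a write."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "demo.sqlite")  # throwaway database
        # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
        writer = sqlite3.connect(path, isolation_level=None)
        reader = sqlite3.connect(path, timeout=busy_timeout_seconds, isolation_level=None)
        try:
            writer.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT)")
            writer.execute("INSERT INTO jobs (state) VALUES ('inactive')")
            # Simulate a long-running write by keeping an exclusive transaction open.
            writer.execute("BEGIN EXCLUSIVE")
            writer.execute("UPDATE jobs SET state = 'active'")
            try:
                reader.execute("SELECT count(*) FROM jobs").fetchone()
                blocked = False
            except sqlite3.OperationalError:  # raised as "database is locked"
                blocked = True
            writer.execute("COMMIT")
            # Once the writer has committed, the same read goes through.
            rows = reader.execute("SELECT count(*) FROM jobs").fetchone()[0]
            return blocked, rows
        finally:
            reader.close()
            writer.close()


print(reader_blocked_by_writer())
```

The reader gives up after its busy timeout instead of waiting indefinitely. With a longer busy timeout it would wait for the writer, which would show up as a slow response rather than an error.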

Actions #4

Updated by openqa_review 5 months ago

  • Due date set to 2023-11-28

Setting due date based on mean cycle time of SUSE QE Tools

Actions #5

Updated by mkittler 5 months ago

  • Status changed from In Progress to Feedback

Besides the 4-second delay on one request yesterday, I couldn't reproduce the problem anymore at all. I suppose it can nevertheless still happen if a worker is very busy, so I created an MR to increase the timeout: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1043

Actions #6

Updated by okurz 5 months ago

  • Due date deleted (2023-11-28)
  • Status changed from Feedback to Resolved

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1043 merged. Since you stated that you can't reproduce the problem, and I also tried just now and could not reproduce it, I'd say we can resolve right away.
