/status updates are too heavy with external results
As we generate a lot of modules it can take up to 30 seconds to answer. This needs to be improved - we should aim for smaller, more manageable requests.
You can execute this on openqa-monitor:
epoch=$(date +%s); epoch=$((epoch-7000)); influx -database telegraf -execute "select response_time_us,request from apache_log where response_time_us > 19000000 and request =~ /status/ and time>$epoch*1000000000"
Will give you e.g.
1566573983000000000 30590665 /api/v1/jobs/3281950/status
1566573984000000000 29956535 /api/v1/jobs/3282402/status
1566573985000000000 28671861 /api/v1/jobs/3282393/status
1566573986000000000 26500427 /api/v1/jobs/3281940/status
1566573987000000000 26421768 /api/v1/jobs/3281944/status
1566573988000000000 25935256 /api/v1/jobs/3281938/status
1566573988001000000 24334786 /api/v1/jobs/3281918/status
1566573989000000000 25448723 /api/v1/jobs/3282232/status
1566573989001000000 23724981 /api/v1/jobs/3277369/status
1566573989002000000 22939661 /api/v1/jobs/3282169/status
1566573990000000000 21872689 /api/v1/jobs/3282156/status
1566573991000000000 21818731 /api/v1/jobs/3282400/status
1566573993000000000 19485082 /api/v1/jobs/3282281/status
1566574127000000000 33826524 /api/v1/jobs/3281841/status
1566574188000000000 30708568 /api/v1/jobs/3281892/status
1566575262000000000 32902187 /api/v1/jobs/3281998/status
1566575288000000000 21759388 /api/v1/jobs/3282002/status
1566575503000000000 33676646 /api/v1/jobs/3282028/status
1566575639000000000 25030512 /api/v1/jobs/3282185/status
1566575684000000000 24631893 /api/v1/jobs/3282039/status
1566575857000000000 40510643 /api/v1/jobs/3282207/status
1566575933000000000 51457756 /api/v1/jobs/3281879/status
1566576361000000000 24472420 /api/v1/jobs/3282284/status
1566576676000000000 30885742 /api/v1/jobs/3282306/status
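InfluxDB prints times as nanosecond epochs and response_time_us is in microseconds, which makes this output hard to scan. A small sketch (plain Python, row values taken from the first line of the output above) to turn one row into something readable:

```python
from datetime import datetime, timezone

def readable(row):
    """Turn an influx row (nanosecond epoch, response time in microseconds,
    request path) into a human-readable summary line."""
    ts_ns, us, request = row
    when = datetime.fromtimestamp(ts_ns // 10**9, tz=timezone.utc)
    return f"{when:%Y-%m-%d %H:%M:%S} UTC  {us / 1e6:.1f}s  {request}"

# first row from the output above
print(readable((1566573983000000000, 30590665, "/api/v1/jobs/3281950/status")))
```

So the very first entry already shows a status update taking roughly 30 seconds.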
I.e. tons of requests that are very slow - and if you check e.g. https://openqa.suse.de/tests/3282284 you can see that it inserts tons of job modules.
#7 Updated by kraih over 1 year ago
I've done a lot of profiling and the JSON data structures we send for status updates really don't look too bad at all. The primary bottleneck we are fighting is inefficient DBIx::Class queries and abstraction. Nobody is really at fault here though; it's just the nature of ORMs. I've done some optimizations that should gain us about 30% better performance for the slowest queries (those including the initial test_order): https://github.com/os-autoinst/openQA/pull/2390. But without custom SQL and batch inserts I don't think there is much more we can do. Splitting up the test_order update into multiple requests would add quite a bit of complexity and many new opportunities for things to go wrong, especially regarding timing errors in the worker, so I don't think it's actually a good idea to go in that direction.
Given that those 30s+ requests are mostly performing a loop of inserts into a Postgres table (with some cascading queries triggered by DBIx::Class for module stats), there is quite a bit of performance to be gained from using custom SQL. The code would get quite a bit uglier, but that is the tradeoff I would take if we really need more performance improvements.
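The per-row-versus-batch tradeoff described above can be illustrated with a minimal sketch. This uses Python's stdlib sqlite3 module purely for illustration (openQA actually talks to PostgreSQL through DBIx::Class), and the job_modules table and its columns here are simplified placeholders, not the real schema:

```python
import sqlite3

# Hypothetical, simplified stand-in for the real job modules table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_modules (job_id INTEGER, name TEXT, result TEXT)")

modules = [(3281950, f"module_{i}", "none") for i in range(1000)]

# ORM-style: one INSERT statement per module, i.e. many round trips,
# which is roughly what the loop of DBIx::Class inserts does.
for row in modules[:10]:
    conn.execute("INSERT INTO job_modules VALUES (?, ?, ?)", row)

# Batch style: one prepared statement executed for all remaining rows.
conn.executemany("INSERT INTO job_modules VALUES (?, ?, ?)", modules[10:])
conn.commit()

count = conn.execute("SELECT count(*) FROM job_modules").fetchone()[0]
print(count)
```

Both approaches store the same 1000 rows; the batch variant just avoids paying per-statement overhead (and, on a real network-attached Postgres, per-round-trip latency) for every module.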
#12 Updated by okurz about 1 year ago
- Category changed from Organisational to Feature requests
Still a problem. The command mentioned in the description still shows some long-running requests, e.g.
name: apache_log
time                response_time_us request
----                ---------------- -------
1582619724000000000 24188469 /api/v1/jobs/3920257/status
1582621459000000000 19464431 /api/v1/jobs/3920605/status
1582621697000000000 25021240 /api/v1/jobs/3920257/status
1582623514000000000 19915238 /api/v1/jobs/3920277/status
1582623700000000000 21386320 /api/v1/jobs/3920277/status
1582626223000000000 25691448 /api/v1/jobs/3920277/status
where "3920277" is not even one of the infamous ltp test cases; those are migration test cases. https://openqa.suse.de/tests/3920605 is "ltp_openposix", so related to #55283