Project

General

Profile

action #93925

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #92854: [epic] limit overload of openQA webUI by heavy requests

Optimize SQL on /tests/overview

Added by tinita 3 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Feature requests
Target version:
Start date:
2021-06-12
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

For /tests/overview pages with a lot of jobs, retrieving data from our database is quite inefficient.
For every job several SQL statements are fired.
Some of them can be combined into one statement.
Also in several statements, too much columns are selected, where we need only a few. That includes costly DateTime columns.

Identify statements which can be combined, combine them and measure performance.


Related issues

Related to openQA Project - action #94111: Optimize /api/v1/jobsNew2021-06-16

History

#1 Updated by okurz 3 months ago

  • Priority changed from Normal to Low

Let's see that as something that can help and is certainly related but can't prevent an "overload" so reducing prio to "Low". But besides that no further impact, keep on the good work :)

#2 Updated by kraih 3 months ago

Might also be possible to replace DateTime with a less heavy module like DateTime::Tiny in some cases.

#3 Updated by mkittler 3 months ago

Also take note of #92854#note-13 regarding the DateTime overhead.

#4 Updated by tinita 3 months ago

I made the following config changes on o3:

# /etc/apache2/global.conf
CustomLog /space/logs/apache2/access_log combined_with_time
# /etc/apache2/vhosts.d/openqa.conf
CustomLog /space/logs/apache2/openqa.access_log combined_with_time
# /etc/apache2/mod_log_config.conf
LogFormat "%h %l %u %t \"%r\" %>s %b \
\"%{Referer}i\" \"%{User-Agent}i\" %{ms}T"              combined_with_time

Added the files to /etc/logrotate.d/apache2

This way we can compare request times before and after a deployment.

#5 Updated by tinita 3 months ago

I did the same changes on osd (except access_log was already enabled), though I had to touch different files.

I replaced combined with combined_with_time in /etc/apache2/global.conf and added it to /etc/apache2/mod_log_config.conf

I added combined_with_time to /etc/sysconfig/apache2 and reloaded apache, which generated /etc/apache2/sysconfig.d/global.conf.

Now we can see the time in /var/log/apache2/access_log

#6 Updated by tinita 3 months ago

  • Status changed from In Progress to Feedback

#7 Updated by tinita 3 months ago

#8 Updated by tinita 3 months ago

  • Status changed from Feedback to In Progress

I will gather some statistics from the logs in the next days and report them here.
So far on o3 the average Tumbleweed build page took 46% of the time before the deployment.

#9 Updated by tinita 3 months ago

  • Status changed from In Progress to Resolved

Unfortunately I didn't look to closely on the load average when doing my test queries before the rollout.

I did the test queries today when the load was not too high (between 10-15) on osd.

# Yast
% perl -nwE'if (m{ /tests/overview\?distri=sle&version=15-SP3&build=187.1&groupid=129 } and m{(\d+)$}) { $count++; $sum += $1 } END { say sprintf "count: %d | sum: %d ms | avg: %.2f ms", $count, $sum, $sum/$count }' access_log-20210617
count: 17 | sum: 127184 ms | avg: 7481.41 ms
% perl -nwE'if (m{ /tests/overview\?distri=sle&version=15-SP3&build=187.1&groupid=129 } and m{(\d+)$}) { $count++; $sum += $1 } END { say sprintf "count: %d | sum: %d ms | avg: %.2f ms", $count, $sum, $sum/$count }' access_log-today
count: 46 | sum: 142907 ms | avg: 3106.67 ms
% perl -wE'say 3106.67/7481.41'
0.415251937803168

# HA
% perl -nwE'if (m{ /tests/overview\?distri=sle&version=15-SP3&build=178.1&groupid=143 } and m{(\d+)$}) { $count++; $sum += $1 } END { say sprintf "count: %d | sum: %d ms | avg: %.2f ms", $count, $sum, $sum/$count }' access_log-20210617                                                      count: 20 | sum: 147886 ms | avg: 7394.30 ms
% perl -nwE'if (m{ /tests/overview\?distri=sle&version=15-SP3&build=178.1&groupid=143 } and m{(\d+)$}) { $count++; $sum += $1 } END { say sprintf "count: %d | sum: %d ms | avg: %.2f ms", $count, $sum, $sum/$count }' access_log-today
count: 40 | sum: 153192 ms | avg: 3829.80 ms
% perl -wE'say 3829.80/7394.30'
0.517939493934517

So for those relatively big groups the average load time decreased to 42% (Yast) / 52% (HA).

#10 Updated by okurz 3 months ago

A significant and important improvement. Thanks!

#11 Updated by okurz 3 months ago

  • Related to action #94354: Optimize /dashboard_build_results and /group_overview/* pages added

#12 Updated by okurz 3 months ago

  • Related to deleted (action #94354: Optimize /dashboard_build_results and /group_overview/* pages)

Also available in: Atom PDF