action #93925
closed
coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
coordination #92854: [epic] limit overload of openQA webUI by heavy requests
Optimize SQL on /tests/overview
Added by tinita over 3 years ago.
Updated over 3 years ago.
Category:
Feature requests
Description
For /tests/overview
pages with a lot of jobs, retrieving data from our database is quite inefficient.
For every job several SQL statements are fired.
Some of them can be combined into one statement.
Also in several statements, too much columns are selected, where we need only a few. That includes costly DateTime columns.
Identify statements which can be combined, combine them and measure performance.
- Priority changed from Normal to Low
Let's see that as something that can help and is certainly related but can't prevent an "overload" so reducing prio to "Low". But besides that no further impact, keep on the good work :)
Might also be possible to replace DateTime
with a less heavy module like DateTime::Tiny
in some cases.
I made the following config changes on o3:
# /etc/apache2/global.conf
CustomLog /space/logs/apache2/access_log combined_with_time
# /etc/apache2/vhosts.d/openqa.conf
CustomLog /space/logs/apache2/openqa.access_log combined_with_time
# /etc/apache2/mod_log_config.conf
LogFormat "%h %l %u %t \"%r\" %>s %b \
\"%{Referer}i\" \"%{User-Agent}i\" %{ms}T" combined_with_time
Added the files to /etc/logrotate.d/apache2
This way we can compare request times before and after a deployment.
I did the same changes on osd (except access_log was already enabled), though I had to touch different files.
I replaced combined
with combined_with_time
in /etc/apache2/global.conf
and added it to /etc/apache2/mod_log_config.conf
I added combined_with_time
to /etc/sysconfig/apache2
and reloaded apache, which generated /etc/apache2/sysconfig.d/global.conf
.
Now we can see the time in /var/log/apache2/access_log
- Status changed from In Progress to Feedback
- Status changed from Feedback to In Progress
I will gather some statistics from the logs in the next days and report them here.
So far on o3 the average Tumbleweed build page took 46% of the time before the deployment.
- Status changed from In Progress to Resolved
Unfortunately I didn't look to closely on the load average when doing my test queries before the rollout.
I did the test queries today when the load was not too high (between 10-15) on osd.
# Yast
% perl -nwE'if (m{ /tests/overview\?distri=sle&version=15-SP3&build=187.1&groupid=129 } and m{(\d+)$}) { $count++; $sum += $1 } END { say sprintf "count: %d | sum: %d ms | avg: %.2f ms", $count, $sum, $sum/$count }' access_log-20210617
count: 17 | sum: 127184 ms | avg: 7481.41 ms
% perl -nwE'if (m{ /tests/overview\?distri=sle&version=15-SP3&build=187.1&groupid=129 } and m{(\d+)$}) { $count++; $sum += $1 } END { say sprintf "count: %d | sum: %d ms | avg: %.2f ms", $count, $sum, $sum/$count }' access_log-today
count: 46 | sum: 142907 ms | avg: 3106.67 ms
% perl -wE'say 3106.67/7481.41'
0.415251937803168
# HA
% perl -nwE'if (m{ /tests/overview\?distri=sle&version=15-SP3&build=178.1&groupid=143 } and m{(\d+)$}) { $count++; $sum += $1 } END { say sprintf "count: %d | sum: %d ms | avg: %.2f ms", $count, $sum, $sum/$count }' access_log-20210617 count: 20 | sum: 147886 ms | avg: 7394.30 ms
% perl -nwE'if (m{ /tests/overview\?distri=sle&version=15-SP3&build=178.1&groupid=143 } and m{(\d+)$}) { $count++; $sum += $1 } END { say sprintf "count: %d | sum: %d ms | avg: %.2f ms", $count, $sum, $sum/$count }' access_log-today
count: 40 | sum: 153192 ms | avg: 3829.80 ms
% perl -wE'say 3829.80/7394.30'
0.517939493934517
So for those relatively big groups the average load time decreased to 42% (Yast) / 52% (HA).
A significant and important improvement. Thanks!
- Related to action #94354: Optimize /dashboard_build_results and /group_overview/* pages added
- Related to deleted (action #94354: Optimize /dashboard_build_results and /group_overview/* pages)
Also available in: Atom
PDF