action #10966
closed action #10960: current performance problems on o.s.d
SLES Build1204 overview page times out with "502 Bad Gateway"
Description
observation
Accessing https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=1204&groupid=25 yields a "502 Bad Gateway" error after some time.
From /var/logs/openqa on o.s.d:
[Sun Feb 28 09:37:36 2016][26145][debug] [DBIx debug] Took 8.41389394 seconds executed: SELECT me.id, me.slug, me.result_dir, me.state, me.priority, me.result, me.worker_id, me.test, me.clone_id, me.retry_avbl, me.backend, me.backend_info, me.group_id, me.t_started, me.t_finished, me.t_created, me.t_updated, settings.id, settings.key, settings.value, settings.job_id, settings.t_created, settings.t_updated, parents.child_job_id, parents.parent_job_id, parents.dependency, children.child_job_id, children.parent_job_id, children.dependency FROM jobs me LEFT JOIN job_settings settings ON settings.job_id = me.id LEFT JOIN job_dependencies parents ON parents.child_job_id = me.id LEFT JOIN job_dependencies children ON children.parent_job_id = me.id WHERE ( ( me.clone_id IS NULL AND me.group_id = ? AND me.id IN ( SELECT me.job_id FROM job_settings me LEFT JOIN job_settings siblings ON siblings.job_id = me.job_id LEFT JOIN job_settings siblings_2 ON siblings_2.job_id = me.job_id WHERE ( ( ( me.key = ? AND me.value = ? ) AND ( siblings.key = ? AND siblings.value = ? ) AND ( siblings_2.key = ? AND siblings_2.value = ? ) ) ) ) ) ) ORDER BY me.id DESC: '25', 'VERSION', '12-SP2', 'DISTRI', 'sle', 'BUILD', '1204'.
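One plausible contributor to the long query time is row fan-out: the statement LEFT JOINs each job against its settings, its parent dependencies, and its child dependencies in a single query, so every job can produce (settings x parents x children) result rows. The following is a minimal sketch using Python's sqlite3 with a simplified, hypothetical schema (not the real openQA schema or data); the function name `fanout_rows` and the generated keys and job ids are made up for illustration.

```python
import sqlite3


def fanout_rows(n_settings, n_parents, n_children):
    """Count the rows the three-way LEFT JOIN yields for a single job."""
    db = sqlite3.connect(":memory:")
    # Simplified stand-in tables; only the join columns matter here.
    db.executescript("""
        CREATE TABLE jobs (id INTEGER PRIMARY KEY);
        CREATE TABLE job_settings (job_id INTEGER, key TEXT, value TEXT);
        CREATE TABLE job_dependencies (child_job_id INTEGER, parent_job_id INTEGER);
    """)
    # Job 1 is the job under test; ids 100.. are parents, 200.. are children.
    db.execute("INSERT INTO jobs (id) VALUES (1)")
    db.executemany("INSERT INTO job_settings VALUES (1, ?, 'x')",
                   [(f"KEY{i}",) for i in range(n_settings)])
    db.executemany("INSERT INTO job_dependencies VALUES (1, ?)",
                   [(100 + i,) for i in range(n_parents)])
    db.executemany("INSERT INTO job_dependencies VALUES (?, 1)",
                   [(200 + i,) for i in range(n_children)])
    # Same join shape as the logged query, reduced to a row count.
    (count,) = db.execute("""
        SELECT COUNT(*) FROM jobs me
        LEFT JOIN job_settings settings ON settings.job_id = me.id
        LEFT JOIN job_dependencies parents ON parents.child_job_id = me.id
        LEFT JOIN job_dependencies children ON children.parent_job_id = me.id
        WHERE me.id = 1
    """).fetchone()
    db.close()
    return count


if __name__ == "__main__":
    print(fanout_rows(40, 2, 3))
```

In this toy setup a single job with 40 settings, 2 parents and 3 children already yields 240 joined rows, which hints at how the row count could grow for a build with many jobs. Whether this is the dominant cost on o.s.d is an assumption that would need checking with the actual query plan.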
Steps to reproduce
- access https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=1204&groupid=25
- wait until page times out
Problem
I see the very long query time (see observation) but no other obvious error message in the logs. Reproducing this locally appears to be possible: the very long query from above also shows up there (twice), plus a subsequent
Took 12.04192710 seconds executed: SELECT me.job_id, me.result, me.soft_failure, COUNT( id ) FROM job_modules me WHERE <super_long_job_list>
and finally, when run locally, the request finishes without a timeout after 111.849923s. See also parent task #10960. The slow rendering is not caused by any code change since 97b8d92. Rather, either the database layout or content has changed since then, or something in the data of new builds is very different from before, e.g. many more jobs for external reasons.
Updated by okurz about 8 years ago
PR ready: openQA gh#583
Most of the time is now spent in one query on job_settings that finds the initial list of jobs. It still takes 4s for this specific build, but I would say this is acceptable as other builds are not affected this severely. E.g. Build1205 finishes in about 1s, but the PR is still an improvement.
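As an illustration of what a single query on job_settings for the initial job list could look like, here is a sketch that filters on all key/value pairs at once and keeps only jobs matching every key via GROUP BY/HAVING. This is a generic technique and not taken from the PR; the helper names `jobs_matching` and `demo`, the reduced table layout, and the sample rows are assumptions for illustration only.

```python
import sqlite3


def jobs_matching(db, wanted):
    """Return ids (newest first) of jobs whose settings contain every pair in wanted."""
    pairs = list(wanted.items())
    if not pairs:
        return []
    # One (key = ? AND value = ?) clause per requested setting.
    cond = " OR ".join("(key = ? AND value = ?)" for _ in pairs)
    params = [item for pair in pairs for item in pair]
    sql = (
        f"SELECT job_id FROM job_settings WHERE {cond} "
        "GROUP BY job_id HAVING COUNT(DISTINCT key) = ? "
        "ORDER BY job_id DESC"
    )
    # A job qualifies only if it matched all requested keys.
    return [row[0] for row in db.execute(sql, params + [len(pairs)])]


def demo():
    db = sqlite3.connect(":memory:")
    # Hypothetical, reduced version of the job_settings table.
    db.execute("CREATE TABLE job_settings (job_id INTEGER, key TEXT, value TEXT)")
    rows = [
        (1, "DISTRI", "sle"), (1, "VERSION", "12-SP2"), (1, "BUILD", "1204"),
        (2, "DISTRI", "sle"), (2, "VERSION", "12-SP2"), (2, "BUILD", "1205"),
        (3, "DISTRI", "sle"), (3, "VERSION", "12-SP2"), (3, "BUILD", "1204"),
        (4, "DISTRI", "opensuse"), (4, "VERSION", "12-SP2"), (4, "BUILD", "1204"),
    ]
    db.executemany("INSERT INTO job_settings VALUES (?, ?, ?)", rows)
    wanted = {"DISTRI": "sle", "VERSION": "12-SP2", "BUILD": "1204"}
    result = jobs_matching(db, wanted)
    db.close()
    return result


if __name__ == "__main__":
    print(demo())
```

Compared with the self-joins on siblings in the logged query, this form scans job_settings once per request. Whether it is actually faster on the production database depends on the available indexes and would need to be measured.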