action #10966

closed

action #10960: current performance problems on o.s.d

SLES Build1204 overview page times out with "502 Bad Gateway"

Added by okurz about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
Start date: 2016-02-28
Due date:
% Done: 0%
Estimated time:
Description

Observation

Accessing https://openqa.suse.de/tests/overview?distri=sle&version=12-SP2&build=1204&groupid=25 yields a "502 Bad Gateway" error after some time.

from /var/logs/openqa on o.s.d:

[Sun Feb 28 09:37:36 2016][26145][debug] [DBIx debug] Took 8.41389394 seconds executed: SELECT me.id, me.slug, me.result_dir, me.state, me.priority, me.result, me.worker_id, me.test, me.clone_id, me.retry_avbl, me.backend, me.backend_info, me.group_id, me.t_started, me.t_finished, me.t_created, me.t_updated, settings.id, settings.key, settings.value, settings.job_id, settings.t_created, settings.t_updated, parents.child_job_id, parents.parent_job_id, parents.dependency, children.child_job_id, children.parent_job_id, children.dependency FROM jobs me LEFT JOIN job_settings settings ON settings.job_id = me.id LEFT JOIN job_dependencies parents ON parents.child_job_id = me.id LEFT JOIN job_dependencies children ON children.parent_job_id = me.id WHERE ( ( me.clone_id IS NULL AND me.group_id = ? AND me.id IN ( SELECT me.job_id FROM job_settings me LEFT JOIN job_settings siblings ON siblings.job_id = me.job_id LEFT JOIN job_settings siblings_2 ON siblings_2.job_id = me.job_id WHERE ( ( ( me.key = ? AND me.value = ? ) AND ( siblings.key = ? AND siblings.value = ? ) AND ( siblings_2.key = ? AND siblings_2.value = ? ) ) ) ) ) ) ORDER BY me.id DESC: '25', 'VERSION', '12-SP2', 'DISTRI', 'sle', 'BUILD', '1204'.
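One property of the query above is worth illustrating: each LEFT JOIN on a one-to-many relation multiplies the rows returned per job, so a job with S settings, P parents and C children (at least one of each) comes back as S x P x C rows. The following is a minimal sketch in Python with an in-memory SQLite database. The reduced schema and the sample data are assumptions for illustration only, not the actual o.s.d schema or data, and the logs do not confirm that this fan-out is the main cost here.

```python
import sqlite3

# Reduced, hypothetical schema: only the columns needed to show the join fan-out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER PRIMARY KEY, group_id INTEGER);
CREATE TABLE job_settings (id INTEGER PRIMARY KEY, job_id INTEGER, key TEXT, value TEXT);
CREATE TABLE job_dependencies (child_job_id INTEGER, parent_job_id INTEGER, dependency INTEGER);
""")

# Made-up sample data: one job with 3 settings, 2 parents and 2 children.
conn.execute("INSERT INTO jobs (id, group_id) VALUES (1, 25)")
conn.executemany(
    "INSERT INTO job_settings (job_id, key, value) VALUES (1, ?, ?)",
    [("VERSION", "12-SP2"), ("DISTRI", "sle"), ("BUILD", "1204")],
)
conn.executemany(
    "INSERT INTO job_dependencies (child_job_id, parent_job_id, dependency) VALUES (?, ?, 1)",
    [(1, 2), (1, 3), (4, 1), (5, 1)],
)


def joined_row_count(conn, job_id):
    """Count the rows the three LEFT JOINs produce for a single job."""
    sql = """
        SELECT COUNT(*)
        FROM jobs me
        LEFT JOIN job_settings settings ON settings.job_id = me.id
        LEFT JOIN job_dependencies parents ON parents.child_job_id = me.id
        LEFT JOIN job_dependencies children ON children.parent_job_id = me.id
        WHERE me.id = ?
    """
    return conn.execute(sql, (job_id,)).fetchone()[0]


# 3 settings x 2 parents x 2 children
print(joined_row_count(conn, 1))  # prints 12
```

With real jobs carrying dozens of settings each, the same multiplication applies to every job in the build, which is one plausible reason why a build with many jobs is hit much harder than others.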

Steps to reproduce

Problem

I see the very long query time (see observation) but no other obvious error message in the logs. The problem appears to be reproducible locally: the very long query from above also shows up there (twice), followed by

Took 12.04192710 seconds executed: SELECT me.job_id, me.result, me.soft_failure, COUNT( id ) FROM job_modules me WHERE <super_long_job_list>

and finally the request finishes locally, without a timeout, after 111.849923s. See also parent task #10960. The slow rendering is not caused by any code change since 97b8d92; more likely the database layout or content has changed since then, or the data of the new builds is very different from before, e.g. many more jobs for external reasons.

Actions #1

Updated by okurz about 8 years ago

  • Description updated (diff)
Actions #2

Updated by okurz about 8 years ago

PR ready: openQA gh#583

Most of the time is now spent in one query on job_settings to find the initial list of jobs. It still takes 4s for this specific build, but I would say this is acceptable as other builds are not affected this severely. E.g. Build1205 finishes in about 1s, but the PR is still an improvement.
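For illustration, the lookup on job_settings (find all jobs whose settings match every given key/value pair) can be written in more than one way. The sketch below compares the self-join form seen in the logged query with a GROUP BY/HAVING form, using Python and an in-memory SQLite database. The sample data, the function names and the GROUP BY variant are assumptions for illustration; this is not necessarily what gh#583 does, and the GROUP BY form assumes each filter key is distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_settings (id INTEGER PRIMARY KEY, job_id INTEGER, key TEXT, value TEXT)"
)

# Made-up sample data: only jobs 1 and 3 match all three filters.
rows = [
    (1, "VERSION", "12-SP2"), (1, "DISTRI", "sle"), (1, "BUILD", "1204"),
    (2, "VERSION", "12-SP2"), (2, "DISTRI", "sle"), (2, "BUILD", "1205"),
    (3, "VERSION", "12-SP2"), (3, "DISTRI", "sle"), (3, "BUILD", "1204"),
    (3, "ARCH", "x86_64"),
    (4, "VERSION", "12-SP1"), (4, "DISTRI", "sle"), (4, "BUILD", "1204"),
]
conn.executemany("INSERT INTO job_settings (job_id, key, value) VALUES (?, ?, ?)", rows)

# Same parameter order as in the logged query.
PARAMS = ("VERSION", "12-SP2", "DISTRI", "sle", "BUILD", "1204")


def ids_via_self_join(conn):
    """Self-join form, mirroring the subquery from the log."""
    sql = """
        SELECT me.job_id
        FROM job_settings me
        LEFT JOIN job_settings siblings ON siblings.job_id = me.job_id
        LEFT JOIN job_settings siblings_2 ON siblings_2.job_id = me.job_id
        WHERE me.key = ? AND me.value = ?
          AND siblings.key = ? AND siblings.value = ?
          AND siblings_2.key = ? AND siblings_2.value = ?
        ORDER BY me.job_id
    """
    return [r[0] for r in conn.execute(sql, PARAMS)]


def ids_via_group_by(conn):
    """Single pass over job_settings; keep jobs that match all three keys."""
    sql = """
        SELECT job_id
        FROM job_settings
        WHERE (key = ? AND value = ?)
           OR (key = ? AND value = ?)
           OR (key = ? AND value = ?)
        GROUP BY job_id
        HAVING COUNT(DISTINCT key) = 3
        ORDER BY job_id
    """
    return [r[0] for r in conn.execute(sql, PARAMS)]


print(ids_via_self_join(conn))  # prints [1, 3]
print(ids_via_group_by(conn))   # prints [1, 3]
```

Which form is faster depends on the database engine, the indexes on job_settings and the data distribution, so it would need to be measured against the real database before drawing conclusions.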

Actions #3

Updated by okurz about 8 years ago

  • Status changed from New to Feedback
Actions #4

Updated by okurz about 8 years ago

  • Status changed from Feedback to Resolved