action #10960
closed
current performance problems on o.s.d
100%
Description
observation
Since early Friday, 2016-02-26, a performance regression on o.s.d has been reported. I could reproduce what looks like the same issue locally. With SQL query debugging enabled, I see queries for "job_modules" and "job_settings" taking a long time.
steps to reproduce
- load database dump 2016-02-26 from o.s.d into local PostgreSQL database, e.g.
$ sudo -u geekotest createdb openqa_suse_de_2016-02-26 && sudo -u geekotest pg_restore -d openqa_suse_de_2016-02-26 --role=geekotest ~/local/os-autoinst/openQA/local/SQL-DUMPS/openqa.suse.de/2016-02-26.dump
- configure local openQA to load this database, e.g.
$ mkdir -p local/stage_openqa_suse_de_2016-02-26 && cat - > local/stage_openqa_suse_de_2016-02-26/database.ini << EOF
[stage]
dsn = dbi:Pg:dbname=openqa_suse_de_2016-02-26
user = geekotest
EOF
- load openQA with this database and SQL query debugging enabled, e.g.
$ time sudo -u geekotest OPENQA_SQL_DEBUG=1 OPENQA_CONFIG=local/stage_openqa_suse_de_2016-02-26 OPENQA_DATABASE=stage script/openqa get '/tests/overview?distri=sle&version=12-SP2&build=1201&groupid=25'
- observe super long loading time and slow queries
problem
The overall processing time is way too long, with most of the waiting time spent in queries. My (okurz) local tests yield 15s for loading the index page with the database dump from 2016-02-26, whereas it was around 2s for 2016-02-25.
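To make the comparison between the two dumps repeatable, the request from the steps above can be wrapped in a small timing helper. This is only a sketch: the helper name `time_cmd` is made up for illustration, and the second config directory `local/stage_openqa_suse_de_2016-02-25` is an assumption (it would have to be set up like the 2016-02-26 one). Whole-second resolution should be enough to tell 2s from 15s.

```shell
# Hypothetical helper: print the wall-clock seconds a command takes.
# Usage: time_cmd LABEL COMMAND [ARGS...]
time_cmd() {
    label=$1
    shift
    start=$(date +%s)
    # Discard the command's output, only the duration is of interest here
    "$@" > /dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit status $status)"
}

# Example use (assumes one config directory per dump, named as below):
# for d in 2016-02-25 2016-02-26; do
#     time_cmd "$d" sudo -u geekotest \
#         OPENQA_CONFIG=local/stage_openqa_suse_de_$d OPENQA_DATABASE=stage \
#         script/openqa get '/tests/overview?distri=sle&version=12-SP2&build=1201&groupid=25'
# done
```

Running the loop a few times per dump would help to rule out caching effects before comparing the numbers.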
H1: REJECTED - A machine specific problem (DONE: REJECTED, see E1-1, could be reproduced locally)
H2: REJECTED - Recent openQA changes introduced a performance regression (DONE: REJECTED, see #10960#note-4 E2-1)
H3: ACCEPTED - The database got weird because of recent openQA changes
H3.1: REJECTED - As more comments are used now because of okurz's changes the parsing gets slow (DONE: REJECTED, see #10960#note-6)
H3.2: ACCEPTED - Other changes cause the slowdown (DONE: ACCEPTED, see #10960#note-6)
H3.2.1: ACCEPTED - Something caused many more jobs considered for job settings queries to appear recently (DONE: ACCEPTED, see #10960#note-7)
H3.2.2: REJECTED - PostgreSQL decides on its own that it should consider more jobs in job settings queries, trying to do the right thing but failing (DONE: REJECTED, see #10960#note-7)
H4: REJECTED - The database got weird due to other effects (DONE: REJECTED, see E4-1)
suggestion
E1-1: DONE: Try to reproduce locally -> could be reproduced
E2-1: DONE: Crosscheck with older version, e.g. the one used before last upgrade on o.s.d, if confirmed, git bisect to find culprit -> old version (97b8d9238aa918493883199e4da88eef3e578797) does not show an improvement in performance -> database f*ed
E3-1: DONE: Run an older database dump with recent openQA -> older database still fine (see #10960#note-4)
E3.2.1-1: DONE: ask others, as I don't know what could have caused this -> see #10960#note-7
E3.2.2-1: DONE: ask @coolo as he mentioned something like this recently -> see #10960#note-7
E4-1: DONE: @waitfor E2-1+E3-1, if both fail, accept H4 -> E2-1 FAIL, E3-1 SUCCESS