action #10960
Updated by okurz about 8 years ago
## observation

Starting early on Friday, 2016-02-26, a performance regression on o.s.d was noticed. I could reproduce what looks like the same issue locally. With SQL query debugging enabled I see queries for "job_modules" and "job_settings" taking a long time.

## steps to reproduce

* load database dump 2016-02-26 from o.s.d into a local PostgreSQL database (e.g. `sudo -u geekotest createdb openqa_suse_de_2016-02-26 && sudo -u geekotest pg_restore -d openqa_suse_de_2016-02-26 --role=geekotest ~/local/os-autoinst/openQA/local/SQL-DUMPS/openqa.suse.de/2016-02-26.dump`)
* configure local openQA to load this database, e.g.
```
$ mkdir -p local/stage_openqa_suse_de_2016-02-26 && cat - > local/stage_openqa_suse_de_2016-02-26/database.ini << EOF
[stage]
dsn = dbi:Pg:dbname=openqa_suse_de_2016-02-26
user = geekotest
EOF
```
* load openQA with this database and SQL query debugging enabled, e.g.
```
$ time sudo -u geekotest OPENQA_SQL_DEBUG=1 OPENQA_CONFIG=local/stage_openqa_suse_de_2016-02-26 OPENQA_DATABASE=stage script/openqa get '/tests/overview?distri=sle&version=12-SP2&build=1201&groupid=25'
```
* observe the very long loading time and the slow queries

## problem

The overall processing time is way too long; most of it is spent waiting for queries. My (okurz) local tests yield 15s for loading the index page with the database dump from 2016-02-26 whereas it was around 2s for 2016-02-25.
* H1: *REJECTED* - A machine-specific problem (DONE: REJECTED, see E1-1, could be reproduced locally)
* H2: *REJECTED* - Recent openQA changes introduced a performance regression (DONE: REJECTED, see #10960#note-4, E2-1)
* H3: *ACCEPTED* - The database got weird because of recent openQA changes
  * H3.1: *REJECTED* - As more comments are used now because of okurz's changes the parsing gets slow (DONE: REJECTED, see #10960#note-6)
  * H3.2: *ACCEPTED* - Other changes cause the slowdown (DONE: ACCEPTED, see #10960#note-6)
    * H3.2.1: *ACCEPTED* - Something recently caused many more jobs to be considered in job settings queries (DONE: ACCEPTED, see #10960#note-7)
    * H3.2.2: *REJECTED* - PostgreSQL decides on its own to consider more jobs in job settings queries, trying to do the right thing but failing (DONE: REJECTED, see #10960#note-7)
* H4: *REJECTED* - The database got weird due to other effects (DONE: REJECTED, see E4-1)

## suggestion

* E1-1: DONE: Try to reproduce locally -> could be reproduced
* E2-1: DONE: Crosscheck with older version, e.g. the one used before the last upgrade on o.s.d; if confirmed, git bisect to find the culprit -> old version (97b8d9238aa918493883199e4da88eef3e578797) does not show an improvement in performance -> database f*ed
* E3-1: DONE: Run an older database dump with recent openQA -> older database still fine (see #10960#note-4)
* E3.2.1-1: DONE: ~~ask others, don't know what could have caused this~~ -> see #10960#note-7
* E3.2.2-1: DONE: ~~ask @coolo as he mentioned something like this recently~~ -> see #10960#note-7
* E4-1: DONE: @waitfor E2-1+E3-1, if both fail, accept H4 -> E2-1 FAIL, E3-1 SUCCESS
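
The E3-1 crosscheck repeats the reproduction steps for a second dump date, so the per-dump configuration could be scripted. The following is only a sketch: the function names `db_name` and `write_stage_config` are hypothetical helpers, and the directory layout under `local/` is assumed to mirror the manual steps in this ticket. It does not restore the dump or start openQA.

```shell
#!/bin/sh
# Sketch (assumption): helper functions mirroring the manual steps above.
# Neither function is part of openQA.

# Print the database name used for a given dump date,
# e.g. "openqa_suse_de_2016-02-26".
db_name() {
    printf 'openqa_suse_de_%s\n' "$1"
}

# Create <base>/stage_<db name>/database.ini for the given dump date and
# print the resulting config directory. <base> defaults to "local".
write_stage_config() {
    dump_date=$1
    base=${2:-local}
    dir="$base/stage_$(db_name "$dump_date")"
    # Create the full config directory, not only the base directory.
    mkdir -p "$dir" || return 1
    cat > "$dir/database.ini" <<EOF
[stage]
dsn = dbi:Pg:dbname=$(db_name "$dump_date")
user = geekotest
EOF
    printf '%s\n' "$dir"
}
```

After restoring both dumps with `pg_restore` as described in the steps to reproduce, calling `write_stage_config 2016-02-25` and `write_stage_config 2016-02-26` prints two config directories. Running the `time ... script/openqa get` command once with each directory as `OPENQA_CONFIG` gives the two timings to compare.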