action #10960

Updated by okurz about 8 years ago

## observation 
 Starting early on Friday, 2016-02-26, a performance regression was reported on o.s.d. I could reproduce what looks like the same issue locally. With SQL query debugging enabled, I see queries for "job_modules" and "job_settings" taking a long time. 

 ## steps to reproduce 
 * load database dump 2016-02-26 from o.s.d into local PostgreSQL database (e.g. `sudo -u geekotest createdb openqa_suse_de_2016-02-26 && sudo -u geekotest pg_restore -d openqa_suse_de_2016-02-26 --role=geekotest ~/local/os-autoinst/openQA/local/SQL-DUMPS/openqa.suse.de/2016-02-26.dump`) 
 * configure local openQA to load this database, e.g. 

 ``` 
 $ mkdir -p local/stage_openqa_suse_de_2016-02-26 && cat - > local/stage_openqa_suse_de_2016-02-26/database.ini << EOF 
 [stage] 
 dsn = dbi:Pg:dbname=openqa_suse_de_2016-02-26 
 user = geekotest 
 EOF 
 ``` 

 * load openQA with this database and SQL query debugging enabled, e.g. 

 ``` 
 $ time sudo -u geekotest OPENQA_SQL_DEBUG=1 OPENQA_CONFIG=local/stage_openqa_suse_de_2016-02-26 OPENQA_DATABASE=stage script/openqa get '/tests/overview?distri=sle&version=12-SP2&build=1201&groupid=25' 
 ``` 
 * observe the very long loading time and the slow queries 


 ## problem 
 The overall processing time is way too long; most of the waiting time comes from the queries. My (okurz) local tests yield 15s for loading the index page with the database dump from 2016-02-26, whereas it was around 2s for 2016-02-25. 
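
 To put timings like the 15s versus 2s above side by side, the same request can be timed against each dump. This is only a minimal sketch: `elapsed_seconds` is a hypothetical helper, and the usage comment assumes a second config directory for the 2016-02-25 dump, set up the same way as the one for 2016-02-26.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output and
# print the elapsed wall-clock time in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    # The exit status is ignored on purpose, only the duration matters here.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# Usage sketch (not executed here; the 2016-02-25 config dir is an assumption):
# for day in 2016-02-25 2016-02-26; do
#     echo "$day: $(elapsed_seconds sudo -u geekotest \
#         OPENQA_CONFIG="local/stage_openqa_suse_de_$day" OPENQA_DATABASE=stage \
#         script/openqa get '/tests/overview?distri=sle&version=12-SP2&build=1201&groupid=25')s"
# done
```

 Whole seconds are coarse, but enough to tell a 2s run from a 15s run.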

 H1: *REJECTED* - A machine specific problem (DONE: REJECTED, see E1-1, could be reproduced locally) 
 H2: *REJECTED* - Recent openQA changes introduced a performance regression (DONE: REJECTED, see #10960#note-4, E2-1) 
 H3: *ACCEPTED* - The database got weird because of recent openQA changes 
 H3.1: *REJECTED* - As more comments are used now because of okurz's changes the parsing gets slow (DONE: REJECTED, see #10960#note-6) 
 H3.2: *ACCEPTED* - Other changes cause the slowdown (DONE: ACCEPTED, see #10960#note-6) 
 H3.2.1: *ACCEPTED* - Something caused many more jobs considered for job settings queries to appear recently (DONE: ACCEPTED, see #10960#note-7) 
 H3.2.2: *REJECTED* - PostgreSQL decides on its own to consider more jobs in job settings queries, trying to do the right thing but failing (DONE: REJECTED, see #10960#note-7) 
 H4: *REJECTED* - The database got weird due to other effects (DONE: REJECTED, see E4-1) 
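
 A rough first check for H3.2.1 is to compare table sizes between the two dumps. The sketch below makes several assumptions: the 2016-02-25 dump is restored under the analogous name `openqa_suse_de_2016-02-25` (only the 2016-02-26 name is given above), `count_rows` is a hypothetical helper, and `PSQL` is a variable introduced here so that the client command can be overridden. Row counts alone do not show which jobs a query actually considers, so this only narrows things down.

```shell
#!/bin/sh
# PSQL can be overridden, e.g. PSQL='sudo -u geekotest psql' (assumption).
PSQL=${PSQL:-psql}

# Hypothetical helper: print the number of rows of a table in a database.
count_rows() {
    db=$1
    table=$2
    # -A: unaligned output, -t: tuples only, so just the number is printed.
    # $PSQL is left unquoted on purpose so it may contain several words.
    $PSQL -At -d "$db" -c "SELECT count(*) FROM $table"
}

# Usage sketch (not executed here; the 2016-02-25 database name is an assumption):
# for table in job_settings job_modules; do
#     for day in 2016-02-25 2016-02-26; do
#         echo "$table $day: $(count_rows "openqa_suse_de_$day" "$table")"
#     done
# done
```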

 ## suggestion 

 E1-1: DONE: Try to reproduce locally -> could be reproduced 
 E2-1: DONE: Crosscheck with older version, e.g. the one used before last upgrade on o.s.d, if confirmed, git bisect to find culprit -> old version (97b8d9238aa918493883199e4da88eef3e578797) does not show an improvement in performance -> database f*ed 
 E3-1: DONE: Run an older database dump with recent openQA -> older database still fine (see #10960#note-4) 
 E3.2.1-1: DONE: ~~ask others, don't know what could have caused this~~ -> see #10960#note-7 
 E3.2.2-1: DONE: ~~ask @coolo as he mentioned something like this recently~~ -> see #10960#note-7 
 E4-1: DONE: @waitfor E2-1+E3-1, if both fail, accept H4 -> E2-1 FAIL, E3-1 SUCCESS
