action #55907

Deleting job results is too slow

Added by coolo 6 months ago. Updated 6 months ago.

Status:ResolvedStart date:23/08/2019
Priority:HighDue date:
Assignee:mkittler% Done:

0%

Category:Concrete Bugs
Target version:Done
Difficulty:
Duration:

Description

The main bottleneck is checking screenshots that are no longer attached.

[2019-08-23T18:55:03.0649 CEST] [debug] [pid:2164] [DBIC] Took 0.68212500 seconds: SELECT me.id, me.filename, me.t_created FROM screenshots me LEFT OUTER JOIN screenshot_links links_outer ON links_outer.screenshot_id = me.id WHERE ( id IN ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? ) ) GROUP BY me.id HAVING COUNT(links_outer.job_id) = 0: '99830535', '191526292', '99830500', '99830499', '99830536', '156209408', '17779700', '185486638', '185486636', '106285578', '190738955', '191527359', '191526710', '191526711', '185107267', '191526708', '152788113', '105046651', '105046658', '176251808', '105047637', '105046831', '176279984', '185486365', '186074859', '185486363', '185486366', '191527327', '106285578', '176245953', '130324508', '191535899', '191526920', '152576923', '152607891', '185107053', '155173141', '191532253', '133347265', '191532253', '133347264', '191532705', '191532702', '191532700', '105083401', '191532706', '191532704', '148183313', '105087464', '191532701', '148163978', '191532703', '190739946', '140809663', '52217985', '56783604', '56783605', '105046651', '191536177', '191535952', '191535953', '191535951', '191535899', '187485544', '187484843', '191528502', '187485557', '187485547', '187485550', '187484844', '187485552', '187485548', '187485553', '187485554', '191527327', '187485558', '187485551', '187485555', '187485546', '191528501', '187485549', '187485545', '187485556', '144866642', '191545745', '191545756', '191545737', '191545736', '191545751', '191545746', '191545743', '191545738', '191545740', '191545741', '191545754', '191545752', '191545749', '191545739', '191545750', '191545747', '191545748', '191545753', '17768866', '191545744', '191545755', '144154994', '140809998', '191547480', '191547483', '191547481', '191547482'
[2019-08-23T18:55:04.0838 CEST] [debug] [pid:2164] [DBIC] Took 0.78038500 seconds: SELECT me.id, me.filename, me.t_created FROM screenshots me LEFT OUTER JOIN screenshot_links links_outer ON links_outer.screenshot_id = me.id WHERE ( id IN ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? ) ) GROUP BY me.id HAVING COUNT(links_outer.job_id) = 0: '17772437', '156210750', '191526303', '99274929', '99305674', '99274930', '99274931', '99274932', '185486636', '185486638', '185486363', '185486366', '185486365', '190452973', '176245953', '130324119', '191530001', '191528171', '105083126', '105046831', '176279984', '191530001', '191530081', '191530080', '191526708', '185107267', '152607891', '191526920', '190738955', '191526709', '155173141', '185107053', '106285578', '191526710', '191526711', '152788113', '152576923', '106285578', '191530153', '56783488', '56783487', '105046650', '105046651', '52217985', '105046651', '105046658', '176251808', '191539082', '191539080', '191539081', '191539079', '187485552', '191528175', '187485548', '191528171', '187485560', '187485550', '187485547', '187485557', '187485544', '187485556', '187485549', '187485545', '187485555', '187485546', '190269450', '191528174', '187485551', '187485558', '187485553', '187485554', '191530895', '191530896', '191530900', '191530898', '191530902', '105087506', '191530899', '133347406', '133347405', '105083544', '191530895', '191530897', '152581054', '191530901', '140809663', '190740628', '144154994', '191537673', '191537672', '191537666', '191537683', '191537674', '191537679', '17768866', '191537667', '191537669', '191537676', '191537678', '191537677', '191537665', '191537668', '191537680', '191537670', '191537671', '191537681', '191537682', '191537675', '142394961', '144154994'
[2019-08-23T18:55:09.0039 CEST] [debug] [pid:2164] [DBIC] Took 3.77121400 seconds: SELECT me.id, me.filename, me.t_created FROM screenshots me LEFT OUTER JOIN screenshot_links links_outer ON links_outer.screenshot_id = me.id WHERE ( id IN ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? ) ) GROUP BY me.id HAVING COUNT(links_outer.job_id) = 0: '191662048', '191662052', '191662048', '107017704', '107017708', '148213088', '191662050', '107017707', '107017705', '191662051', '191661350', '130407153', '99830536', '99830499', '156209408', '99830535', '99830500', '191526292', '190817757', '191526986', '191659519', '191659517', '191659516', '191659518', '191659515', '180993512', '190738444', '190738927', '190738926', '190738925', '190818308', '140809775', '191661243', '106403140', '191663586', '191663596', '191663589', '191663587', '191663582', '191663590', '191663594', '191663591', '190738970', '190738969', '106366457', '187486088', '106302664', '106302530', '187485773', '106302600', '191661459', '191661350', '191661459', '191661460', '191661458', '140809842', '191664193', '191664190', '191664195', '191664192', '191664194', '30402654', '191664191', '191663583', '191663585', '191663595', '191663593', '191663580', '191663588', '191663598', '191663581', '191663584', '191663592', '191663599', '191663597', '52217985', '191541980', '191668332', '191668335', '191668331', '191668334', '180993512', '94026418', '191668333', '191541789', '17780336', '191541980', '191664094', '140809998', '140809842', '53340426', '76809710', '140810010', '30402654', '112145167', '191665304', '30402312', '191665302', '191665307', '191665303', '191665301', '191665306', '52436568', '191665305', '110620931', '110603864', '110500248', '191541980', '110450278', '30402312'
[2019-08-23T18:55:10.0322 CEST] [debug] [pid:2164] [DBIC] Took 0.85597900 seconds: SELECT me.id, me.filename, me.t_created FROM screenshots me LEFT OUTER JOIN screenshot_links links_outer ON links_outer.screenshot_id = me.id WHERE ( id IN ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? ) ) GROUP BY me.id HAVING COUNT(links_outer.job_id) = 0: '106285578', '185107053', '176022236', '187483653', '130220341', '191527066', '56783604', '56783488', '105046650', '52217986', '185486363', '185486365', '185486366', '185958437', '99830536', '156209408', '99830535', '99830500', '191526292', '99830499', '17772437', '191526709', '191526708', '190738955', '155173141', '191526711', '191526710', '185107053', '185486636', '185486638', '191527012', '105046732', '105046500', '105046735', '130220341', '130324204', '105046499', '105046733', '105046734', '176279984', '105047637', '105046831', '105046658', '191527066', '191527067', '191527065', '176251808', '105046650', '186311177', '191527623', '140809663', '191527388', '191527386', '105087464', '191527143', '151514095', '166286790', '148183313', '105083401', '133347265', '191527387', '191527143', '144155069', '152584605', '191531935', '191531945', '191531943', '191531939', '191531944', '191531938', '191531947', '191531949', '191531932', '17768866', '191531942', '191531933', '191531940', '191531948', '191531951', '191531937', '191531946', '191531936', '191531941', '191531934', '191531950', '144155069', '144154994', '191533188'

Notice it takes between 600 and 3700ms to do this query. We already have an index on screenshot_id, so we may require something smarter.

History

#1 Updated by coolo 6 months ago

We currently maintain 37.3M links to 20.2M screenshots.

#2 Updated by coolo 6 months ago

Idea: Do not outer join but check which screenshot is no longer in the links table:

openqa=> SELECT screenshot_id FROM screenshot_links WHERE ( screenshot_id IN ( 152582612,189955214 ) ) GROUP BY screenshot_id;
 screenshot_id 
---------------
     152582612
(1 row)

Filter the list of screenshots and then do the delete. This is 2 statements instead of one, but should be faster.

#3 Updated by coolo 6 months ago

An alternative could be to delay the cleanup of the screenshots to a special job that does the HAVING clause through the full table. Would be super expensive, but that way the deletion of a job would be one single DELETE with the database removing references.

#4 Updated by coolo 6 months ago

My first idea fails basically at the same problem

[debug] [pid:27605] [DBIC] Took 3.89527600 seconds: SELECT me.screenshot_id FROM screenshot_links me WHERE ( screenshot_id IN ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? ) ) GROUP BY screenshot_id: '287181065', '215459685', '302832007', '287181070', '299257406', '302861233', '302862087', '288997894', '299257166', '287181072', '215462381', '31553732', '280971844', '287181067', '302861232', '299257163', '302861231', '215463954', '159096994', '302795324', '302853725', '226866421', '302795453', '226856351', '299254270', '299257167', '32749642', '302832446', '159097198', '291513596', '302858363', '226892376', '291589557', '299257150', '117089690', '299257164', '302855771', '299257403', '299254269', '302853732', '215463949', '299257175', '197530853', '302851980', '302858364', '299257547', '302853726', '299257162', '302851981', '299253973', '302851978', '226855826', '75288381', '73890706', '302862085', '302862083', '302851975', '31553226', '294003736', '302858357', '302858361', '51462118', '299257176', '287181071', '34757947', '226855827', '17773949', '17846040', '280069286', '299257169', '30402698', '299257174', '299257160', '302853728', '215459684', '281233251', '299257173', '302861229', '215463948', '299257404', '302858362', '159096836', '299257179', '302862084', '271867208', '302851982', '299257165', '299257409', '302828882', '30410049', '302851977', '75288379', '301682919', '299257168', '17933730', '32102128', '271867043', '302853735', '302858360', '31598865', '291513598', '302840373', '291513599', '302851976', '287181069', '302858359', '302861234', '302862086', '301671365', '291513597', '302861227', '302851983', '302853730', '299257407', '74619171', '282497333', '299257161', '299257171', '299257405', '287181064', '297502160', '302853729', '302858355', '302851974', '299257172', '287181068', '27379318', '34757946', '302853723', '302858354', '299257159', '302853727', '302858365', '299257178', '299257177', '302861230', '299253562', '302851979', '159096837', '299257408', '31492414', '302795312', '302853721', '299257170', '302858358', '302851973', '302795452', '302861228', '302853722', '299257158', '215459773'

#5 Updated by coolo 6 months ago

Sounds to me like we need a link_count column in the screenshots so we can easily query all screenshots with 0 links - usual locking problems apply obviously.

#6 Updated by mkittler 6 months ago

  • Assignee set to mkittler

#7 Updated by mkittler 6 months ago

  • Status changed from New to In Progress

I played a little bit around with it but couldn't find any simpler solution then actually adding a ink_count column like you suggested. So I'm doing that now.

By the way, even locally (with a job picked from the OSD database which has 437 screenshot links) this query takes 11 seconds.

#9 Updated by mkittler 6 months ago

  • Target version changed from Ready to Current Sprint

#11 Updated by mkittler 6 months ago

  • Status changed from In Progress to Resolved

deployed on OSD

#12 Updated by mkittler 6 months ago

  • Target version changed from Current Sprint to Done

Also available in: Atom PDF