Project

General

Profile

coordination #109846

Updated by okurz about 2 years ago

## Motivation 

 See #109836 where we found all jobs on OSD are incomplete with the reason 'Reason: api failure: 400 response: OpenQA::Schema::Result::Jobs::insert_module(): DBI Exception: DBD::Pg::st execute failed: ERROR: integer out of range [for Statement " INSERT INTO job_modules ( job_id, name, category, script, milestone, important, fatal, always_rollback, t_created, t_u…' on 2022-04-12 morning due to hitting the integer limit of the job module id. This was mitigated for now on OSD with manual database alterations but needs to be accomodated in our table creations including migrations of existing instances. 

 ## Acceptance criteria 
 * **AC1:** New instances of openQA can handle multi-billion job module entries 
 * **AC2:** Existing instances of openQA are migrated/upgraded according to AC1 
 * **AC3:** No severe performance regression 
 * **AC4:** The migration of existing instances finishes in a reasonable time (downtime of some minutes is ok) 
 * **AC5:** Alerts 

 ## Ideas 

 What okurz did in #109836 manually effectively is in SQL: 

 ``` 
 ALTER TABLE job_modules ALTER COLUMN id TYPE bigint; 
 ALTER TABLE needles ALTER COLUMN last_seen_module_id TYPE bigint; 
 ALTER TABLE needles ALTER COLUMN last_matched_module_id TYPE bigint; 
 ``` 

 and then restart services: 

 ``` 
 systemctl restart openqa-webui openqa-scheduler 
 ``` 

 coolo: "We should do the same in the package and then update the schema version manually on osd database. After that we can check what other tables deserve the same treatment and create one more schema version" 

 * Add alerts for any IDs nearing a limit: Research the industry standard for postgreSQL admins best practices 
 * We should research how much additional space and performance cost bigint would really mean, maybe negligible 
 * Recovery took multiple hours because the team isn't currently spread across multiple timezones: This was communicated when the team composition changed and we should remind about that 

 Some references that might help: 
 * http://zemanta.github.io/2021/08/25/column-migration-from-int-to-bigint-in-postgresql/ 
 * https://stackoverflow.com/a/33509181 
 * https://www.postgresql.org/docs/9.5/sql-createsequence.html - maybe we can just cycle the counter? 
 * https://stackoverflow.com/questions/67808320/prevent-overflow-on-identity-id-column-when-performing-frequent-inserts

Back