action #110136
closed
coordination #109846: [epic] Ensure all our database tables accommodate enough data, e.g. bigint for ids
[research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for PostgreSQL software development and admins best practices size:S
Added by okurz about 3 years ago.
Updated almost 3 years ago.
Category: Feature requests
Description
Motivation
See parent #109846
Acceptance criteria
- AC1: We know which best practice we will follow regarding database IDs nearing a limit
Suggestions
- Conduct web research for "monitor database table IDs nearing a limit" or "just use bigint everywhere" :)
- Regarding the use of bigint, research in particular how much additional space and performance cost bigint would really incur; it may well be negligible
- Record the findings in this ticket
- Ensure everybody knows the verdict, e.g. mention the ticket with its resolution in the weekly meeting or chat
- Subject changed from [research][timeboxed:6h] Add alerts for any database IDs nearing a limit: Research the industry standard for PostgreSQL admins best practices to [research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for PostgreSQL software development and admins best practices
- Description updated (diff)
- Subject changed from [research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for PostgreSQL software development and admins best practices to [research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for PostgreSQL software development and admins best practices size:S
- Status changed from New to Workable
- Status changed from Workable to In Progress
This is a pretty interesting StackOverflow post explaining how to optimise PostgreSQL storage space with "Column Tetris". TL;DR: Using int (4 bytes) instead of bigint (8 bytes) is usually not even worth it, due to alignment padding.
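To illustrate the alignment-padding point: bigint values must start at 8-byte boundaries, so an int column sitting before a bigint column wastes 4 bytes of padding per row. A small sketch (table and column names are made up for illustration):

```sql
-- Each bigint must start at an 8-byte boundary, so the int columns
-- in between force 4 bytes of padding each:
CREATE TABLE padded (a int, b bigint, c int, d bigint);
-- data per row: 4 + 4 (pad) + 8 + 4 + 4 (pad) + 8 = 32 bytes

-- Ordering columns from widest to narrowest avoids the padding:
CREATE TABLE packed (b bigint, d bigint, a int, c int);
-- data per row: 8 + 8 + 4 + 4 = 24 bytes

-- pg_column_size() on a sample row makes the difference visible:
INSERT INTO padded VALUES (1, 1, 1, 1);
INSERT INTO packed VALUES (1, 1, 1, 1);
SELECT pg_column_size(p.*) FROM padded p;
SELECT pg_column_size(p.*) FROM packed p;
```

Which is exactly why widening an int to bigint next to other 8-byte columns often costs nothing at all.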
And here is another StackOverflow post exploring the performance differences of int vs bigint. TL;DR: No real differences.
It is pretty easy to check the latest value of a PostgreSQL sequence manually:
$ psql openqa -c 'select last_value from job_modules_id_seq;'
last_value
------------
451728054
(1 row)
So turning this into a script that checks the value against the maximum for a 4-byte int (2147483647) for all openQA tables wouldn't be too hard.
Here's a query to get the current values for all sequences:
$ psql openqa -c 'select schemaname, sequencename, last_value, max_value from pg_sequences where not cycle;'
schemaname | sequencename | last_value | max_value
------------+----------------------------------------------+------------+---------------------
public | api_keys_id_seq | 217 | 9223372036854775807
public | assets_id_seq | 66639387 | 9223372036854775807
public | audit_events_id_seq | 26720907 | 9223372036854775807
public | bugs_id_seq | 3589 | 9223372036854775807
public | comments_id_seq | 275353 | 9223372036854775807
public | job_group_parents_id_seq | 7 | 9223372036854775807
public | dbix_class_deploymenthandler_versions_id_seq | 111 | 9223372036854775807
public | gru_tasks_id_seq | 18218176 | 9223372036854775807
public | job_groups_id_seq | 99 | 9223372036854775807
public | job_modules_id_seq | 451728848 | 9223372036854775807
public | job_settings_id_seq | 63508961 | 9223372036854775807
public | job_template_settings_id_seq | 6763 | 9223372036854775807
public | job_templates_id_seq | 8110 | 9223372036854775807
public | jobs_id_seq | 2404719 | 9223372036854775807
public | machine_settings_id_seq | 928 | 9223372036854775807
public | minion_jobs_id_seq | 1265856 | 9223372036854775807
public | machines_id_seq | 1082 | 9223372036854775807
public | minion_locks_id_seq | 1525052 | 9223372036854775807
public | minion_workers_id_seq | 1004 | 9223372036854775807
public | needle_dirs_id_seq | 20 | 9223372036854775807
public | needles_id_seq | 24707 | 9223372036854775807
public | product_settings_id_seq | 1225 | 9223372036854775807
public | products_id_seq | 435 | 9223372036854775807
public | secrets_id_seq | 1 | 9223372036854775807
public | scheduled_products_id_seq | 259488 | 9223372036854775807
public | test_suites_id_seq | 1686 | 9223372036854775807
public | screenshots_id_seq | 1083270780 | 9223372036854775807
public | test_suite_settings_id_seq | 4727 | 9223372036854775807
public | users_id_seq | 856 | 9223372036854775807
public | worker_properties_id_seq | 815021 | 9223372036854775807
public | workers_id_seq | 450 | 9223372036854775807
(31 rows)
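The query above can be extended into a simple alert check that reports how much of each sequence's range is used and only lists candidates above some threshold; the 0.5 cutoff here is just an example value:

```sql
-- Report sequence usage as a fraction of max_value; last_value is NULL
-- for sequences that were never used, so filter those out.
SELECT schemaname, sequencename, last_value, max_value,
       round(last_value::numeric / max_value, 4) AS used_fraction
FROM pg_sequences
WHERE NOT cycle
  AND last_value IS NOT NULL
  AND last_value::numeric / max_value > 0.5
ORDER BY used_fraction DESC;
```

Note that max_value here is the sequence's own maximum; to catch a 4-byte integer column fed by a bigint-ranged sequence (as seen above), one would compare last_value against 2147483647 instead.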
It's interesting that even the sequences for our 4-byte int primary keys are using an 8-byte max value. The columns themselves are definitely 4 bytes:
openqa=# \d api_keys
Table "public.api_keys"
Column | Type | Collation | Nullable | Default
--------------+-----------------------------+-----------+----------+--------------------------------------
id | integer | | not null | nextval('api_keys_id_seq'::regclass)
key | text | | not null |
...
- Due date set to 2022-06-22
Setting due date based on mean cycle time of SUSE QE Tools
- Status changed from In Progress to Resolved
I think I have enough information collected now. My recommendation is to just migrate all our serial columns to bigserial.
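For reference, the migration itself is a one-liner per column, but changing int to bigint rewrites the whole table under an exclusive lock, so big tables need a maintenance window. The table and column names below are illustrative:

```sql
-- Widen a serial primary key to bigserial; the sequence already has a
-- bigint range (see above), so only the column type needs changing.
-- This rewrites the table and blocks access while it runs.
ALTER TABLE jobs ALTER COLUMN id TYPE bigint;

-- Foreign-key columns referencing the widened key should be widened too:
ALTER TABLE job_modules ALTER COLUMN job_id TYPE bigint;
```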
- Due date deleted (2022-06-22)