action #110136
closed coordination #109846: [epic] Ensure all our database tables accommodate enough data, e.g. bigint for ids
[research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for PostgreSQL software development and admin best practices size:S
Description
Motivation
See parent #109846
Acceptance criteria
- AC1: We know which best practice we will follow regarding database IDs nearing a limit
Suggestions
- Conduct web research for "monitor database table IDs nearing a limit" or "just use bigint everywhere" :)
- Regarding the use of bigint, research in particular how much additional space and performance cost bigint would really incur; it might be negligible
- Record the findings in this ticket
- Ensure everybody knows the verdict, e.g. just mention the ticket with its resolution in the weekly or chat
Updated by okurz over 2 years ago
- Subject changed from [research][timeboxed:6h] Add alerts for any database IDs nearing a limit: Research the industry standard for PostgreSQL admin best practices to [research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for PostgreSQL software development and admin best practices
- Description updated (diff)
Updated by mkittler over 2 years ago
- Subject changed from [research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for PostgreSQL software development and admin best practices to [research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for PostgreSQL software development and admin best practices size:S
Updated by kraih over 2 years ago
This is a pretty interesting Stack Overflow post explaining how to optimise PostgreSQL storage space with "Column Tetris". TL;DR: Using int (4 bytes) instead of bigint (8 bytes) is usually not even worth it, due to alignment padding.
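To illustrate the alignment argument with a small sketch (made-up example tables, not from openQA): on 64-bit platforms bigint requires 8-byte alignment, so an int column followed by a bigint gets padded to the same row size anyway:
CREATE TABLE with_int    (a int,    b bigint); -- 4 bytes + 4 bytes padding + 8 bytes = 16 bytes per row
CREATE TABLE with_bigint (a bigint, b bigint); -- 8 bytes + 8 bytes = 16 bytes per row, i.e. no savings from int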
Updated by kraih over 2 years ago
And here is another Stack Overflow post exploring the performance differences of int vs bigint. TL;DR: No real differences.
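A rough way to double-check that on our own hardware would be a micro-benchmark like this (just a sketch, the table names are made up):
$ psql openqa <<'SQL'
CREATE TABLE t_int    (id int);
CREATE TABLE t_bigint (id bigint);
INSERT INTO t_int    SELECT generate_series(1, 10000000);
INSERT INTO t_bigint SELECT generate_series(1, 10000000);
\timing on
SELECT sum(id) FROM t_int;
SELECT sum(id) FROM t_bigint;
SQL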
Updated by kraih over 2 years ago
It is pretty easy to check the latest value of a PostgreSQL sequence manually:
$ psql openqa -c 'select last_value from job_modules_id_seq;'
 last_value 
------------
  451728054
(1 row)
So turning this into a script that checks the value against the maximum for a 4-byte int (2147483647) for all openQA tables wouldn't be too hard.
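A minimal version of such a check can even stay in plain SQL, e.g. (a sketch assuming PostgreSQL 10+, where the pg_sequences view is available):
$ psql openqa -c "select sequencename, last_value, round(100.0 * last_value / 2147483647, 1) as pct_of_int_max from pg_sequences where last_value is not null order by 3 desc;"
Anything close to 100 for a sequence feeding a 4-byte column would be a reason to alert.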
Updated by kraih over 2 years ago
Percona has an interesting-looking complete database monitoring solution called PMM (Percona Monitoring and Management), based around Prometheus and Grafana.
Updated by kraih over 2 years ago
Here's a query to get the current values for all sequences:
$ psql openqa -c 'select schemaname, sequencename, last_value, max_value from pg_sequences where not cycle;'
 schemaname |                 sequencename                 | last_value |      max_value
------------+----------------------------------------------+------------+---------------------
 public     | api_keys_id_seq                              |        217 | 9223372036854775807
 public     | assets_id_seq                                |   66639387 | 9223372036854775807
 public     | audit_events_id_seq                          |   26720907 | 9223372036854775807
 public     | bugs_id_seq                                  |       3589 | 9223372036854775807
 public     | comments_id_seq                              |     275353 | 9223372036854775807
 public     | job_group_parents_id_seq                     |          7 | 9223372036854775807
 public     | dbix_class_deploymenthandler_versions_id_seq |        111 | 9223372036854775807
 public     | gru_tasks_id_seq                             |   18218176 | 9223372036854775807
 public     | job_groups_id_seq                            |         99 | 9223372036854775807
 public     | job_modules_id_seq                           |  451728848 | 9223372036854775807
 public     | job_settings_id_seq                          |   63508961 | 9223372036854775807
 public     | job_template_settings_id_seq                 |       6763 | 9223372036854775807
 public     | job_templates_id_seq                         |       8110 | 9223372036854775807
 public     | jobs_id_seq                                  |    2404719 | 9223372036854775807
 public     | machine_settings_id_seq                      |        928 | 9223372036854775807
 public     | minion_jobs_id_seq                           |    1265856 | 9223372036854775807
 public     | machines_id_seq                              |       1082 | 9223372036854775807
 public     | minion_locks_id_seq                          |    1525052 | 9223372036854775807
 public     | minion_workers_id_seq                        |       1004 | 9223372036854775807
 public     | needle_dirs_id_seq                           |         20 | 9223372036854775807
 public     | needles_id_seq                               |      24707 | 9223372036854775807
 public     | product_settings_id_seq                      |       1225 | 9223372036854775807
 public     | products_id_seq                              |        435 | 9223372036854775807
 public     | secrets_id_seq                               |          1 | 9223372036854775807
 public     | scheduled_products_id_seq                    |     259488 | 9223372036854775807
 public     | test_suites_id_seq                           |       1686 | 9223372036854775807
 public     | screenshots_id_seq                           | 1083270780 | 9223372036854775807
 public     | test_suite_settings_id_seq                   |       4727 | 9223372036854775807
 public     | users_id_seq                                 |        856 | 9223372036854775807
 public     | worker_properties_id_seq                     |     815021 | 9223372036854775807
 public     | workers_id_seq                               |        450 | 9223372036854775807
(31 rows)
It's interesting that even the sequences for our 4-byte int primary keys are using an 8-byte max value, presumably because sequences created before PostgreSQL 10 were always bigint internally. The columns themselves are definitely 4 bytes:
openqa=# \d api_keys
                                         Table "public.api_keys"
    Column    |            Type             | Collation | Nullable |               Default
--------------+-----------------------------+-----------+----------+--------------------------------------
 id           | integer                     |           | not null | nextval('api_keys_id_seq'::regclass)
 key          | text                        |           | not null |
...
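If we end up migrating, the affected columns could be enumerated with something like this (a sketch assuming everything lives in the default "public" schema):
$ psql openqa -c "select table_name, column_name from information_schema.columns where table_schema = 'public' and data_type = 'integer' and column_default like 'nextval%' order by table_name;"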
Updated by openqa_review over 2 years ago
- Due date set to 2022-06-22
Setting due date based on mean cycle time of SUSE QE Tools
Updated by kraih over 2 years ago
- Status changed from In Progress to Resolved
Think I have enough information collected now. My recommendation is to just migrate all our serial columns to bigserial.
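There is no ALTER ... TYPE bigserial for existing columns; per table the equivalent would look like this (a sketch with one example table; every foreign key column referencing it needs the same change, and the type change rewrites the table under an exclusive lock):
ALTER TABLE api_keys ALTER COLUMN id TYPE bigint;
-- our sequences already have a bigint max_value (see above); a sequence
-- explicitly created as integer would additionally need:
-- ALTER SEQUENCE api_keys_id_seq AS bigint;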