Project

General

Profile

Actions

action #110136

closed

coordination #109846: [epic] Ensure all our database tables accomodate enough data, e.g. bigint for ids

[research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for postgreSQL software development and admins best practices size:S

Added by okurz over 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2022-04-20
Due date:
% Done:

0%

Estimated time:

Description

Motivation

See parent #109846

Acceptance criteria

  • AC1: We know which best practice we will follow regarding database IDs nearing a limit

Suggestions

  • Conduct a web research for "monitor database table IDs nearing a limit" or "just use bigint everywhere" :)
  • Regarding the use of bigint research in particular how much additional space and performance cost bigint would really mean, maybe negligible
  • Record the findings in this ticket
  • Ensure everybody knows the verdict, e.g. just mention the ticket with resolution in the weekly or chat
Actions #1

Updated by okurz over 2 years ago

  • Subject changed from [research][timeboxed:6h] Add alerts for any database IDs nearing a limit: Research the industry standard for postgreSQL admins best practices to [research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for postgreSQL software development and admins best practices
  • Description updated (diff)
Actions #2

Updated by mkittler over 2 years ago

  • Subject changed from [research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for postgreSQL software development and admins best practices to [research][timeboxed:10h] Add alerts for any database IDs nearing a limit: Research the industry standard for postgreSQL software development and admins best practices size:S
Actions #3

Updated by mkittler over 2 years ago

  • Status changed from New to Workable
Actions #4

Updated by kraih over 2 years ago

  • Assignee set to kraih
Actions #5

Updated by kraih over 2 years ago

  • Status changed from Workable to In Progress
Actions #6

Updated by kraih over 2 years ago

This is a pretty interesting StackOverflow post, explaining how to optimise PostgreSQL storage space with "Column Tetris". TLDR: Using int (4 bytes) instead of bigint (8 bytes) is usually not even worth it, due to alignment padding.

Actions #7

Updated by kraih over 2 years ago

And here is another StackOverflow post exploring the performance differences of int vs bigint. TLDR: No real differences.

Actions #8

Updated by kraih over 2 years ago

It is pretty easy to check the latest value of a PostgreSQL sequence manually:

$ psql openqa -c 'select last_value from job_modules_id_seq;'
 last_value
------------
  451728054
(1 row)

So turning this into a script that checks the value against the maximum for a 4 byte int (2147483647) for all openQA tables wouldn't be too hard.

Actions #9

Updated by kraih over 2 years ago

Percona has an interesting looking complete database monitoring solution called PMM based around Prometheus and Grafana.

Actions #10

Updated by kraih over 2 years ago

Here's a query to get the current values for all sequences:

$ psql openqa -c 'select schemaname, sequencename, last_value, max_value from pg_sequences where not cycle;'
 schemaname |                 sequencename                 | last_value |      max_value
------------+----------------------------------------------+------------+---------------------
 public     | api_keys_id_seq                              |        217 | 9223372036854775807
 public     | assets_id_seq                                |   66639387 | 9223372036854775807
 public     | audit_events_id_seq                          |   26720907 | 9223372036854775807
 public     | bugs_id_seq                                  |       3589 | 9223372036854775807
 public     | comments_id_seq                              |     275353 | 9223372036854775807
 public     | job_group_parents_id_seq                     |          7 | 9223372036854775807
 public     | dbix_class_deploymenthandler_versions_id_seq |        111 | 9223372036854775807
 public     | gru_tasks_id_seq                             |   18218176 | 9223372036854775807
 public     | job_groups_id_seq                            |         99 | 9223372036854775807
 public     | job_modules_id_seq                           |  451728848 | 9223372036854775807
 public     | job_settings_id_seq                          |   63508961 | 9223372036854775807
 public     | job_template_settings_id_seq                 |       6763 | 9223372036854775807
 public     | job_templates_id_seq                         |       8110 | 9223372036854775807
 public     | jobs_id_seq                                  |    2404719 | 9223372036854775807
 public     | machine_settings_id_seq                      |        928 | 9223372036854775807
 public     | minion_jobs_id_seq                           |    1265856 | 9223372036854775807
 public     | machines_id_seq                              |       1082 | 9223372036854775807
 public     | minion_locks_id_seq                          |    1525052 | 9223372036854775807
 public     | minion_workers_id_seq                        |       1004 | 9223372036854775807
 public     | needle_dirs_id_seq                           |         20 | 9223372036854775807
 public     | needles_id_seq                               |      24707 | 9223372036854775807
 public     | product_settings_id_seq                      |       1225 | 9223372036854775807
 public     | products_id_seq                              |        435 | 9223372036854775807
 public     | secrets_id_seq                               |          1 | 9223372036854775807
 public     | scheduled_products_id_seq                    |     259488 | 9223372036854775807
 public     | test_suites_id_seq                           |       1686 | 9223372036854775807
 public     | screenshots_id_seq                           | 1083270780 | 9223372036854775807
 public     | test_suite_settings_id_seq                   |       4727 | 9223372036854775807
 public     | users_id_seq                                 |        856 | 9223372036854775807
 public     | worker_properties_id_seq                     |     815021 | 9223372036854775807
 public     | workers_id_seq                               |        450 | 9223372036854775807
(31 rows)

It's interesting that even the sequences for our 4 byte int primary keys are using an 8 byte max value. The columns themselves are definitely 4 bytes:

openqa=# \d api_keys
                                         Table "public.api_keys"
    Column    |            Type             | Collation | Nullable |               Default
--------------+-----------------------------+-----------+----------+--------------------------------------
 id           | integer                     |           | not null | nextval('api_keys_id_seq'::regclass)
 key          | text                        |           | not null |
...
Actions #11

Updated by openqa_review over 2 years ago

  • Due date set to 2022-06-22

Setting due date based on mean cycle time of SUSE QE Tools

Actions #12

Updated by kraih over 2 years ago

  • Status changed from In Progress to Resolved

Think i have enough information collected now. My recommendation is to just migrate all our serial columns to bigserial.

Actions #13

Updated by okurz about 2 years ago

  • Due date deleted (2022-06-22)
Actions

Also available in: Atom PDF