Project

General

Profile

Actions

action #112265

closed

coordination #109846: [epic] Ensure all our database tables accomodate enough data, e.g. bigint for ids

Just use bigint in all our database tables (for auto-incremented ID-columns) size:M

Added by okurz over 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2022-06-10
Due date:
% Done:

0%

Estimated time:

Description

Motivation

See #110136#note-12

Acceptance criteria

  • AC1: All our database tables in openQA use bigint (or are reviewed if it makes sense)

Suggestions

  • Prepare pull requests for individual tables and merge one by one and monitor production
  • Fix production fallouts
  • Conduct lesson learned meetings for everything that has gone wrong :)

Related issues 1 (0 open1 closed)

Related to openQA Infrastructure (public) - coordination #112718: [alert][osd] openqa.suse.de is not reachable anymore, response times > 30s, multiple alerts over the weekendResolvedokurz2022-06-22

Actions
Actions #1

Updated by mkittler over 2 years ago

  • Assignee set to mkittler
Actions #2

Updated by kraih over 2 years ago

Btw. i don't think you have to care about the sequences at all, they seem to already be 64 bit. So just the int fields themselves need to be upgraded to bigint.

Actions #3

Updated by mkittler over 2 years ago

  • Subject changed from Just use bigint in all our database tables to Just use bigint in all our database tables (for auto-incremented ID-columns)

@kraih I suppose. For the one table we've already converted there was no extra effort required in that regard.

Actions #4

Updated by mkittler over 2 years ago

I've just checked myself (out of curiosity) and openqa=> SELECT * FROM information_schema.sequences ORDER BY sequence_name; shows indeed that all sequences are bigint with a max. value of 9223372036854775807. That's at least true for OSD and o3. However, my local database has actually still integer sequences - also after running the migration that changes the corresponding IDs:

openqa-local-bigint=> SELECT * FROM information_schema.sequences ORDER BY data_type LIMIT 6;
  sequence_catalog   | sequence_schema |     sequence_name     | data_type | numeric_precision | numeric_precision_radix | numeric_scale | start_value | minimum_value |    maximum_value    | increment | cycle_option 
---------------------+-----------------+-----------------------+-----------+-------------------+-------------------------+---------------+-------------+---------------+---------------------+-----------+--------------
 openqa-local-bigint | public          | minion_workers_id_seq | bigint    |                64 |                       2 |             0 | 1           | 1             | 9223372036854775807 | 1         | NO
 openqa-local-bigint | public          | minion_locks_id_seq   | bigint    |                64 |                       2 |             0 | 1           | 1             | 9223372036854775807 | 1         | NO
 openqa-local-bigint | public          | minion_jobs_id_seq    | bigint    |                64 |                       2 |             0 | 1           | 1             | 9223372036854775807 | 1         | NO
 openqa-local-bigint | public          | assets_id_seq         | integer   |                32 |                       2 |             0 | 1           | 1             | 2147483647          | 1         | NO
 openqa-local-bigint | public          | audit_events_id_seq   | integer   |                32 |                       2 |             0 | 1           | 1             | 2147483647          | 1         | NO
 openqa-local-bigint | public          | gru_tasks_id_seq      | integer   |                32 |                       2 |             0 | 1           | 1             | 2147483647          | 1         | NO

Only the sequences from Minion are bigint here. I suppose to cover "legacy"¹ setups the migration should handle sequences after all.

¹Not sure what makes my setup special.

Actions #5

Updated by mkittler over 2 years ago

  • Status changed from New to In Progress
Actions #6

Updated by mkittler over 2 years ago

  • Status changed from In Progress to Feedback
Actions #7

Updated by kraih over 2 years ago

mkittler wrote:

However, my local database has actually still integer sequences - also after running the migration that changes the corresponding IDs:

Same here actually. My local O3 clone has 8 byte sequences, while the empty dev database mostly still has 4 byte sequences.

Actions #8

Updated by okurz over 2 years ago

https://github.com/os-autoinst/openQA/pull/4701 merged. The deployment on o3 was triggered soon after and completed after a mere 5 minutes, see https://github.com/os-autoinst/openQA/pull/4701#issuecomment-1159404851, triggering a deployment on OSD explicitly so that I can monitor while I am available: https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/416342. Otherwise the next deployment would be triggered tomorrow morning, likely also fine.

https://openqa.opensuse.org/tests/2421090#step/start_test/7 just hit the short time window where o3 wasn't available. I labeled and retriggered the job so that the OSD deployment will find all green openQA-in-openQA tests.

On OSD sudo journalctl --since="2022-06-18 11:39" -u openqa-webui shows

Jun 18 12:39:27 openqa openqa[4691]: [warn] [pid:4691] Job IDs will be converted to bigint. That may take multiple hours on big databases.  It is safe to stop the service (and start from >
Jun 18 12:57:31 openqa openqa[4691]: [info] Loading external plugin AMQP

so 18m for migration. Some alerts about minion jobs have been triggered but shortly went back to ok again. I guess such alerts are ok for such exceptional migrations.

Actions #9

Updated by okurz over 2 years ago

  • Related to coordination #112718: [alert][osd] openqa.suse.de is not reachable anymore, response times > 30s, multiple alerts over the weekend added
Actions #11

Updated by okurz over 2 years ago

  • Due date set to 2022-07-15
Actions #12

Updated by okurz over 2 years ago

  • Subject changed from Just use bigint in all our database tables (for auto-incremented ID-columns) to Just use bigint in all our database tables (for auto-incremented ID-columns) size:M
  • Description updated (diff)
Actions #13

Updated by okurz over 2 years ago

merged yesterday. o3 was auto-deployed and migrated and it looked fine. Today OSD was auto-deployed and also looks fine. https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1657431899773&to=1657433289012&viewPanel=78 shows the migration for http response time. Database rows in https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1657429493800&to=1657436565913&viewPanel=89 also show the immediate effect of migration but nothing significant after the migration finished. https://monitor.qa.suse.de/alerting/list?state=not_ok shows no alerts.

EDIT: Reviewed some QA SLE maintenance update tests with retriggering and such and everything works as expected

Actions #14

Updated by mkittler over 2 years ago

Yes, seems like everything works as expected. Just to be sure, I've ran VACUUM ANALYZE on all affected tables. That shouldn't harm and previously ensured the query planner uses indexes correctly.

Actions #15

Updated by mkittler over 2 years ago

  • Status changed from Feedback to Resolved

It looks still good.

Actions #16

Updated by okurz over 2 years ago

  • Due date deleted (2022-07-15)
Actions #17

Updated by AdamWill about 2 years ago

well, er, this didn't go so well on our staging instance:

Nov 26 02:27:59 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[10465]: [warn] Job IDs will be converted to bigint. That may take multiple hours on big databases.  It is safe to stop the service (and start from scratch on the next startup).
Nov 26 02:30:36 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[10465]: failed to run SQL in /usr/share/openqa/script/../dbicdh/PostgreSQL/upgrade/94-95/001-auto.sql: DBIx::Class::DeploymentHandler::DeployMethod::SQL::Translator::try {...} (): DBI Exception: DBD::Pg::db do failed: ERROR:  index row size 2712 exceeds btree version 4 maximum 2704 for index "idx_job_value_settings"
Nov 26 02:30:36 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[10465]: DETAIL:  Index row references tuple (141486,37) in relation "job_settings".
Nov 26 02:30:36 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[10465]: HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Nov 26 02:30:36 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[10465]: Consider a function index of an MD5 hash of the value, or use full text indexing. at inline delegation in DBIx::Class::DeploymentHandler for deploy_method->upgrade_single_step (attribute declared in /usr/share/perl5/vendor_perl/DBIx/Class/DeploymentHandler/WithApplicatorDumple.pm at line 51) line 18
Nov 26 02:30:36 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[10465]:  (running line 'ALTER TABLE job_settings ALTER COLUMN id TYPE bigint') at /usr/share/perl5/vendor_perl/DBIx/Class/DeploymentHandler/DeployMethod/SQL/Translator.pm line 263.
Nov 26 02:30:36 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[10465]: DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back. at /usr/share/openqa/script/openqa line 0
Nov 26 02:30:36 openqa-lab01.iad2.fedoraproject.org openqa-webui-daemon[10465]: DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back. at /usr/share/openqa/script/openqa line 0
Actions #18

Updated by AdamWill about 2 years ago

googling for that error suggests this may be tricky to resolve and involve, uh, knowing about databases and stuff. and it's friday night, so for now I've downgraded openQA back to a June snapshot. Will try and figure this out on Monday. help appreciated...

Actions

Also available in: Atom PDF