on it. https://openqa.suse.de/tests/8516601 shows that the issue already happened before today's deployment.
openqa=> select * from job_modules order by id desc limit 10;
id | job_id | name | script | category | t_created | t_updated | result | milestone | important | fatal | always_rollback
------------+---------+--------------------+-------------------------+----------+----------------------------+----------------------------+--------+-----------+-----------+-------+-----------------
2147482702 | 8508180 | fcntl09 | tests/kernel/run_ltp.pm | kernel | 2022-04-11 23:03:39.362748 | 2022-04-11 23:03:39.362748 | none | 0 | 1 | 0 | 0
shows that the most recent job module insert was 2147482702 from 2022-04-11 23:03:39.362748. The number is suspiciously near 2 ** 31 = 2147483648, which is one past the maximum value of a signed 32-bit integer.
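To make "suspiciously near" concrete, here is a minimal sketch of the arithmetic. It assumes the id column was a 4-byte signed integer at the time, which the limit of 2 ** 31 suggests. The names INT4_MAX and remaining_ids are only illustrative and not part of openQA.

```python
# Assumption: job_modules.id was a 4-byte signed integer, so its largest
# possible value is 2**31 - 1.
INT4_MAX = 2**31 - 1

# Most recent job_modules.id, taken from the query output quoted above.
LAST_ID = 2147482702


def remaining_ids(last_id: int, max_id: int = INT4_MAX) -> int:
    """Return how many more ids fit before the column type overflows."""
    return max_id - last_id


print(remaining_ids(LAST_ID))  # 945
```

With fewer than a thousand ids left, a handful of jobs with many modules would be enough to use up the remaining range.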
We currently have
openqa=> select count(*) from job_modules;
count
----------
27364042
(1 row)
so I guess an in-place migration of 27M modules would take minutes to hours, but not forever. So far only new jobs from about the last 10h are affected, so we should be careful not to destroy everything.
I stopped the scheduler for now so that no more tests are added which cannot complete anyway.
EDIT: I executed
ALTER TABLE job_modules ALTER COLUMN id TYPE bigint;
manually. It took only a few minutes.
Also triggered
host=openqa.suse.de failed_since=2022-04-11 ./openqa-advanced-retrigger-jobs
to restart the 3.5k incompletes, see https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?viewPanel=14&orgId=1 .
openqa=> \d job_modules;
Table "public.job_modules"
Column | Type | Collation | Nullable | Default
-----------------+-----------------------------+-----------+----------+-----------------------------------------
id | bigint | | not null | nextval('job_modules_id_seq'::regclass)
job_id | integer | | not null |
name | text | | not null |
script | text | | not null |
category | text | | not null |
t_created | timestamp without time zone | | not null |
t_updated | timestamp without time zone | | not null |
result | character varying | | not null | 'none'::character varying
milestone | integer | | not null | 0
important | integer | | not null | 0
fatal | integer | | not null | 0
always_rollback | integer | | not null | 0
Indexes:
"job_modules_pkey" PRIMARY KEY, btree (id)
"idx_job_modules_result" btree (result)
"job_modules_idx_job_id" btree (job_id)
"job_modules_job_id_name_category_script" UNIQUE CONSTRAINT, btree (job_id, name, category, script)
Foreign-key constraints:
"job_modules_fk_job_id" FOREIGN KEY (job_id) REFERENCES jobs(id) ON UPDATE CASCADE DEFERRABLE
Referenced by:
TABLE "needles" CONSTRAINT "needles_fk_last_matched_module_id" FOREIGN KEY (last_matched_module_id) REFERENCES job_modules(id) ON DELETE SET NULL DEFERRABLE
TABLE "needles" CONSTRAINT "needles_fk_last_seen_module_id" FOREIGN KEY (last_seen_module_id) REFERENCES job_modules(id) ON DELETE SET NULL DEFERRABLE
looks better now: the id column is bigint.