Ticket #93621 (closed)
mirrordb - how can table 'filearr' have duplicate rows?
100% done
Description
I was working on #92930 which kept complaining about duplicate rows in 'filearr'.
Given that filearr has a unique index on 'path', it seems this ought to be impossible.
I deleted some 5000+ duplicate rows, but they keep coming back. It is screwing up the mirror scanning, see e.g. #93477 and #93255.
Yesterday whilst working on #93477, I deleted 1686 duplicates (from 'filearr' and 'hash').
Obviously something was changed somewhere, but what?
Updated by pjessen almost 3 years ago
I was just looking at the logs on mirrordb1, '/var/lib/pgsql/data13/log/postgresql*zst', and I see the first 'duplicate key' error appearing on 2021-05-20 at 21:19:13.
Updated by pjessen almost 3 years ago
pjessen wrote:
I was just looking at the logs on mirrordb1, '/var/lib/pgsql/data13/log/postgresql*zst', and I see the first 'duplicate key' error appearing on 2021-05-20 at 21:19:13.
Judging by the zypper logs, mirrordb1 was dup'ed to Leap 15.3 on 2021-05-20 at 21:08:49. I think that answers my question above 'Obviously something was changed somewhere, but what?'.
Updated by pjessen almost 3 years ago
- Category set to Core services and virtual infrastructure
Updated by pjessen almost 3 years ago
After we switched to running on mirrordb2, I took a look at the logs - since this morning at 01:46 UTC, I see 46 cases of:
ERROR: duplicate key value violates unique constraint "filearr_path_key"
Three cases of:
ERROR: duplicate key value violates unique constraint "device_uniqueness"
I have tried to run this a couple of times, but it keeps timing out:
select path,array_agg(id) from filearr group by path having count(*)>1;
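If the timeout is a server-side statement_timeout, it can be lifted for the session before re-running the aggregate; and once the duplicates are listed, one common way to remove them is to keep only the lowest id per path. The following is a sketch only, using the 'filearr' columns shown later in this ticket, and has not been verified against this server's configuration - run it in a transaction and check the row count before committing:

```sql
-- Lift any per-session statement timeout, in case that is what kills the query.
SET statement_timeout = 0;

-- Same duplicate check as above.
SELECT path, array_agg(id) FROM filearr GROUP BY path HAVING count(*) > 1;

-- Sketch: delete duplicates, keeping the row with the lowest id per path.
BEGIN;
DELETE FROM filearr a
USING filearr b
WHERE a.path = b.path
  AND a.id > b.id;
-- Inspect the reported row count, then COMMIT (or ROLLBACK if it looks wrong).
COMMIT;
```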
Updated by pjessen almost 3 years ago
2021-06-15 05:36:38.358 UTC [6796]: [4-1] db=mb_opensuse2,user=mb ERROR: duplicate key value violates unique constraint "filearr_path_key"
2021-06-15 05:36:38.358 UTC [6796]: [5-1] db=mb_opensuse2,user=mb DETAIL: Key (path)=(tumbleweed/repo/oss/x86_64/libmlt++-devel-6.26.1-1.1.x86_64.rpm) already exists.
2021-06-15 05:36:38.358 UTC [6796]: [6-1] db=mb_opensuse2,user=mb CONTEXT: SQL statement "UPDATE filearr
SET mirrors = arr WHERE id = arg_fileid"
PL/pgSQL function mirr_del_byid(integer,integer) line 21 at SQL statement
2021-06-15 05:36:38.358 UTC [6796]: [7-1] db=mb_opensuse2,user=mb STATEMENT: SELECT COUNT(mirr_del_byid(556, id) order by id) FROM temp1
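For a single reported key, the offending rows can be inspected directly; this is a sketch built from the path in the DETAIL line above and the 'filearr' columns shown below in this ticket:

```sql
-- Inspect the rows behind one of the reported duplicate keys.
SELECT id, path, array_length(mirrors, 1) AS n_mirrors
FROM filearr
WHERE path = 'tumbleweed/repo/oss/x86_64/libmlt++-devel-6.26.1-1.1.x86_64.rpm'
ORDER BY id;
```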
Updated by pjessen almost 3 years ago
I am sort of taking notes as I go - I am not very familiar with postgresql, so I might well forget something.
On both db1 and db2, table 'filearr' has some indexes named 'something_ccnew':
mb_opensuse2=# \d filearr
Table "public.filearr"
Column | Type | Collation | Nullable | Default
----------+------------------------+-----------+----------+-------------------------------------
id | integer | | not null | nextval('filearr_id_seq'::regclass)
path | character varying(512) | | not null |
mirrors | smallint[] | | |
dirname | character varying(512) | | |
filename | character varying(512) | | |
Indexes:
"filearr_pkey" PRIMARY KEY, btree (id)
"filearr_path_key" UNIQUE CONSTRAINT, btree (path)
"filearr_path_key_ccnew" UNIQUE, btree (path) INVALID
"filearr_pkey_ccnew" UNIQUE, btree (id) INVALID
"filearr_dirname_btree" btree (dirname)
"filearr_dirname_btree_ccnew" btree (dirname)
According to the manual, https://www.postgresql.org/docs/13/sql-reindex.html :
If the index marked INVALID is suffixed ccnew, then it corresponds to the transient index created during the concurrent operation, and the recommended recovery method is to drop it using DROP INDEX, then attempt REINDEX CONCURRENTLY again.
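Following that recommendation, the cleanup would look roughly like the sketch below, with the index names taken from the \d output above (the non-INVALID 'filearr_dirname_btree_ccnew' also appears to be a leftover duplicate, so it is included; verify before dropping):

```sql
-- Drop the transient indexes left behind by the interrupted REINDEX CONCURRENTLY...
DROP INDEX CONCURRENTLY filearr_path_key_ccnew;
DROP INDEX CONCURRENTLY filearr_pkey_ccnew;
DROP INDEX CONCURRENTLY filearr_dirname_btree_ccnew;

-- ...then rebuild the real indexes, again without blocking writes.
REINDEX INDEX CONCURRENTLY filearr_path_key;
REINDEX INDEX CONCURRENTLY filearr_pkey;
```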
Updated by pjessen almost 3 years ago
pjessen wrote:
Brief update - so far no new duplicates!
Approx. 24 hours later, still no new dupes!
Updated by pjessen almost 3 years ago
- Assignee set to pjessen
- % Done changed from 0 to 100
Still no new dupes. I'm closing this; the downgrade to pg12 has clearly solved the issue.