tickets #93621 (closed)

mirrordb - how can table 'filearr' have duplicate rows?

Added by pjessen almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee: pjessen
Category: Core services and virtual infrastructure
Target version: -
Start date: 2021-06-08
Due date: -
% Done: 100%
Estimated time: -

Description

I was working on #92930 which kept complaining about duplicate rows in 'filearr'.

Given that filearr has a unique index on 'path', it seems this ought to be impossible.
I deleted some 5000+ duplicate rows, but they keep coming back. It is screwing up the mirror scanning; see e.g. #93477 and #93255.
Yesterday whilst working on #93477, I deleted 1686 duplicates (from 'filearr' and 'hash').
Obviously something was changed somewhere, but what?
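
For reference, such duplicates can be deleted with a self-join delete along these lines (a sketch only, keeping the lowest id per path; not necessarily the exact statement used here):

-- Sketch: remove duplicate paths from filearr, keeping the row with the lowest id.
DELETE FROM filearr a
USING filearr b
WHERE a.path = b.path
  AND a.id > b.id;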

Actions #1

Updated by pjessen almost 3 years ago

  • Private changed from Yes to No
Actions #2

Updated by pjessen almost 3 years ago

I was just looking at the logs on mirrordb1, '/var/lib/pgsql/data13/log/postgresql*zst', and I see the first 'duplicate key' error appearing on 2021-05-20 at 21:19:13.

Actions #3

Updated by pjessen almost 3 years ago

pjessen wrote:

I was just looking at the logs on mirrordb1, '/var/lib/pgsql/data13/log/postgresql*zst', and I see the first 'duplicate key' error appearing on 2021-05-20 at 21:19:13.

Judging by the zypper logs, mirrordb1 was dup'ed to Leap 15.3 on 2021-05-20 at 21:08:49. I think that answers my question above: 'Obviously something was changed somewhere, but what?'.

Actions #4

Updated by pjessen almost 3 years ago

  • Category set to Core services and virtual infrastructure
Actions #5

Updated by pjessen almost 3 years ago

After we switched to running on mirrordb2, I took a look at the logs. Since this morning at 01:46 UTC, I see 46 cases of:

ERROR:  duplicate key value violates unique constraint "filearr_path_key"

and three cases of:

ERROR:  duplicate key value violates unique constraint "device_uniqueness"

I have tried to run this a couple of times, but it keeps timing out:

select path,array_agg(id) from filearr group by path having count(*)>1;
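
A possible workaround (an untested assumption on my part): lift the statement timeout and steer the planner away from the possibly-corrupt index for the current session:

SET statement_timeout = 0;        -- no statement timeout in this session
SET enable_indexscan = off;       -- avoid the (possibly corrupt) btree index
SET enable_indexonlyscan = off;
SELECT path, array_agg(id) FROM filearr GROUP BY path HAVING count(*) > 1;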
Actions #6

Updated by pjessen almost 3 years ago

2021-06-15 05:36:38.358 UTC [6796]: [4-1] db=mb_opensuse2,user=mb ERROR:  duplicate key value violates unique constraint "filearr_path_key"
2021-06-15 05:36:38.358 UTC [6796]: [5-1] db=mb_opensuse2,user=mb DETAIL:  Key (path)=(tumbleweed/repo/oss/x86_64/libmlt++-devel-6.26.1-1.1.x86_64.rpm) already exists.
2021-06-15 05:36:38.358 UTC [6796]: [6-1] db=mb_opensuse2,user=mb CONTEXT:  SQL statement "UPDATE filearr 
                    SET mirrors = arr WHERE id = arg_fileid"
        PL/pgSQL function mirr_del_byid(integer,integer) line 21 at SQL statement
2021-06-15 05:36:38.358 UTC [6796]: [7-1] db=mb_opensuse2,user=mb STATEMENT:  SELECT COUNT(mirr_del_byid(556, id) order by id) FROM temp1
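
The DETAIL line names the offending path, so the conflicting rows can be inspected directly, e.g.:

-- Look up the rows behind the duplicate-key error (path taken from the DETAIL line).
SELECT id, path, mirrors
FROM filearr
WHERE path = 'tumbleweed/repo/oss/x86_64/libmlt++-devel-6.26.1-1.1.x86_64.rpm';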
Actions #7

Updated by pjessen almost 3 years ago

I am sort of taking notes as I go - I am not very familiar with PostgreSQL, so I might well forget something.

On both db1 and db2, table 'filearr' has some indexes named 'something_ccnew':

mb_opensuse2=# \d filearr
                                     Table "public.filearr"
  Column  |          Type          | Collation | Nullable |               Default               
----------+------------------------+-----------+----------+-------------------------------------
 id       | integer                |           | not null | nextval('filearr_id_seq'::regclass)
 path     | character varying(512) |           | not null | 
 mirrors  | smallint[]             |           |          | 
 dirname  | character varying(512) |           |          | 
 filename | character varying(512) |           |          | 
Indexes:
    "filearr_pkey" PRIMARY KEY, btree (id)
    "filearr_path_key" UNIQUE CONSTRAINT, btree (path)
    "filearr_path_key_ccnew" UNIQUE, btree (path) INVALID
    "filearr_pkey_ccnew" UNIQUE, btree (id) INVALID
    "filearr_dirname_btree" btree (dirname)
    "filearr_dirname_btree_ccnew" btree (dirname)

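The same can be read from the catalogs; a quick sketch to list any invalid indexes on 'filearr':

-- List invalid indexes on filearr via pg_index.
SELECT i.indexrelid::regclass AS index_name
FROM pg_index i
WHERE i.indrelid = 'filearr'::regclass
  AND NOT i.indisvalid;
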
According to the manual, https://www.postgresql.org/docs/13/sql-reindex.html :

If the index marked INVALID is suffixed ccnew, then it corresponds to the transient index created during the concurrent operation, and the recommended recovery method is to drop it using DROP INDEX, then attempt REINDEX CONCURRENTLY again.

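Following that advice, recovery would look something like this (a sketch, using the index names from the \d output above):

-- Drop the leftover transient indexes, then rebuild the real ones.
DROP INDEX CONCURRENTLY filearr_path_key_ccnew;
DROP INDEX CONCURRENTLY filearr_pkey_ccnew;
REINDEX TABLE CONCURRENTLY filearr;
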
Actions #8

Updated by pjessen almost 3 years ago

Brief update - so far no new duplicates!

Actions #9

Updated by pjessen almost 3 years ago

pjessen wrote:

Brief update - so far no new duplicates!

Approx. 24 hours later, still no new dupes!

Actions #10

Updated by pjessen almost 3 years ago

  • Assignee set to pjessen
  • % Done changed from 0 to 100

Still no new dupes. I'm closing this; the downgrade to pg12 has clearly solved the issue.
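
For anyone re-checking later, a duplicate count along these lines should now come back as 0 (sketch):

-- Count paths that still occur more than once in filearr.
SELECT count(*)
FROM (SELECT path FROM filearr GROUP BY path HAVING count(*) > 1) AS dups;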

Actions #11

Updated by pjessen almost 3 years ago

  • Status changed from New to Resolved