Ticket #93621 (closed)
mirrordb - how can table 'filearr' have duplicate rows?
Added by pjessen almost 3 years ago.
Updated almost 3 years ago.
Category:
Core services and virtual infrastructure
Description
I was working on #92930 which kept complaining about duplicate rows in 'filearr'.
Given that filearr has a unique index on 'path', it seems this ought to be impossible.
I deleted more than 5000 duplicate rows, but they keep coming back. It is screwing up the mirror scanning, see e.g. #93477 and #93255.
Yesterday whilst working on #93477, I deleted 1686 duplicates (from 'filearr' and 'hash').
Obviously something was changed somewhere, but what?
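For reference, a cleanup of this kind can be sketched as a self-join delete that keeps the lowest id per path (a sketch only, not the exact statement used here; the matching rows in 'hash' would need the same treatment):

```sql
-- Sketch: remove all but the smallest-id row for each duplicated path.
-- Assumes the lowest id is the row to keep.
BEGIN;
DELETE FROM filearr f
USING filearr keep
WHERE f.path = keep.path
  AND f.id > keep.id;   -- any row with a lower-id twin on the same path goes
COMMIT;
```

Note the join deliberately compares 'path' row-by-row rather than relying on the (possibly corrupt) unique index.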
- Private changed from Yes to No
I was just looking at the logs on mirrordb1, '/var/lib/pgsql/data13/log/postgresql*zst', and I see the first 'duplicate key' error appearing on 2021-05-20 at 21:19:13.
pjessen wrote:
I was just looking at the logs on mirrordb1, '/var/lib/pgsql/data13/log/postgresql*zst', and I see the first 'duplicate key' error appearing on 2021-05-20 at 21:19:13.
Judging by the zypper logs, mirrordb1 was dup'ed to Leap 15.3 on 2021-05-20 at 21:08:49. I think that answers my question above 'Obviously something was changed somewhere, but what?'.
- Category set to Core services and virtual infrastructure
After we switched to running on mirrordb2, I took a look at the logs. Since this morning at 01:46 UTC, I see 46 cases of:
ERROR: duplicate key value violates unique constraint "filearr_path_key"
and three cases of:
ERROR: duplicate key value violates unique constraint "device_uniqueness"
I have tried to run this a couple of times, but it keeps timing out:
select path,array_agg(id) from filearr group by path having count(*)>1;
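Assuming the timeout comes from a configured statement_timeout (an assumption, I have not checked the server settings), one workaround is to lift it for the session before running the query:

```sql
-- Disable the per-statement timeout for this session only,
-- then run the duplicate search unchanged.
SET statement_timeout = 0;
SELECT path, array_agg(id) FROM filearr GROUP BY path HAVING count(*) > 1;
```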
2021-06-15 05:36:38.358 UTC [6796]: [4-1] db=mb_opensuse2,user=mb ERROR: duplicate key value violates unique constraint "filearr_path_key"
2021-06-15 05:36:38.358 UTC [6796]: [5-1] db=mb_opensuse2,user=mb DETAIL: Key (path)=(tumbleweed/repo/oss/x86_64/libmlt++-devel-6.26.1-1.1.x86_64.rpm) already exists.
2021-06-15 05:36:38.358 UTC [6796]: [6-1] db=mb_opensuse2,user=mb CONTEXT: SQL statement "UPDATE filearr
SET mirrors = arr WHERE id = arg_fileid"
PL/pgSQL function mirr_del_byid(integer,integer) line 21 at SQL statement
2021-06-15 05:36:38.358 UTC [6796]: [7-1] db=mb_opensuse2,user=mb STATEMENT: SELECT COUNT(mirr_del_byid(556, id) order by id) FROM temp1
I am sort of taking notes as I go - I am not very familiar with PostgreSQL, so I might well forget something.
On both db1 and db2, table 'filearr' has some indexes named 'something_ccnew':
mb_opensuse2=# \d filearr
Table "public.filearr"
Column | Type | Collation | Nullable | Default
----------+------------------------+-----------+----------+-------------------------------------
id | integer | | not null | nextval('filearr_id_seq'::regclass)
path | character varying(512) | | not null |
mirrors | smallint[] | | |
dirname | character varying(512) | | |
filename | character varying(512) | | |
Indexes:
"filearr_pkey" PRIMARY KEY, btree (id)
"filearr_path_key" UNIQUE CONSTRAINT, btree (path)
"filearr_path_key_ccnew" UNIQUE, btree (path) INVALID
"filearr_pkey_ccnew" UNIQUE, btree (id) INVALID
"filearr_dirname_btree" btree (dirname)
"filearr_dirname_btree_ccnew" btree (dirname)
According to the manual, https://www.postgresql.org/docs/13/sql-reindex.html :
If the index marked INVALID is suffixed ccnew, then it corresponds to the transient index created during the concurrent operation, and the recommended recovery method is to drop it using DROP INDEX, then attempt REINDEX CONCURRENTLY again.
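A plausible mechanism (my assumption, not confirmed in this ticket): the dup to Leap 15.3 pulled in a newer glibc, and glibc collation changes are a known way for btree indexes on text columns to become silently inconsistent, which would let rows slip past the unique index on 'path'. Following the manual's recovery advice, the cleanup could look like:

```sql
-- Drop the INVALID transient indexes left behind by the failed
-- concurrent operation, plus the redundant dirname copy, then
-- rebuild everything on the table without blocking writes.
DROP INDEX filearr_path_key_ccnew;
DROP INDEX filearr_pkey_ccnew;
DROP INDEX filearr_dirname_btree_ccnew;  -- duplicate of filearr_dirname_btree
REINDEX TABLE CONCURRENTLY filearr;
```

REINDEX ... CONCURRENTLY is available from PostgreSQL 12 onwards.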
Brief update - so far no new duplicates!
pjessen wrote:
Brief update - so far no new duplicates!
Approx. 24 hours later, still no new dupes!
- Assignee set to pjessen
- % Done changed from 0 to 100
Still no new dupes. I'm closing this; the downgrade to pg12 has clearly solved the issue.
- Status changed from New to Resolved