tickets #103773: superfluous mirror scanning ? (apparently by mirrorcache) - openSUSE admin - openSUSE Project Management Tool

tickets #103773

closed

superfluous mirror scanning ? (apparently by mirrorcache)

Added by pjessen over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee: andriinikitin
Category: Mirrors
Start date: 2021-12-09
% Done: 100%

Description
Description

I have just been adding some more disks to our openSUSE mirror (http://mirror.hostsuisse.com/opensuse) and I happened to notice a lot of accesses from 195.135.221.151, aka scar.o.o. The User-Agent is "Mojolicious (Perl)", which suggests this is being done by mirrorcache?

We don't mirror repositories (too big), but we are still being bombarded by requests (that all get a 404): in the last 30 days, 1'471'058 requests, approx. 50'000 per day. That can't be right?

Also, why only over IPv4?

olaf uses only IPv6 for mirror.hostsuisse.com, but made only 45'934 accesses in the same period of time, i.e. roughly 30 times fewer :-)
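
For reference, the figures quoted above can be checked with a few lines of Python. This is only an arithmetic sketch; the variable names are made up for illustration, and the counts are the ones reported in this ticket.

```python
# Request counts over the same 30-day window, as reported in this ticket.
requests_scar_ipv4 = 1_471_058   # from 195.135.221.151 (scar.o.o)
requests_olaf_ipv6 = 45_934      # from olaf
days = 30

# Average daily load caused by the IPv4 scanner.
per_day = requests_scar_ipv4 / days

# How many times more requests scar made than olaf.
ratio = requests_scar_ipv4 / requests_olaf_ipv6

print(f"{per_day:.0f} requests per day, {ratio:.1f}x the olaf count")
```

So "approx 50'000 per day" and "30 times" are both in the right range.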

History

Updated by pjessen over 2 years ago

Private changed from Yes to No

Wrt the User-Agent, perhaps it might make sense to clearly identify as mirrorcache? The mirrorbrain scanner uses:

MirrorBrain Probe (see http://mirrorbrain.org/probe_info)

Something like that might be better.


Updated by andriinikitin over 2 years ago

Category set to Mirrors
Status changed from New to In Progress
% Done changed from 0 to 90

I agree that the claim is valid and that the number of requests can be optimized.

This is addressed starting with MirrorCache 1.021: mirror_scan jobs will no longer attempt to scan individual folders on mirrors that do not have the root folder of the project, as defined in
https://github.com/openSUSE/MirrorCache/blob/master/dist/salt/profile/mirrorcache/files/usr/share/mirrorcache/sql/projects.sql

It will still check whether a mirror has a project (e.g. /repositories) once every several minutes (this can be reduced further, but I am not sure if it is necessary).
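
The gating described above could look roughly like the following Python sketch. It is not MirrorCache's actual code (MirrorCache is written in Perl); the function name, the `mirror_has_root` callback and the `/distribution` path are assumptions made for illustration, and the sketch assumes every tracked folder lives under some project root.

```python
from typing import Callable, Iterable, List


def folders_to_scan(
    folders: Iterable[str],
    project_roots: Iterable[str],
    mirror_has_root: Callable[[str], bool],
) -> List[str]:
    """Keep only folders whose project root exists on the mirror.

    mirror_has_root is a hypothetical probe (e.g. one cheap request per
    project root, repeated every few minutes) supplied by the caller.
    """
    # Probe each project root once instead of probing every folder.
    present = [root.rstrip("/") for root in project_roots if mirror_has_root(root)]
    return [
        folder
        for folder in folders
        if any(folder == root or folder.startswith(root + "/") for root in present)
    ]


# Example: a mirror that carries /distribution but not /repositories.
roots = ["/repositories", "/distribution"]
tracked = ["/repositories/Apache/15.3", "/distribution/leap/15.3/iso"]
result = folders_to_scan(tracked, roots, lambda root: root == "/distribution")
print(result)
```

With this kind of check, a mirror without /repositories sees one probe per project root rather than a 404 for every tracked folder.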

You may still expect a larger number of requests from MirrorCache compared to MirrorBrain, because it uses a different approach: instead of doing tree scans (full or partial) on each mirror, a job scans individual folders (without subfolders) on all mirrors. This approach has some disadvantages, but I believe the advantages outweigh them:

it is easier to react promptly to new releases of OBS projects;
it is easier to diagnose and retry particular scans;
it allows tracking only those locations which are actually in use. E.g. if users use /repositories/Apache/ only on TW and 15.3, then other locations like SLE_15, SLE_15.1 etc. will not be tracked by the redirector (until some user actually starts using them).

So a mirror should expect more reads from MirrorCache than from MirrorBrain, but each of those reads will be for a single folder (instead of a recursive scan) and only for folders that were requested by users in the past 2 weeks (instead of all).
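
As a rough illustration of the "requested in the past 2 weeks" rule, here is a minimal Python sketch. The data structure (a mapping from folder to the time of its last user request) and the function name are assumptions for the example, not how MirrorCache stores this.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def recently_requested(
    last_request: Dict[str, datetime],
    now: datetime,
    window: timedelta = timedelta(days=14),
) -> List[str]:
    """Return folders that some user requested within the window.

    Only these folders would be scanned on mirrors, one folder at a
    time and without descending into subfolders.
    """
    cutoff = now - window
    return sorted(folder for folder, seen in last_request.items() if seen >= cutoff)


now = datetime(2021, 12, 10)
last_request = {
    "/repositories/Apache/openSUSE_Tumbleweed": datetime(2021, 12, 9),
    "/repositories/Apache/15.3": datetime(2021, 12, 1),
    "/repositories/Apache/SLE_15": datetime(2021, 10, 1),  # not used recently
}
active = recently_requested(last_request, now)
print(active)
```

Folders that nobody has asked for recently, like the SLE_15 entry in the example, simply drop out of the scan list until a user requests them again.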

I will try to address the other questions next week (http vs https load on the mirror and a more descriptive User-Agent hint).


Updated by andriinikitin over 2 years ago

Status changed from In Progress to Resolved
% Done changed from 90 to 100

> Also, why only over IPv4?

Hard to tell. Both mirrorcache.o.o and mirrorcache-eu.o.o can use only IPv4 addresses, but if a mirror uses IPv6, it should be redirected to the mirror properly. I could use some external help setting up the config for the machines if you think that IPv6 is better for scanning.

> Wrt the User-Agent, perhaps it might make sense to clearly identify as mirrorcache?

That's actually a good idea. This PR should explicitly set the User-Agent for MirrorCache; it should be deployed on Thursday: https://github.com/openSUSE/MirrorCache/pull/240
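
To illustrate what "explicitly set the User-Agent" means for a scanner in general, here is a small Python sketch using the standard library. It is not the change made in the PR (MirrorCache is Perl/Mojolicious); the User-Agent string, the info URL and the function name are placeholders invented for the example.

```python
import urllib.request

# Hypothetical identification string, modelled on the MirrorBrain
# convention of naming the tool and pointing to an info page.
SCANNER_USER_AGENT = "ExampleMirrorScanner/1.0 (see https://example.org/probe_info)"


def build_probe_request(url: str, user_agent: str = SCANNER_USER_AGENT) -> urllib.request.Request:
    """Build a HEAD request that identifies the scanner to the mirror."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent},
        method="HEAD",
    )


# Building the request does not contact the server; sending it would be
# a separate urllib.request.urlopen(req) call.
req = build_probe_request("http://mirror.example.org/opensuse/")
print(req.get_method(), req.get_header("User-agent"))
```

A descriptive string like this lets mirror admins recognise the scanner in their access logs instead of seeing the HTTP library's default.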
