I agree that the claim is valid and that the number of requests can be reduced.
This has been addressed in MirrorCache 1.021: from now on, mirror_scan jobs will not attempt to scan individual folders on mirrors that do not have the root folder of the project, as defined in
https://github.com/openSUSE/MirrorCache/blob/master/dist/salt/profile/mirrorcache/files/usr/share/mirrorcache/sql/projects.sql
MirrorCache will still check whether a mirror has a project (e.g. /repositories) once every several minutes (the frequency of this check can be reduced further, but I am not sure whether that is necessary).
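To illustrate the rule, here is a minimal sketch in Python. It is not MirrorCache's actual code: the names `Mirror`, `project_roots`, `project_root_for` and `mirrors_to_scan` are hypothetical, and the handling of folders outside any defined project (scan all mirrors) is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Mirror:
    hostname: str
    # Project root folders this mirror was last seen to carry,
    # e.g. {"/repositories"}. Assumed to be refreshed by the
    # periodic "does the mirror have the project" check.
    project_roots: Set[str] = field(default_factory=set)


def project_root_for(folder: str, known_roots: List[str]) -> Optional[str]:
    """Return the longest known project root that contains `folder`, if any."""
    matches = [
        root for root in known_roots
        if folder == root or folder.startswith(root.rstrip("/") + "/")
    ]
    return max(matches, key=len) if matches else None


def mirrors_to_scan(folder: str, mirrors: List[Mirror],
                    known_roots: List[str]) -> List[Mirror]:
    """Skip mirrors that do not carry the root folder of the folder's project."""
    root = project_root_for(folder, known_roots)
    if root is None:
        # Assumption: a folder outside every defined project is not filtered.
        return list(mirrors)
    return [m for m in mirrors if root in m.project_roots]
```

With this filter, a mirror that carries only /distribution would receive no scan requests for /repositories/Apache/15.3, apart from the periodic check of the project root itself.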
You may still expect a larger number of requests from MirrorCache than from MirrorBrain, because it uses a different approach: instead of doing tree scans (full or partial) on each mirror, a job scans individual folders (without subfolders) on all mirrors. This approach has some disadvantages, but I believe the advantages outweigh them:
- it is easier to react promptly to new releases of OBS projects;
- it is easier to diagnose and retry particular scans;
- it allows tracking only those locations which are actually in use. E.g. if users use /repositories/Apache/ only on TW and 15.3, then other locations such as SLE_15, SLE_15.1, etc. will not be tracked by the redirector (until some user actually starts using them).
So a mirror should expect more reads from MirrorCache than from MirrorBrain, but each read will cover a single folder only (instead of a recursive scan), and only those folders which users requested in the past 2 weeks will be read (instead of all folders).
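The selection described above can be sketched roughly as follows. This is only an illustration in Python, not the real MirrorCache logic: the function name `folders_due_for_scan` and the `last_requested` mapping are hypothetical, and the two-week window is taken from the description above.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# "Past 2 weeks" as described above.
RECENT_WINDOW = timedelta(weeks=2)


def folders_due_for_scan(last_requested: Dict[str, datetime],
                         now: datetime) -> List[str]:
    """Return the folders that users requested within the recent window.

    `last_requested` maps a folder path to the time of its latest user
    request. Each returned folder stands for one non-recursive listing
    per mirror; subfolders are included only if they were requested
    themselves.
    """
    cutoff = now - RECENT_WINDOW
    return sorted(path for path, ts in last_requested.items() if ts >= cutoff)
```

For example, if /repositories/Apache/15.3 was requested three days ago and /repositories/Apache/SLE_15 a month ago, only the former would be scanned on the mirrors.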
I will try to address the other questions next week (HTTP vs. HTTPS load on the mirror and a more descriptive user-agent hint).