tickets #92927
closed
repository push is way behind
100%
Description
This was reported on the Heroes list by Matthew Trescott.
Looking at his project, http://download.opensuse.org/repositories/home:/matthewtrescott:/openproject/openSUSE_Leap_15.2/noarch/
the package (openproject-11.0.0-lp152.1.5.noarch.rpm) appears in different versions on different mirrors:
ftp.gwdg.de - 11.1.4, dated 11 Apr 2021.
provo - 11.0.0, dated 1 Nov 2020.
lysator - 11.1.4, 11 May 2021
widehat - 11.1.4, dated 11 Apr 2021.
download.o.o - 11.2.4, 20 May.
That certainly looks odd.
Looking at the push logs for April and May 2021, that project was only pushed:
5 Apr, lysator
11 Apr, gwdg, lysator, widehat
17 Apr, lysator
11 May, lysator.
It seems very odd that a project should only be pushed to lysator and nowhere else.
Updated by pjessen over 3 years ago
- Subject changed from repository push not "quite" working ? to repository push is way behind
Looking at the global log (/srv/bs/logs/2021/05/21/global-20210521.log), which is written to by the push script,
it looks like the push is way behind -
widehat - at 08:57:51, it pushed "graphics:darktable:master_Fedora_32" (timestamp 07:22:34), so it is 95 minutes behind on non-home. The last time a home: project was pushed to widehat seems to be 19 May 2021 04:00:31, when 'home:mistificator:PirateCash_Arch' (dated 2021-05-18 04:42:47) was pushed out.
lysator - seems to be up-to-date with non-home, but is about 2 hours behind with home - 21 May 2021 08:58:41 +0000 hina.lysator.liu.se: start syncing home:lupinix-indi:fedora-bleeding_Fedora_Rawhide (2021-05-21 07:00:35)
stuttgart - 22 minutes behind on non-home
provo - 2 hours 42 min behind on non-home.
Looking at /srv/bs/pushed/ :
lysator - last home project pushed at 0700.
gwdg - last home project pushed at 0400
provo - last home project pushed 18 May 0442
widehat - last home project pushed before that
stuttgart - last home project pushed 04:51
Updated by pjessen over 3 years ago
pjessen wrote:
> lysator - last home project pushed at 0700.
> gwdg - last home project pushed at 0400
> provo - last home project pushed 18 May 0442
> widehat - last home project pushed before that
widehat is full:
# df
Filesystem 1K-blocks Used Available Use% Mounted on
[snip]
/dev/mapper/system-rootdisk 527396280 10661476 489874876 3% /
/dev/mapper/system-home 804913152 834932 804078220 1% /home
/dev/md0p1 19532336108 19532335880 228 100% /srv
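A full filesystem like the one above is easy to miss until pushes stall. As a minimal sketch (the function name `full_filesystems` and the 95% default threshold are assumptions, not something that exists on the mirrors), the `df` output can be filtered for mount points at or above a given usage:

```shell
# full_filesystems [THRESHOLD]
# Reads `df -P` style output on stdin and prints "<mount point> <use%>"
# for every filesystem whose usage is at or above THRESHOLD percent.
# Assumption: mount points contain no whitespace.
full_filesystems() {
    threshold=${1:-95}
    awk -v t="$threshold" '
        # Skip the header line and anything without a percentage in column 5.
        NR > 1 && $5 ~ /%$/ {
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= t) print $6, $5
        }'
}

# Example invocation on a mirror:
#   df -P | full_filesystems 95
```

Fed the `df` output quoted above, this would report only `/srv`.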
Updated by pjessen over 3 years ago
On pontifex, I have created "/etc/no_repopush_scan", which should mean we don't do an immediate scan after each push. I hope that will speed things up.
At 12:31 UTC:
espejito.fder.edu.uy - 08:06, 4h25m
ftp.gwdg.de - 11:37, 55min
ftp.uni-stuttgart.de - 12:09, 22min
hina.lysator.liu.se - 12:31, 0min.
provo - 08:23, 4h8m
widehat - 08:53, 3h38m
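The lag figures in these lists are simply the check time minus the timestamp of the last pushed project. A small sketch of that arithmetic, assuming both times are HH:MM on the same UTC day (the helper name `lag_hm` is made up for illustration):

```shell
# lag_hm NOW LAST
# Prints how far LAST (HH:MM) lies behind NOW (HH:MM) as "<hours>h<minutes>m".
# Assumption: both times fall on the same day and NOW is not earlier than LAST.
lag_hm() {
    now_h=${1%%:*};  now_m=${1##*:}
    last_h=${2%%:*}; last_m=${2##*:}
    # Strip one leading zero so values like "08" are not read as octal.
    now=$(( ${now_h#0} * 60 + ${now_m#0} ))
    last=$(( ${last_h#0} * 60 + ${last_m#0} ))
    diff=$(( now - last ))
    printf '%dh%dm\n' $(( diff / 60 )) $(( diff % 60 ))
}

# Example: lag_hm 12:31 08:53   prints 3h38m (the widehat figure above)
```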
Updated by pjessen over 3 years ago
At 13:07 UTC:
espejito.fder.edu.uy - 08:07, 5h
ftp.gwdg.de - 13:07, 0min
ftp.uni-stuttgart.de - 13:07, 0min
hina.lysator.liu.se - 12:31, 0min.
provo - 08:23, 4h8m
widehat - 09:55, 3h12m
Updated by pjessen over 3 years ago
All push mirrors have now caught up, except provo and espejito.fder.edu.uy. provo has even gotten worse; it is now 6h40m behind.
Updated by pjessen over 3 years ago
- Status changed from New to In Progress
- Assignee set to pjessen
- % Done changed from 0 to 30
widehat - 5min behind
provo - 1h35m behind
espejito.fder.edu.uy - 2h15m behind
I think I can conclude that the scan following a push is causing significant delay. With that scan disabled, a newly pushed package will only be seen on the next regular scan, so there will be a gap between the push and the package being shown as available on a mirror. During that gap, we will serve such packages directly from pontifex.
Updated by pjessen over 3 years ago
- Status changed from In Progress to Resolved
- % Done changed from 30 to 100
Well, it looks like everyone has now caught up, or is very close :
espejito.fder.edu.uy - 6-7min behind
ftp.gwdg.de - current.
ftp.uni-stuttgart.de - current
hina.lysator.liu.se - current
provo - current
widehat - current
I think leaving the immediate scan off is a good idea; it only seems to slow things down without achieving much in the end. In theory, leaving off the immediate scan could cause more load on pontifex (which serves requests for repositories directly until the mirror is next scanned), but I doubt it will be noticeable.
To quickly get an overview of the status, look at the timestamps of the most recently pushed packages:
cd /srv/bs
for i in servers/*
do
    server="${i##*/}"
    ls --full-time -rt pushed/*"$server" | tail -n 1
done
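As a variation on the loop above, the following sketch prints the age in minutes of the newest pushed entry per server instead of the raw `ls` line. It assumes GNU `stat`, that entries under `pushed/` end in the server name (as the glob above implies), that their modification time reflects the push time, and that names contain no whitespace. The function name `push_lag_minutes` is hypothetical.

```shell
# push_lag_minutes [BASEDIR]
# For each entry in BASEDIR/servers, prints "<server> <minutes since newest push>".
# Assumptions: GNU stat; pushed/*<server> entries carry the push time as mtime.
push_lag_minutes() {
    base=${1:-/srv/bs}
    now=$(date +%s)
    for dir in "$base"/servers/*; do
        server=${dir##*/}
        # Newest matching entry first; -d keeps directories from being expanded.
        newest=$(ls -dt "$base"/pushed/*"$server" 2>/dev/null | head -n 1)
        [ -n "$newest" ] || continue
        mtime=$(stat -c %Y "$newest")
        echo "$server $(( (now - mtime) / 60 ))"
    done
}

# Example invocation on pontifex:
#   push_lag_minutes /srv/bs
```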
Updated by lrupp over 3 years ago
When I look at /var/log/messages on pontifex, I currently see a lot of such messages:
repopusher: skipping scanning devel:/kubic:/containers/container on ftp.gwdg.de due to /etc/no_repopush_scan
repopusher: skipping scanning security:/zeek/Debian_10 on hina.lysator.liu.se due to /etc/no_repopush_scan
/etc/no_repopush_scan points to this issue here.
Is there any plan to re-allow the repopusher to actively trigger a scan of mirrors that got the latest repositories pushed to them?
Note: looking at the lib_pusher code, it looks to me like the scan will only be triggered after a successful push of the triggered repository. So I'm a bit confused: why should disabling the scan after a successful push speed things up here? Isn't it more that disabling the scan after a push puts pontifex under more pressure, as MirrorBrain will not know about mirrors that have the most current files?
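For readers unfamiliar with the mechanism, the flag-file behaviour visible in the log lines above can be illustrated roughly as follows. This is only a sketch and not the actual lib_pusher code: the function name `maybe_scan`, the `NO_SCAN_FLAG` override, and the placeholder "scan" branch are all assumptions. Only the flag path and the wording of the skip message come from the log.

```shell
# maybe_scan REPO MIRROR
# Hypothetical illustration: skip the post-push scan while the flag file exists.
# NO_SCAN_FLAG is an invented override so the sketch can be tried without root.
maybe_scan() {
    flag=${NO_SCAN_FLAG:-/etc/no_repopush_scan}
    repo=$1
    mirror=$2
    if [ -e "$flag" ]; then
        echo "skipping scanning $repo on $mirror due to $flag"
        return 0
    fi
    # Placeholder: the real pusher would trigger the mirror scan here.
    echo "scan $repo on $mirror"
}
```

Removing the flag file would switch the immediate scan back on without touching the pusher itself, which is what makes the test proposed later in this ticket cheap to run.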
Updated by pjessen over 3 years ago
lrupp wrote:
> When I look at /var/log/messages on pontifex, I currently see a lot of such messages:
> repopusher: skipping scanning devel:/kubic:/containers/container on ftp.gwdg.de due to /etc/no_repopush_scan
> repopusher: skipping scanning security:/zeek/Debian_10 on hina.lysator.liu.se due to /etc/no_repopush_scan
> /etc/no_repopush_scan points to this issue here.
> Is there any plan to re-allow the repopusher to actively trigger a scan of mirrors that got the latest repositories pushed to them?
If you ask me - no. I think it is superfluous and only causes problems as described, but I am certainly interested in hearing other ideas.
> Note: looking at the lib_pusher code, it looks to me like the scan will only be triggered after a successful push of the triggered repository. So I'm a bit confused: why should disabling the scan after a successful push speed things up here?
The best explanation I have is that the scan takes up bandwidth that could be used for pushing. The improvement after I disabled the immediate scan was quite significant (see comments 4 and 5).
> Isn't it more that disabling the scan after a push puts pontifex under more pressure, as MirrorBrain will not know about mirrors that have the most current files?
This only affects repositories, which are carried by just a few mirrors, including the 6 that we push to. Olaf scans continually, so although there is a time gap between a push finishing and olaf updating, I doubt it will really be noticeable on pontifex.
Currently, without the immediate scan, widehat and provo are still somewhat behind, but unless we get more uplink bandwidth, they probably always will be. Ignoring the mirror in Uruguay, the others are all current.
It's easy to test it - re-enable the immediate scan and see if e.g. ftp.gwdg.de starts falling behind.