tickets #49802

COSI Mirror

Added by dunbarj@clarkson.edu 11 months ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Assignee: pjessen
% Done: 100%
Category: mirrors
Target version: -
Start date:
Due date:
Duration:

Description

Hello,

I tried reaching out on #opensuse-admin and #opensuse-mirrors on Freenode,
but I wasn't able to reach anyone within 24 hours, so I figured I'd try this
email instead.

Our mirror (likely registered as mirror.clarkson.edu,
mirror.cslabs.clarkson.edu, or mirror.cosi.clarkson.edu) has recently
(for at least a week) been rejected when connecting to
stage.opensuse.org. We think it may be related to a recent network change
(we have IPv6 addresses now), but even with -4 in the rsync script,
it's not pulling down.

rsync: [generator] write error: Connection reset by peer (104)
rsync error: error in socket IO (code 10) at io.c(829) [generator=3.1.2]
rsync error: received SIGUSR1 (code 19) at main.c(1447) [receiver=3.1.2]

I was wondering if someone could help us look into it, make sure that
our IP address is still whitelisted, and perhaps add the IPv6
address as well.

I can verify ownership of the mirror if necessary. I'm not sure who the
previous maintainer of the mirror was (it just got passed down), and there
was a lapse in communication.

Thank you,

Jared Dunbar

History

#1 Updated by cboltz 11 months ago

  • Category set to mirrors
  • Assignee set to pjessen

#2 Updated by pjessen 11 months ago

  • Status changed from New to Feedback

Hi Jared

I see the following:

mirror.clarkson.edu -> 128.153.145.19
128.153.145.19 -> mirror.cslabs.clarkson.edu.

The stage.o.o ACL contains both "mirror.clarkson.edu" and 128.153.145.19, so it really ought to work. Unless you are now accessing from a different IP address?

#3 Updated by dunbarj@clarkson.edu 11 months ago

Hi,

Hmm, that is a strange one. Those addresses are correct. We don't use
separate addresses for fetch operations; the mirror has only one public IPv4
and one public IPv6 address. The new public IPv6 address is
[2605:6480:c051:100::1].

I'm going to get our IT department to add IPv6 forward records for
mirror.clarkson.edu, just for ha-has, to see if that does anything. I'll also
add new AAAA records and reverse IPv6 pointers to our DNS zones, so that the
reverse resolution for IPv6 is also mirror.cslabs.clarkson.edu. Like I said
though, we're using -4 in rsync, so this probably won't change anything.

Maybe it's an rsync bug? I'll try updating the system. It is odd that it
actually gives a sort of stack trace on failure.

I'll get back to you once we have the new records in place and have perhaps
updated rsync.

Thanks,

Jared D.

#4 Updated by TBro 11 months ago

  • % Done changed from 0 to 50

stage.o.o is facing high traffic at the moment and barely gets out all the push traffic it has to do. That's why, as far as I know, rsyncd is temporarily being killed from time to time.

Sorry for that, but we need to get "the first line" of mirrors filled up again before we start syncing all the others.

Best regards,
Thorsten

#5 Updated by dunbarj@clarkson.edu 11 months ago

Ah, that would make a lot of sense.

And we limit our download speeds to ~250Mb/s (the IT department currently
requires it; otherwise we're on a 10G link), so our sync process lingers a
bit longer... Should we modify our script to retry 20+ minutes after a
failure? Perhaps after a longer period? Let me know what you think is
best.
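
A minimal sketch of the kind of retry being asked about here. The function name `retry_sync`, the attempt count, the rsync flags other than -4, and the local path `/srv/mirror/opensuse/` are all hypothetical; the thread never settles on an interval, so 20 minutes is simply the figure floated above.

```shell
#!/bin/sh
# Hypothetical retry wrapper; the name and defaults are illustrative only.

# retry_sync MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached,
# sleeping DELAY_SECONDS between failed attempts.
retry_sync() {
    local max_attempts="$1"
    local delay="$2"
    shift 2
    local attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "sync failed after ${attempt} attempts" >&2
            return 1
        fi
        echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example (not executed here): up to 3 attempts, 20 minutes apart.
# The flags and the destination path are assumptions, not the real script.
# retry_sync 3 1200 rsync -4 -a --delete \
#     rsync://stage.opensuse.org/opensuse-full-with-factory/ /srv/mirror/opensuse/
```

A fixed delay keeps the load on an already busy upstream predictable; whether a longer or growing delay would suit stage.o.o better is a question for its admins.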

It's been failing consistently, as in, it hasn't finished syncing at all in
at least a week, which seems worse than a simple "server is overloaded
from time to time". Perhaps when it first went out of date and couldn't
keep up during a bad time for you guys, it never managed to finish the
re-sync and grab the ever larger set of updates, since you probably kill
the process each time it tries to catch up... Perhaps we can coordinate a
way to temporarily make sure it isn't killed during a first resync, so it
can get up to date.

I'll look more closely and compare the start and kill times, maybe there's
a pattern.

Thanks,

Jared D.

#6 Updated by dunbarj@clarkson.edu 10 months ago

Hello,

The mirror has still not synced since our last correspondence. I'm not sure
how to proceed.

Apr 16 06:11:11 mirror opensuse.sh[6880]: rsync: [generator] write error: Connection reset by peer (104)
Apr 16 06:11:11 mirror opensuse.sh[6880]: rsync error: error in socket IO (code 10) at io.c(829) [generator=3.1.2]
Apr 16 06:11:11 mirror opensuse.sh[6880]: rsync error: received SIGUSR1 (code 19) at main.c(1447) [receiver=3.1.2]

Our DNS AAAA records for mirror.clarkson.edu and the reverse records are in
place, and the DNS has definitely settled, so as long as you have both
addresses, it should be working. We have a -4 in our rsync, so the IPv6
address shouldn't matter, but I'm mentioning it just in case.

Thanks,

Jared D.

#7 Updated by pjessen 10 months ago

Looks like part of the process is fine:

# grep 128.153.145.19 /var/log/rsyncd-stage.log
2019/04/22 04:00:01 [16374] connect from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 04:00:02 [16374] rsync on opensuse-full-with-factory/ from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 10:00:06 [17865] connect from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 10:00:06 [17865] rsync on opensuse-full-with-factory/ from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 16:00:06 [21013] connect from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 16:00:07 [21013] rsync on opensuse-full-with-factory/ from mirror.cslabs.clarkson.edu (128.153.145.19)

However:

# grep 21013 /var/log/rsyncd-stage.log
2019/04/22 16:00:06 [21013] connect from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 16:00:07 [21013] rsync on opensuse-full-with-factory/ from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 16:00:07 [21013] building file list
2019/04/22 16:03:06 [21013] deflate on token returned 0 (28880 bytes left)
2019/04/22 16:03:06 [21013] rsync error: error in rsync protocol data stream (code 12) at token.c(427) [sender=3.1.3]

I'll have to look that up.
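
The two greps above (find the client's connections, then follow one PID) can be combined into a single step. This is only a sketch: the function name `last_session_log` is hypothetical, and it assumes the log format shown above, `DATE TIME [PID] message`. Because PIDs get reused over time, it may also print lines from an older session that happened to share the PID.

```shell
#!/bin/sh
# Hypothetical helper; assumes the rsyncd log format shown in this comment.

# last_session_log LOGFILE CLIENT_IP
# Prints every log line of the most recent rsyncd process that logged a
# "connect from" line for CLIENT_IP. Returns 1 if the client never connected.
last_session_log() {
    local logfile="$1"
    local ip="$2"
    local pid
    # grep -F matches the IP literally, so the dots are not regex wildcards.
    pid=$(grep -F "connect from" "$logfile" | grep -F "($ip)" | tail -n 1 |
        sed -n 's/^[^[]*\[\([0-9][0-9]*\)\].*/\1/p')
    [ -n "$pid" ] || return 1
    grep -F "[$pid]" "$logfile"
}

# Example (not executed here):
# last_session_log /var/log/rsyncd-stage.log 128.153.145.19
```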

#8 Updated by pjessen 10 months ago

  • Private changed from Yes to No

"deflate on token returned 0" appears to be about a compression issue. We actually disable compression on all file types, so that seems illogical.
Jared, just as a wild guess - if you are running rsync with -z, try removing it?
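
For anyone checking their own sync script for the same thing, here is a rough heuristic for spotting compression among a set of rsync arguments. The function name `uses_compression` is hypothetical, and the check covers only `-z` (alone or bundled, as in `-avz`) and `--compress`; it does not understand short options that take a value, so treat a hit as a prompt to look at the script, not as proof.

```shell
#!/bin/sh
# Hypothetical lint-style check; a heuristic, not a full rsync option parser.

# uses_compression ARGS...
# Returns 0 if any argument is --compress or a short-option bundle
# containing "z"; returns 1 otherwise.
uses_compression() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            --compress) return 0 ;;
            --*) ;;                # other long options: ignore
            -*z*) return 0 ;;      # -z, -avz, -zv, ...
        esac
    done
    return 1
}

# Example (not executed here):
# uses_compression -4 -avz src/ dst/ && echo "compression is enabled"
```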

#9 Updated by dunbarj@clarkson.edu 10 months ago

Yeah, we have -z in our script. I removed it, and started a sync to see how
it goes.

Jared

#10 Updated by dunbarj@clarkson.edu 10 months ago

Alright... It hasn't crashed yet and is currently transferring data from
the looks of it. I'm going to let it run for a while (It's really out of
date) and check back in a few hours. Hopefully that resolved the issue,
I'll let you know if it syncs completely.

... I do find it odd that the script worked until now with -z.

#11 Updated by pjessen 10 months ago

dunbarj@clarkson.edu wrote:

> Alright... It hasn't crashed yet and is currently transferring data from
> the looks of it. I'm going to let it run for a while (It's really out of
> date) and check back in a few hours. Hopefully that resolved the issue,
> I'll let you know if it syncs completely.
>
> ... I do find it odd that the script worked until now with -z.

Ditto - I don't understand why it isn't just being ignored when we have "dont compress = *".

#12 Updated by dunbarj@clarkson.edu 10 months ago

Hi,

Things would be syncing alright if it weren't for the failsafe mechanism
we have to prevent stuck rsync processes... Our automation system kills the
process after 4 hours to prevent syncs from running too long and stacking
up on one another.
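
One common way to keep syncs from stacking without relying on the time limit alone is a lock file, sketched below. This is not the automation system described here: the function name `run_exclusive`, the lock path, and the exit code 75 are hypothetical, and the sketch assumes `flock` (util-linux) and `timeout` (GNU coreutils) are installed.

```shell
#!/bin/sh
# Hypothetical wrapper; assumes util-linux flock and GNU coreutils timeout.

# run_exclusive LOCKFILE LIMIT COMMAND [ARGS...]
# Runs COMMAND under an exclusive lock on LOCKFILE, killing it after LIMIT
# (a timeout duration such as 4h). If another run already holds the lock,
# skips this run and returns 75. Returns 124 if COMMAND hit the time limit.
run_exclusive() {
    local lockfile="$1"
    local limit="$2"
    shift 2
    (
        # File descriptor 9 is opened on the lock file by the redirection below.
        flock -n 9 || {
            echo "previous sync still running; skipping" >&2
            exit 75
        }
        timeout "$limit" "$@"
    ) 9>"$lockfile"
}

# Example (not executed here); the path and flags are assumptions:
# run_exclusive /var/lock/opensuse-sync.lock 4h rsync -4 -a --delete \
#     rsync://stage.opensuse.org/opensuse-full-with-factory/ /srv/mirror/opensuse/
```

With the lock guarding against overlap, the limit can be raised for a one-off catch-up run without risking stacked syncs.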

I'll have to manually run the rsync update command for as long as it takes
and then let our automation script handle it after that. So that's why the
logs may look odd and show terminated rsync runs from our end. Oops.

Once it's synced successfully once, I'll let you know :)

#13 Updated by pjessen 10 months ago

  • Status changed from Feedback to In Progress

dunbarj@clarkson.edu wrote:

> Hi,
>
> Things look to be syncing alright, if it weren't for the failsafe mechanism
> we have to prevent stuck rsync processes... Our automation system kills the
> process after 4 hours to prevent syncs from running too long and stacking
> up on one another.

Hi Jared

heh, ran into that very same issue not long ago. On our mirror, sync'ing Tumbleweed took a very long time, and kept being killed off.

> I'll have to manually run the rsync update command for as long as it takes
> and then let our automation script handle it after that. So that's why the
> logs may look odd and terminated rsync syncs from our end. Oops.
>
> Once it's synced successfully once, I'll let you know :)

Cool, thanks.

#14 Updated by dunbarj@clarkson.edu 10 months ago

Hi,

Just a heads up that we're syncing again!
I'd say this is resolved as a result, but if something crops up, I'll pop
on back over here with the problem. I've watched it sync successfully at
least twice.

#15 Updated by pjessen 10 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100
