tickets #49802: COSI Mirror - openSUSE admin - openSUSE Project Management Tool

tickets #49802

closed

COSI Mirror

Added by dunbarj@clarkson.edu about 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assignee: pjessen
Category: Mirrors
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:

Description

Hello,

I tried reaching out on #opensuse-admin and #opensuse-mirrors on Freenode,
but I wasn't able to reach anyone within 24 hours, so I figured I'd try this
email instead.

Our mirror (likely registered as one of mirror.clarkson.edu,
mirror.cslabs.clarkson.edu, or mirror.cosi.clarkson.edu) has recently
(for at least a week) been rejected when connecting to
stage.opensuse.org. We think it may be related to a recent network change
(we have IPv6 addresses now), but despite having -4 in the rsync script,
it's not pulling down.

rsync: [generator] write error: Connection reset by peer (104)
rsync error: error in socket IO (code 10) at io.c(829) [generator=3.1.2]
rsync error: received SIGUSR1 (code 19) at main.c(1447) [receiver=3.1.2]

I was wondering if someone could help us look into it, and make sure that
our IP address is still whitelisted, and to also perhaps add the IPv6
address as well.

I can verify ownership of the mirror if necessary. I'm not sure who the
previous maintainer of the mirror was (it just got passed down), and there
was a lapse in communication.

Thank you,

Jared Dunbar

Updated by cboltz about 6 years ago

Category set to Mirrors
Assignee set to pjessen

Updated by pjessen about 6 years ago

Status changed from New to Feedback

Hi Jared

I see the following:
mirror.clarkson.edu -> 128.153.145.19
128.153.145.19 -> mirror.cslabs.clarkson.edu.
The stage.o.o ACL contains both "mirror.clarkson.edu" and 128.153.145.19, so it really ought to work. Unless you are now accessing from a different IP address?
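
The forward and reverse lookups above can be reproduced with a few lines of Python. This is only a sketch: `forward_reverse` and `acl_matches` are hypothetical helper names, the ACL is modelled as a plain list of strings, and nothing here claims to mirror how rsyncd itself evaluates its access list.

```python
import socket


def forward_reverse(hostname, resolve=socket.gethostbyname,
                    reverse=socket.gethostbyaddr):
    """Return (ip, ptr_name): the IPv4 address of hostname and its PTR name."""
    ip = resolve(hostname)
    # gethostbyaddr returns (primary_name, aliases, addresses).
    ptr_name = reverse(ip)[0]
    return ip, ptr_name


def acl_matches(hostname, acl, **lookups):
    """True if the hostname, its IPv4 address, or its PTR name is in acl.

    Assumption: acl is a simple list of names and addresses.
    """
    ip, ptr_name = forward_reverse(hostname, **lookups)
    return bool({hostname, ip, ptr_name} & set(acl))
```

Calling `forward_reverse("mirror.clarkson.edu")` performs live DNS queries. The `resolve` and `reverse` parameters exist so the lookups can be swapped for canned answers when checking the logic offline.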

Updated by dunbarj@clarkson.edu about 6 years ago

Hi,

Hmm, that is a strange one. Those addresses are correct. We don't use
separate addresses for fetch operations; the mirror has only one public IPv4
and one public IPv6 address. The new public IPv6 address is
[2605:6480:c051:100::1].

I'm going to get our IT department to add IPv6 forward records for
mirror.clarkson.edu, just for laughs, to see if that does anything. I'll also
add new entries in our DNS zones for the AAAA records and reverse IPv6
pointers, so that the reverse resolution for IPv6 is also
mirror.cslabs.clarkson.edu. Like I said though, we're using -4 in rsync, so
this probably won't change anything.

Maybe it's an rsync bug? I'll try updating the system. It is odd that it
actually gives a sort of stack trace on failure.

I'll get back to you once we have the new records in place and have perhaps
updated rsync.

Thanks,

Jared D.

(Sent in reply to the tracker notification of Fri, Mar 29, 2019, 5:10 AM for pjessen's update above.)

Updated by TBro about 6 years ago

% Done changed from 0 to 50

stage.o.o is facing high traffic at the moment and can barely get out all the push traffic it has to handle. That's why, as far as I know, the rsyncd is temporarily being killed from time to time.

Sorry for that, but we need to get "the first line" of mirrors filled up again, before we start syncing all others again.

Best regards,
Thorsten

Updated by dunbarj@clarkson.edu about 6 years ago

Ah, that would make a lot of sense.

And we limit our download speeds to ~250Mb/s (the IT department requires it
currently, otherwise we're on a 10G link), so our sync process lingers a
bit longer... Should we modify our script to retry in 20+ minutes from
failures? Perhaps a longer period of time? Let me know what you think is
best.
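
A retry policy of the kind asked about here could look like the following sketch. The 20-minute delay comes from the question above, while the attempt count, the function name `sync_with_retry`, and the injectable `run` and `sleep` parameters are assumptions made for illustration. The thread does not record what interval the admins would recommend.

```python
import subprocess
import time


def sync_with_retry(cmd, attempts=3, delay_s=20 * 60,
                    run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0, waiting delay_s seconds after each failure.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if run(cmd).returncode == 0:
            return attempt
        # Do not sleep after the final failed attempt.
        if attempt < attempts:
            sleep(delay_s)
    return None
```

A fixed delay keeps the sketch simple. Against a source server that is already overloaded, a longer or randomized delay may be kinder than retrying on a tight schedule.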

It's been failing consistently: it hasn't finished syncing at all in
at least a week, which seems worse than a simple "server is overloaded
from time to time". Perhaps it first went out of date during a bad time for
you guys and couldn't keep up, and since then it hasn't been able to finish
a re-sync because the set of updates keeps growing and the process probably
gets killed while it tries to catch up. Perhaps we can coordinate a way to
temporarily make sure it isn't killed for a first resync to get up to date.

I'll look more closely and compare the start and kill times, maybe there's
a pattern.

Thanks,

Jared D.

(Sent in reply to the tracker notification of Fri, Mar 29, 2019, 8:55 AM for TBro's update above.)

Updated by dunbarj@clarkson.edu about 6 years ago

Hello,

The mirror has still not synced since our last correspondence. I'm not sure
how to proceed.

Apr 16 06:11:11 mirror opensuse.sh[6880]: rsync: [generator] write error: Connection reset by peer (104)
Apr 16 06:11:11 mirror opensuse.sh[6880]: rsync error: error in socket IO (code 10) at io.c(829) [generator=3.1.2]
Apr 16 06:11:11 mirror opensuse.sh[6880]: rsync error: received SIGUSR1 (code 19) at main.c(1447) [receiver=3.1.2]

We've had our DNS AAAA records in place for mirror.clarkson.edu, and reverse
records as well, so the DNS has definitely settled. As long as you have both
addresses, it should be working. We have -4 in our rsync command, so the
IPv6 records are only a precaution.

Thanks,

Jared D.

Updated by pjessen about 6 years ago

Looks like part of the process is fine:

# grep 128.153.145.19 /var/log/rsyncd-stage.log
2019/04/22 04:00:01 [16374] connect from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 04:00:02 [16374] rsync on opensuse-full-with-factory/ from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 10:00:06 [17865] connect from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 10:00:06 [17865] rsync on opensuse-full-with-factory/ from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 16:00:06 [21013] connect from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 16:00:07 [21013] rsync on opensuse-full-with-factory/ from mirror.cslabs.clarkson.edu (128.153.145.19)

However:

# grep 21013 /var/log/rsyncd-stage.log
2019/04/22 16:00:06 [21013] connect from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 16:00:07 [21013] rsync on opensuse-full-with-factory/ from mirror.cslabs.clarkson.edu (128.153.145.19)
2019/04/22 16:00:07 [21013] building file list
2019/04/22 16:03:06 [21013] deflate on token returned 0 (28880 bytes left)
2019/04/22 16:03:06 [21013] rsync error: error in rsync protocol data stream (code 12) at token.c(427) [sender=3.1.3]

I'll have to look that up.
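
The two grep steps above (find the client's connections, then follow a single PID) can be combined in a short script. This is a sketch that assumes every line has the `date time [pid] message` shape seen in the excerpt. The function names are hypothetical, and a real rsyncd log may contain lines in other formats, which this code skips.

```python
import re
from collections import defaultdict

# Matches lines such as: 2019/04/22 16:00:07 [21013] building file list
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] (.*)$")


def sessions_for_client(lines, client_ip):
    """Group log messages by rsyncd PID, keeping only PIDs that served client_ip."""
    by_pid = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            _timestamp, pid, message = match.groups()
            by_pid[pid].append(message)
    return {
        pid: messages
        for pid, messages in by_pid.items()
        if any(f"({client_ip})" in message for message in messages)
    }


def failed_sessions(sessions):
    """Return the PIDs whose session logged an 'rsync error' line."""
    return [
        pid for pid, messages in sessions.items()
        if any(message.startswith("rsync error") for message in messages)
    ]
```

With the log opened as a file object, `failed_sessions(sessions_for_client(log_file, "128.153.145.19"))` lists the PIDs to inspect. One caveat: PIDs are reused over time, so on a long log the grouping would also need to split on each `connect from` line.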

Updated by pjessen about 6 years ago

Private changed from Yes to No

"deflate on token returned 0" appears to be about a compression issue. We actually disable compression on all file types, so that seems illogical.
Jared, just as a wild guess - if you are running rsync with -z, try removing it?
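
For a sync script that assembles its rsync arguments as a list, dropping compression could be sketched as below. The helper name `without_compression` is hypothetical and the contents of the real opensuse.sh are not shown in this thread. The sketch handles only `--compress` and `z` inside bundled short options such as `-avz`. It would also strip a `z` from a bundled option that carries an attached value (for example `-essh`-style spellings), so it is not a general rsync option parser.

```python
def without_compression(args):
    """Return a copy of an rsync argument list with -z / --compress removed."""
    cleaned = []
    for arg in args:
        if arg == "--compress":
            continue
        # Bundled short options, e.g. "-avz4" becomes "-av4".
        if len(arg) > 1 and arg.startswith("-") and not arg.startswith("--"):
            arg = "-" + arg[1:].replace("z", "")
            if arg == "-":
                # The option consisted only of z flags, so drop it entirely.
                continue
        cleaned.append(arg)
    return cleaned
```

For example, `without_compression(["-4", "-avz", "src/", "dst/"])` yields `["-4", "-av", "src/", "dst/"]`, where the paths and the `-av` flags are placeholders rather than the mirror's actual settings.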

Updated by dunbarj@clarkson.edu about 6 years ago

Yeah, we have -z in our script. I removed it, and started a sync to see how
it goes.

Jared

#10

Updated by dunbarj@clarkson.edu about 6 years ago

Alright... It hasn't crashed yet and is currently transferring data from
the looks of it. I'm going to let it run for a while (It's really out of
date) and check back in a few hours. Hopefully that resolved the issue,
I'll let you know if it syncs completely.

... I do find it odd that the script worked until now with -z.

#11

Updated by pjessen about 6 years ago

dunbarj@clarkson.edu wrote:

> Alright... It hasn't crashed yet and is currently transferring data from
> the looks of it. I'm going to let it run for a while (It's really out of
> date) and check back in a few hours. Hopefully that resolved the issue,
> I'll let you know if it syncs completely.
>
> ... I do find it odd that the script worked until now with -z.

Ditto - I don't understand why it isn't just being ignored when we have "dont compress = *"

#12

Updated by dunbarj@clarkson.edu about 6 years ago

Hi,

Things look to be syncing alright, if it weren't for the failsafe mechanism
we have to prevent stuck rsync processes... Our automation system kills the
process after 4 hours to prevent syncs from running too long and stacking
up on one another.
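
Killing a sync after a fixed time is one way to keep runs from stacking up. Another common approach is to have each run take an exclusive lock and exit immediately if a previous run still holds it, so a long first sync is never interrupted. The sketch below shows that alternative on a Unix system. It is not what the COSI automation does, and the names `acquire_sync_lock` and `AlreadyRunning` are made up for this example.

```python
import fcntl
import os


class AlreadyRunning(Exception):
    """Raised when another sync process holds the lock."""


def acquire_sync_lock(path):
    """Take an exclusive, non-blocking lock on path and return the open fd.

    The lock is released when the fd is closed or the process exits.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise AlreadyRunning(path)
    return fd
```

A scheduled job would call `acquire_sync_lock` first, run rsync only if no exception is raised, and keep the returned descriptor open until the sync finishes. A stuck rsync would then block later runs indefinitely, so a generous timeout may still be wanted alongside the lock.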

I'll have to run the rsync update command manually for as long as it takes
and then let our automation script handle it after that. So that's why the
logs may look odd and show terminated rsync syncs from our end. Oops.

Once it's synced successfully once, I'll let you know :)

#13

Updated by pjessen about 6 years ago

Status changed from Feedback to In Progress

dunbarj@clarkson.edu wrote:

> Hi,
>
> Things look to be syncing alright, if it weren't for the failsafe mechanism
> we have to prevent stuck rsync processes... Our automation system kills the
> process after 4 hours to prevent syncs from running too long and stacking
> up on one another.

Hi Jared

heh, ran into that very same issue not long ago. On our mirror, sync'ing Tumbleweed took a very long time, and kept being killed off.

> I'll have to manually run the rsync update command for as long as it takes
> and then let our automation script handle it after that. So that's why the
> logs may look odd and terminated rsync syncs from our end. Oops.
>
> Once it's synced successfully once, I'll let you know :)

Cool, thanks.

#14

Updated by dunbarj@clarkson.edu about 6 years ago

Hi,

Just a heads up that we're syncing again!
I'd say this is resolved as a result, but if something crops up, I'll pop
back over here with the problem. I've now watched it sync successfully at
least twice.

#15

Updated by pjessen almost 6 years ago

Status changed from In Progress to Resolved
% Done changed from 50 to 100
