tickets #115142

Auto nearby mirror

Added by dnl028@gmail.com about 2 months ago. Updated about 1 month ago.

Status: Feedback
Priority: Normal
Assignee:
Category: Mirrors
Target version: -
Start date: 2022-08-10
Due date:
% Done: 0%
Estimated time:

Description

Hi,

Just had a mirror related question - it's a long story why, but was
looking to automatically determine the ipv4 address of a nearby mirror,
in a bash script.

Web searching hasn't turned anything up. Are there any common CLI tools
you can recommend to help do this?

Thank you
Dan

History

#1 Updated by pjessen about 2 months ago

  • Private changed from Yes to No

Not really on-topic here :-) but never mind - once you have the name of a nearby mirror, the address is easy. You can get a list of mirrors by retrieving a mirrorlist for a given package (just append .mirrorlist to the URL). "Nearby" is a little more vague though.
Hope this helps.
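
To illustrate the ".mirrorlist" hint, here is a minimal sketch that builds a package URL and appends the suffix. The function name mirrorlist_url and its argument order are made up for this example; the base URL and package values are the ones used later in this ticket.

```shell
#!/bin/sh
# Build the mirrorlist URL for one package by appending ".mirrorlist"
# to its download URL. Function name and argument order are hypothetical.
mirrorlist_url() {
    base=$1 arch=$2 name=$3 version=$4
    printf '%s/%s/%s-%s.%s.rpm.mirrorlist\n' "$base" "$arch" "$name" "$version" "$arch"
}

mirrorlist_url 'http://download.opensuse.org/tumbleweed/repo/oss' \
    x86_64 accountsservice 22.08.8-2.1
```

The printed URL can then be fetched with wget or curl like any other file.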

#2 Updated by avicenzi about 2 months ago

You can use:

dig example.com

This will return the IP address, in fact, dig can do DNS lookup from the cmd line.
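
For use in a script, a small sketch along these lines may help. With +short, dig can print CNAME targets (names ending in a dot) before the address records, so the output is filtered down to lines that look like IPv4 addresses. Note that -4 only selects IPv4 transport for the query; asking for the A record type is what requests IPv4 addresses. The function names only_ipv4 and resolve_ipv4 are illustrative, not existing tools.

```shell
#!/bin/sh
# Keep only the lines on stdin that look like IPv4 addresses.
only_ipv4() {
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Resolve a hostname to its IPv4 addresses, one per line (needs network).
resolve_ipv4() {
    dig -4 +short "$1" A | only_ipv4
}

# Offline demonstration with output shaped like a CNAME answer:
printf '%s\n' 'fallingrocks.ocf.berkeley.edu.' '169.229.200.70' | only_ipv4
```

Piping the result through head -n 1 would pick a single address when a name has several.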

#3 Updated by dnl028@gmail.com about 2 months ago

Ok, thank you.

Mirrors in same continent are "nearby" enough.
It sounds like the advice is to grab the mirrorlist file and parse it
for a mirror url, then use dig to determine that mirror's ip address.

Sounds good, so...

mirrorlist_url="$(zypper --table-style 11 --quiet list-updates --repo 'Main Repository (OSS)' | /usr/bin/awk 'NR==3 {print "http://download.opensuse.org/tumbleweed/repo/oss/"$5"/"$2"-"$4"."$5".rpm.mirrorlist"}')"

echo "$mirrorlist_url"

http://download.opensuse.org/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm.mirrorlist

cat accountsservice-22.08.8-2.1.x86_64.rpm.mirrorlist

...
Mirrors which handle this country:
Loading...
...

Don't know why wget is not grabbing the domestic list of mirrors, and
have not found an option in man wget to fix this.

Also, even manually navigating in a web browser to the mirrorlist page
and grabbing a mirror domain...
Mirrors which handle this country:
http://mirrors.ocf.berkeley.edu/opensuse/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm
(US)

...then...

dig -4 +short mirrors.ocf.berkeley.edu

fallingrocks.ocf.berkeley.edu.
169.229.200.70

...then...

wget 'http://169.229.200.70/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm'
...
HTTP request sent, awaiting response... 404 Not Found
...

However...

wget 'http://mirrors.ocf.berkeley.edu/opensuse/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm'
...
HTTP request sent, awaiting response... 200 OK
Length: 85753 (84K) [application/x-redhat-package-manager]
Saving to: ‘accountsservice-22.08.8-2.1.x86_64.rpm’
accountsservice-22.08.8-2.1.x86_64.rpm
100%[======================================================================================================================>]
83.74K 405KB/s in 0.2s

So assuming this is the intended approach, there's two problems so far.
1) The downloaded mirrorlist file is missing the nearby/domestic mirrors
2) Resolving a mirror's ip address does not result in a functioning url

What do you advise?


#4 Updated by pjessen about 2 months ago

  • Category set to Mirrors
  • Assignee set to andriinikitin

dnl028@gmail.com wrote:

Ok, thank you.

Mirrors in same continent are "nearby" enough.
It sounds like the advice is to grab the mirrorlist file and parse it
for a mirror url, then use dig to determine that mirror's ip address.

Yep, that's pretty much it. Instead of "dig", you can also use "host".

# mirrorlist_url="$(zypper --table-style 11 --quiet list-updates --repo 'Main Repository (OSS)' | /usr/bin/awk 'NR==3 {print "http://download.opensuse.org/tumbleweed/repo/oss/"$5"/"$2"-"$4"."$5".rpm.mirrorlist"}')"
# echo "$mirrorlist_url"
http://download.opensuse.org/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm.mirrorlist

# cat accountsservice-22.08.8-2.1.x86_64.rpm.mirrorlist
...
Mirrors which handle this country:
Loading...
...

Don't know why wget is not grabbing the domestic list of mirrors, and
have not found an option in man wget to fix this.

Hmm, I guess we have somehow changed to using javascript for providing that page. That is of course a bit silly, and will certainly prevent my suggestion from working. Andrii, can we please fix this? (or provide an alternative).

So assuming this is the intended approach, there's two problems so far.
1) The downloaded mirrorlist file is missing the nearby/domestic mirrors
2) Resolving a mirror's ip address does not result in a functioning url

1) would seem to be a mirrorcache/brain issue.

2) no, it never does. It is only an address lookup. If you are using wget or curl anyway, I do not understand the need to look up the address?

#5 Updated by dnl028@gmail.com about 2 months ago

On 8/10/2022 2:16 PM, redmine@opensuse.org wrote:

Hmm, I guess we have somehow changed to using javascript for providing that page. That is of course a bit silly, and will certainly prevent my suggestion from working. Andrii, can we please fix this? (or provide an alternative).

Ok, a fix is much appreciated.

So assuming this is the intended approach, there's two problems so far.
1) The downloaded mirrorlist file is missing the nearby/domestic mirrors
2) Resolving a mirror's ip address does not result in a functioning url
1) would seem to be a mirrorcache/brain issue.

Ok, as above.

2) no, it never does. It is only an address lookup. If you are using wget or curl anyway, I do not understand the need to look up the address?

Fine, though isn't 91.193.113.70 a mirror? While this also fails...
http://91.193.113.70/opensuse/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm
...this succeeds...
http://91.193.113.70/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm
(cut out '/opensuse')
169.229.200.70 fails in both of the above urls.

More to the point, 169.229.200.70 also fails in a repo url, while
91.193.113.70 succeeds in a repo url.
Which brings us to your question - an ip address is not needed for
grabbing the mirrorlist. The reason for wanting a repo mirror ip
address, is to refer to it in host firewall rules, since referring to
urls is not much an option.
Though could manually grab and hard-code in a mirror ip address, some
experience has shown it won't be static/available for all that long.
Thus, it seemed best to obtain a good mirror dynamically.

#6 Updated by pjessen about 2 months ago

dnl028@gmail.com wrote:

2) no, it never does. It is only an address lookup. If you are using wget or curl anyway, I do not understand the need to look up the address?
Fine, though isn't 91.193.113.70 a mirror?

Yes, 91.193.113.70 is provo-mirror.opensuse.org.

While this also fails...
http://91.193.113.70/opensuse/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm
...this succeeds...
http://91.193.113.70/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm
(cut out '/opensuse')

But that is to be expected?

169.229.200.70 fails in both of the above urls.

that one is mirrors.ocf.berkeley.edu - unable to reproduce:

per@localhost:~> wget -O /dev/null http://mirrors.ocf.berkeley.edu/opensuse/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm
--2022-08-11 10:04:16--  http://mirrors.ocf.berkeley.edu/opensuse/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm
Resolving mirrors.ocf.berkeley.edu (mirrors.ocf.berkeley.edu)... 169.229.200.70, 2607:f140:0:32::70
Connecting to mirrors.ocf.berkeley.edu (mirrors.ocf.berkeley.edu)|169.229.200.70|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 85753 (84K) [application/x-redhat-package-manager]

Which brings us to your question - an ip address is not needed for
grabbing the mirrorlist. The reason for wanting a repo mirror ip
address, is to refer to it in host firewall rules, since referring to
urls is not much an option.

True. You want to block ingress traffic from our mirrors?

Most mirrors don't change addresses very often, so you could just grab every one of them from https://mirrors.opensuse.org and hardcode their IP-addresses in your firewall rules. Or update your rules once a day etc.

#7 Updated by andriinikitin about 2 months ago

  • Status changed from New to Feedback

.mirrorlist on download.o.o initially fetches the list of mirrors that are in the main DB (at the moment it has no North America mirrors), then it uses javascript to fetch mirrors from other MirrorCache instances. wget doesn't do the javascript part, which is why you see US mirrors only in a browser.

You should better use mirrorcache-us.opensuse.org, and probably better use .metalink instead of .mirrorlist - it will bring list of max top 10 mirrors in xml format. If you still want to stick to .mirrorlist - maybe you will like json output by adding ?json parameter, i.e. :

http://mirrorcache-us.opensuse.org/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm.metalink

http://mirrorcache-us.opensuse.org/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm.mirrorlist?json

I also think that this feature can be implemented on server side, so you get IP addresses right away in xml or json by implementing new url parameter like ?hostnames=ipv4 . Currently MirrorCache doesn't track IP addresses of mirrors, but it may be done - open ticket in github and I will provide instructions if you wish a bit of server side development.

2) Resolving a mirror's ip address does not result in a functioning url

Each mirror has own prefix in path, check urldir column at https://mirrorcache-us.opensuse.org/app/server . I think you do it wrong: you just add path to ip. Instead, you may need to replace hostname to ip in url from .mirrorlist or use urldir column from https://mirrorcache-us.opensuse.org/rest/server to compile result url.

Regards,
Andrii Nikitin
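
A minimal sketch of the "replace hostname with ip in the url" idea, using only shell parameter expansion. The function name url_with_ip is hypothetical. The mirror URL and address are the ones quoted earlier in this ticket, and the result has the /opensuse prefix form that is reported as working later in the thread.

```shell
#!/bin/sh
# Replace the hostname in a mirror URL with an IP address, keeping the
# scheme and the full path (including any per-mirror prefix such as /opensuse).
# Function name and arguments are hypothetical.
url_with_ip() {
    url=$1 ip=$2
    scheme=${url%%://*}          # e.g. "http"
    rest=${url#*://}             # e.g. "host/opensuse/tumbleweed/..."
    case $rest in
        */*) urlpath=${rest#*/} ;;   # everything after the first "/"
        *)   urlpath= ;;             # URL had no path at all
    esac
    printf '%s://%s/%s\n' "$scheme" "$ip" "$urlpath"
}

url_with_ip 'http://mirrors.ocf.berkeley.edu/opensuse/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm' 169.229.200.70
```

Because the path is copied unchanged, the per-mirror prefix does not have to be looked up separately.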

#8 Updated by pjessen about 2 months ago

andriinikitin wrote:

You should better use mirrorcache-us.opensuse.org, and probably better use .metalink instead of .mirrorlist - it will bring list of max top 10 mirrors in xml format. If you still want to stick to .mirrorlist - maybe you will like json output by adding ?json parameter, i.e. :

http://mirrorcache-us.opensuse.org/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm.metalink
http://mirrorcache-us.opensuse.org/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm.mirrorlist?json

That sounds like a good idea and json is easy to deal with.

I also think that this feature can be implemented on server side, so you get IP addresses right away in xml or json by implementing new url parameter like ?hostnames=ipv4 .

Maybe take into account that some DNS lookups are geoloc sensitive.

Each mirror has own prefix in path, check urldir column at https://mirrorcache-us.opensuse.org/app/server . I think you do it wrong: you just add path to ip.

+1

#9 Updated by dnl028@gmail.com about 2 months ago

On 8/11/2022 4:12 AM, redmine@opensuse.org wrote:

But that is to be expected?

Didn't realize at the time about mirror prefix/directories, as explained
fully in upcoming reply.

True. You want to block ingress traffic from our mirrors?

No. Trying to block other egress, but allow repo egress.

Most mirrors don't change addresses very often, so you could just grab every one of them from https://mirrors.opensuse.org and hardcode their IP-addresses in your firewall rules. Or update your rules once a day etc.

Would much rather not hard-code ip addresses into the script, especially
since it would seem any of them might later expire and then be leased to
some other domain, and don't want to accidentally interface with some
random domain/device that is not a repo server. Do you disagree?

Maybe take into account that some DNS lookups are geoloc sensitive.

Using e.g. dig to resolve a mirror domain, into a mirror ipv4 could be
disrupted by geoloc?

Thank you very much for your help

Dan

#10 Updated by dnl028@gmail.com about 2 months ago

Did not previously account for different mirrors having different
directories before rest of url. Thanks, that explains why this works...
http://91.193.113.70/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm
...and this does not...
http://169.229.200.70/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm
...but this does...
http://169.229.200.70/opensuse/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm
...because it has a '/opensuse' directory, while 91.193.113.70 does not.

Now as for getting such directory info,
https://mirrorcache-us.opensuse.org/app/server does not seem scrapable
(also it might not be distance sorted).
https://mirrorcache-us.opensuse.org/rest/server is scrapable, but unless
that list is sorted by distance/speed, then it's necessary to first
scrape an rpm.metalink file to get distance/best sorted mirrors.
So, as it seems simpler to grab everything from one page/file, then
might as well grab the mirror domain and directory from an rpm.metalink
(or rpm.mirrorlist?json) file.
While not certain of everything you're saying, it seems the following
commands accomplish what you mean...

(whoops - prior messages should have used pound/hash for comments, just
like the comment lines below, and a dollar sign for code lines, just like
the following commands)

$ metalink_url='http://mirrorcache-us.opensuse.org/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm.metalink'
# maybe it's silly to parse xml/json with regex, but not yet familiar
# with e.g. xmlstarlet, jq, and command below should do for now
# should probably use the first line/domain of output - for now limit
# output to only that one with 'head -1' below
$ while read domain dir; do
    mirror_ip="$(dig -4 +short "$domain")"
    mirror_dir="$dir"
  done < <( \
    wget --quiet --output-document=- "$metalink_url" \
    | sed --regexp-extended --silent 's/.*location=\"US\" preference=\"[[:digit:]]+\">https?:\/\/(.*)\/tumbleweed.*\.rpm<\/url>/\1/ p' \
    | head -1 \
    | sed 's/\// /' \
  )

$ echo $mirror_ip $mirror_dir
194.26.236.150 opensuse
$ echo "http://${mirror_ip}/${mirror_dir}/tumbleweed/repo/oss/" # this should be a working repo url
http://194.26.236.150/opensuse/tumbleweed/repo/oss/

Is this essentially what you meant? Should using just an rpm.metalink
(or just rpm.mirrorlist?json) file work well without much regression, or
is there a better / more resilient way?

One issue though, is that mirrors don't necessarily seem to have the
update repo.
Added these as repos...
http://194.26.236.150/opensuse/tumbleweed/repo/non-oss/
http://194.26.236.150/opensuse/tumbleweed/repo/oss/
Then ran 'sudo zypper refresh' and they seem fine.
However, http://194.26.236.150/opensuse/update/tumbleweed/ seems not to
exist.
Is it ok to just disable/exclude the update repo for tumbleweed?

Thank you very much for your help

Dan
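
As a slightly more tolerant variant of the sed step above, the following sketch prints the text of every url element regardless of its location or preference attributes. It is still pattern matching rather than real XML parsing. The function name metalink_urls is hypothetical, and the sample lines use made-up hostnames and are only assumed to resemble the real metalink output.

```shell
#!/bin/sh
# Print the text of every <url ...>...</url> element read from stdin,
# in document order. This is pattern matching, not real XML parsing.
metalink_urls() {
    grep -oE '<url[^>]*>[^<]+</url>' | sed -E 's/<[^>]*>//g'
}

# Hypothetical sample shaped like the lines the sed command above matches:
sample='<url type="http" location="US" preference="100">http://mirror.example.org/opensuse/tumbleweed/repo/oss/x86_64/foo.rpm</url>
<url type="http" location="DE" preference="99">http://mirror2.example.net/tumbleweed/repo/oss/x86_64/foo.rpm</url>'

printf '%s\n' "$sample" | metalink_urls
```

With live data the input would come from wget --quiet --output-document=- "$metalink_url" instead of the sample variable.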


#11 Updated by andriinikitin about 1 month ago

dnl028@gmail.com wrote:

https://mirrorcache-us.opensuse.org/app/server does not seem scrapable

/app/server uses javascript to represent content of /rest/server, so yes - wget normally is not useful for /app route.

https://mirrorcache-us.opensuse.org/rest/server is scrapable, but unless
that list is sorted by distance/speed, then it's necessary to first
scrape an rpm.metalink file to get distance/best sorted mirrors.

The sort order is meaningless currently, it is just order by which mirrors were added to DB.

Instead of constructing new urls from urldir - I'd just replace hostnames with ip addresses directly in urls provided by metalink or mirrorlist.
If you want to stick to bash - I had some success parsing json with jq tool, then .mirrorlist may be the way to go.
Something like:

curl -s 'http://mirrorcache-us.opensuse.org/tumbleweed/repo/oss/noarch/accountsservice-lang-22.08.8-2.1.noarch.rpm.mirrorlist?json' | jq -r '.l1[].url,.l2[].url,.l3[].url'

Then you split the url into hostname and tail, resolve hostname and add its tail back to the IP address.
I see that the tail may be too long for you currently - so you can truncate it to /tumbleweed and it should be fine. It looks easier than dealing with /rest/server

Is this essentially what you meant? Should using just an rpm.metalink
(or just rpm.mirrorlist?json) file work well without much regression, or
is there a better / more resilient way?

I don't see a scenario where support for such formats can be dropped. And currently these are only two options to get list of mirrors sorted by destination.

Is it ok to just disable/exclude the update repo for tumbleweed?

I'd say it is overall fine to just ignore it completely. At the same time - only provo-mirror.opensuse.org has that repo - you can just hardcode its IP for update
https://mirrorcache-us.opensuse.org/update/tumbleweed/repodata/f4fba37aa004f93b3b9f72917e4621a69964f902184252d9cfce7f49399cb2fe-primary.xml.gz.mirrorlist

Regards,
Andrii Nikitin
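
The "split the url into hostname and tail, truncate to /tumbleweed" step could be sketched like this in plain shell. The function name mirror_host_and_base is hypothetical, and the sketch assumes each URL contains "/tumbleweed/" somewhere in its path.

```shell
#!/bin/sh
# Split a mirror URL into hostname and tail, with the tail cut off right
# after "/tumbleweed". Prints: <hostname> <tail>
# Function name is hypothetical; assumes the URL contains "/tumbleweed/".
mirror_host_and_base() {
    rest=${1#*://}                  # drop the scheme
    host=${rest%%/*}                # up to the first "/"
    urlpath=/${rest#*/}             # path including the leading "/"
    base=${urlpath%%/tumbleweed/*}/tumbleweed
    printf '%s %s\n' "$host" "$base"
}

mirror_host_and_base 'http://mirrors.ocf.berkeley.edu/opensuse/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm'
# With live data, the URLs would come from the curl/jq pipeline above, e.g.
#   ... | jq -r '.l1[].url,.l2[].url,.l3[].url' | while read -r u; do mirror_host_and_base "$u"; done
```

The hostname can then be resolved and the tail appended to the resulting address.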

#12 Updated by andriinikitin about 1 month ago

It just came to my mind that your approach overall may have a flaw. It is certainly possible that the closest mirror is missing, let's say, 10% of recent tumbleweed packages.
Then for some old package - you will get mirrorlist with the closest mirror. But if you configure zypper to use that mirror - you will not get the latest packages until the mirror syncs them. And your update may lag a few weeks or even months.

Currently we don't have control over the content of mirrors, nor over their rsync schedule. This is why we use a mirror redirector (MirrorCache) to redirect each request separately.
But users are still free to use some mirror directly, we just cannot guarantee how fresh their update will be.
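
One rough way to spot such a lagging mirror from a script might be to compare the repository index file from the mirror with the one from download.opensuse.org. The sketch below assumes the standard rpm-md layout (repodata/repomd.xml under the repo URL) and that both copies have already been downloaded. The function name same_repomd is hypothetical. Identical index files only show that the metadata matched at the time of the check, not that every package has been synced.

```shell
#!/bin/sh
# Report whether two already-downloaded copies of repodata/repomd.xml are
# byte-identical. Function name and file arguments are hypothetical.
same_repomd() {
    cmp -s "$1" "$2"
}

# Offline demonstration with two temporary files standing in for the
# reference copy and the mirror copy:
ref=$(mktemp) && mir=$(mktemp)
printf 'revision-1\n' > "$ref"
printf 'revision-1\n' > "$mir"
if same_repomd "$ref" "$mir"; then echo current; else echo stale; fi
rm -f "$ref" "$mir"
```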

#13 Updated by pjessen about 1 month ago

dnl028@gmail.com wrote:

True. You want to block ingress traffic from our mirrors?

No. Trying to block other egress, but allow repo egress.

Got it.

Most mirrors don't change addresses very often, so you could just grab every one of them from https://mirrors.opensuse.org and hardcode their IP-addresses in your firewall rules. Or update your rules once a day etc.
Would much rather not hard-code ip addresses into the script, especially
since it would seem any of them might later expire and then be leased to
some other domain, and don't want to accidentally interface with some
random domain/device that is not a repo server. Do you disagree?

Nope, I agree completely. Without knowing the size of your organisation, I think I would be tempted to run a private mirror, thereby being able to restrict outgoing traffic to just one rsync mirror.

Maybe take into account that some DNS lookups are geoloc sensitive.

Using e.g. dig to resolve a mirror domain, into a mirror ipv4 could be
disrupted by geoloc?

Some of our mirrors have multiple locations, a DNS lookup from a US address and an EU address will give different results.

#14 Updated by dnl028@gmail.com about 1 month ago

Ok.

Possible approach 1:
Leave the local host's repos alone with their normal domain
download.opensuse.org. Then resolve and loop thru all the ip addresses
that could be needed by zypper install, or zypper dist-upgrade, and
create a firewall rule to allow each one.
That way, the local host's repos will still be dynamic and choose best
mirrors without being limited to any certain one, but the firewall rules
will be prepared to allow traffic out to any mirrors ip addresses necessary.

So, if would...
grab every url in e.g.
http://mirrorcache-us.opensuse.org/tumbleweed/repo/oss/noarch/accountsservice-lang-22.08.8-2.1.noarch.rpm.mirrorlist?json
resolve each domain into respective ip address
add each said ip address as a firewall allow rule
...would that leave zypper activity functioning fine, or there is more
to it?

With this approach could it happen that...
1) a dist-upgrade failing during download because it cannot access the
target mirror for a certain package.
2) worse, the system is left in an inconsistent/broken state from a mix
of old and new packages?
3) dist-upgrade reaches a point where there is an unresolvable package
conflict because of all this?

Possible approach 2:
Or instead of all the above, maybe this command can be tweaked to get
the url of an rpm.mirrorlist?json file that is sure to be very recent,
and thus have the most current repos in it?...
/usr/bin/zypper --table-style 11 --quiet list-updates --repo 'Main Repository (OSS)' | /usr/bin/awk 'NR==3 {print "http://download.opensuse.org/tumbleweed/repo/oss/"$5"/"$2"-"$4"."$5".rpm.mirrorlist?json"}'
Then, resolve the ip address of the domain of the first repo in the
file, and set the local host's repos to said ip address.
Then, add that same ip address as a firewall rule to allow traffic to
it, so that the local host's repos work fine.

Or maybe determine a completely current mirror...
1) by examining the repo metadata?
2) by perhaps one of these zypper options?...
Global options...
  --releasever version
      For the current command set the value of the $releasever repository variable to version. This can be used to switch to new distribution repositories when performing a distribution upgrade. See the dist-upgrade (dup) command and section Repository Management for more details about using the $releasever repository variable.
list-patches...
  --date YYYY-MM-DD[,...]
      List only patches issued up to, but not including, the specified date.

Would either of these two overall approaches work? Which seems better?

Also...

...this feature can be implemented on server side, so you get IP addresses right away in xml or json by implementing new url parameter like ?hostnames=ipv4 . Currently MirrorCache doesn't track IP addresses of mirrors, but it may be done...
This sounds nice, but currently too unfamiliar with that area of
development.

Thanks


#15 Updated by pjessen about 1 month ago

dnl028@gmail.com wrote:

Possible approach 1:
Leave the local host's repos alone with their normal domain
download.opensuse.org. Then resolve and loop thru all the ip addresses
that could be needed by zypper install, or zypper dist-upgrade, and
create a firewall rule to allow each one.
That way, the local host's repos will still be dynamic and choose best
mirrors without being limited to any certain one, but the firewall rules
will be prepared to allow traffic out to any mirrors ip addresses necessary.

Yep, that is what I suggested above:

....  you could just grab every one of them from https://mirrors.opensuse.org and hardcode their IP-addresses in your firewall rules. Or update your rules once a day etc. 

It would just mean relying on the screen-scraping/parsing https://mirrors.opensuse.org (I would double check, the format is subject to change).
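
Whichever list of mirrors is used as input, the "one allow rule per address" step could look roughly like the sketch below. It only prints iptables commands and does not run them. The OUTPUT chain, the port list, and the function name print_allow_rules are assumptions about the local setup, not something prescribed in this ticket.

```shell
#!/bin/sh
# Read IPv4 addresses (one per line) on stdin and print one iptables
# command per unique address that would allow outgoing HTTP/HTTPS to it.
# The commands are only printed, not executed. The OUTPUT chain and the
# port list are assumptions about the local firewall layout.
print_allow_rules() {
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' | sort -u | while read -r ip; do
        printf 'iptables -A OUTPUT -d %s -p tcp -m multiport --dports 80,443 -j ACCEPT\n' "$ip"
    done
}

printf '%s\n' 169.229.200.70 91.193.113.70 169.229.200.70 | print_allow_rules
```

Run once a day from a timer, after flushing the previously generated rules, this would keep the allow list in step with address changes.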

#16 Updated by dnl028@gmail.com about 1 month ago

Ok, but as discussed before there seems no reason to hard-code. Wouldn't
only the ip addresses in the region be needed? In which case...

if would...
grab every url in e.g.
http://mirrorcache-us.opensuse.org/tumbleweed/repo/oss/noarch/accountsservice-lang-22.08.8-2.1.noarch.rpm.mirrorlist?json

resolve each domain into respective ip address
add each said ip address as a firewall allow rule
...would that leave zypper activity functioning fine, or there is more
to it?

Shouldn't .rpm.mirrorlist?json and .rpm.metalink format be quite stable,
so this can be scripted without much regression?
Moreover, https://mirrors.opensuse.org seems to have some outdated info
as e.g. http://ewr.edge.kernel.org/opensuse seems defunct for some time now.

And what do you think of...

With this approach could it happen that...
1) a dist-upgrade failing during download because it cannot access the
target mirror for a certain package.
2) worse, the system is left in an inconsistent/broken state from a
mix of old and new packages?
3) dist-upgrade reaches a point where there is an unresolvable package
conflict because of all this?

Any thoughts on approach 2, or is approach 1 above, simply better?


#17 Updated by andriinikitin about 1 month ago

dnl028@gmail.com wrote:

...would that leave zypper activity functioning fine, or there is more
to it?

I don't see how zypper can have any specific problem in this scenario.

Shouldn't .rpm.mirrorlist?json and .rpm.metalink format be quite stable,
so this can be scripted without much regression?

Yes, I don't see why it could change, but I don't have an authority to guarantee that 100%.

Moreover, https://mirrors.opensuse.org seems to have some outdated info
as e.g. http://ewr.edge.kernel.org/opensuse seems defunct for some time now.

Yeah currently report doesn't check if a mirror is online at the moment - it just shows if it had the files at some point.

And what do you think of...

With this approach could it happen that...
1) a dist-upgrade failing during download because it cannot access the
target mirror for a certain package.
2) worse, the system is left in an inconsistent/broken state from a
mix of old and new packages?
3) dist-upgrade reaches a point where there is an unresolvable package
conflict because of all this?

zypper does hard work to make sure OS is in consistent state at any point of time. OS will be stable if downloads fail, etc.

Any thoughts on approach 2, or is approach 1 above, simply better?

Approach 2 sounds simpler, but it doesn't guarantee the closest mirror. I mentioned a scenario where slightly outdated closest mirror may be used to download 90% of packages, and some remote up-to-date mirrors will be used to download the rest. Besides that both approaches should work I guess.

Regards,
Andrii Nikitin

#18 Updated by pjessen about 1 month ago

dnl028@gmail.com wrote:

Ok, but as discussed before there seems no reason to hard-code.

Hmm, maybe I am missing something, but when you can't use a hostname in your firewall setup (iptables, nftables), hard-coding the addresses seems to be the only option.

Wouldn't only the ip addresses in the region be needed?

Depends on how robust you want your solution to be, I would say. What if none of the mirrors in the region have a desired package?

And what do you think of...

With this approach could it happen that...
1) a dist-upgrade failing during download because it cannot access the
target mirror for a certain package.
2) worse, the system is left in an inconsistent/broken state from a
mix of old and new packages?
3) dist-upgrade reaches a point where there is an unresolvable package
conflict because of all this?

Any thoughts on approach 2, or is approach 1 above, simply better?

No thoughts, they are all scenarios you might end up in, no matter what your firewall looks like.

#19 Updated by pjessen about 1 month ago

Moreover, https://mirrors.opensuse.org seems to have some outdated info
as e.g. http://ewr.edge.kernel.org/opensuse seems defunct for some time now.

See #110064

#20 Updated by dnl028@gmail.com about 1 month ago

On 8/15/2022 4:40 AM, redmine@opensuse.org wrote:

Issue #115142 has been updated by andriinikitin.

Shouldn't .rpm.mirrorlist?json and .rpm.metalink format be quite stable,
so this can be scripted without much regression?
Yes, I don't see why it could change, but I don't have an authority to guarantee that 100%.
Fair enough.

Moreover, https://mirrors.opensuse.org seems to have some outdated info
as e.g. http://ewr.edge.kernel.org/opensuse seems defunct for some time now.
Yeah currently report doesn't check if a mirror is online at the moment - it just shows if it had the files at some point.
Ok
Approach 2 sounds simpler, but it doesn't guarantee the closest mirror. I mentioned a scenario where slightly outdated closest mirror may be used to download 90% of packages, and some remote up-to-date mirrors will be used to download the rest. Besides that both approaches should work I guess.
Was just looking to pick the top mirror, to have a close mirror, but it
does not need to be the closest. The main idea behind approach 2 is to
pick a fully current mirror by any of the listed suggestions in the
approach's explanation - if none of those suggestions can confidently
provide a single mirror that'll have all current packages, then approach
1 should be favored instead of approach 2. It seems like both of you
somewhat favor approach 1 anyway, so unless at least one of you advises
approach 2, it sounds like approach 1 will be implemented.



#21 Updated by dnl028@gmail.com about 1 month ago

See #110064
Yep, got it.

More below...

On 8/15/2022 4:55 AM, redmine@opensuse.org wrote:

Issue #115142 has been updated by pjessen.

Ok, but as discussed before there seems no reason to hard-code.
Hmm, maybe I am missing something, but when you can't use a hostname in your firewall setup (iptables, nftables), hard-coding the addresses seems to be the only option.
If by 'hard-code' you mean add a bunch of ip addresses as firewall rules
and leave it, then that's no good. But if you mean have the script
automatically apply the firewall rules with the ip addresses, then yes,
that's exactly the plan - it's just a matter of how many and which ip
addresses will be added as rules.
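
A minimal sketch of the "script applies the rules" idea, assuming the mirror IPv4 addresses have already been collected one per line. The function name `emit_mirror_rules`, the OUTPUT chain, and ports 80/443 are illustrative assumptions, not anything settled in this ticket. It only prints the iptables commands, so they can be reviewed before being run as root:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read IPv4 addresses on stdin and print one iptables
# command per address and port. Nothing is applied; output is for review.
emit_mirror_rules() {
    # Drop anything that is not a dotted-quad address.
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' |
    while IFS= read -r ip; do
        for port in 80 443; do
            printf 'iptables -A OUTPUT -d %s -p tcp --dport %s -j ACCEPT\n' "$ip" "$port"
        done
    done
}

# Example with documentation-range addresses (placeholders, not real mirrors).
printf '%s\n' 192.0.2.10 not-an-ip 198.51.100.7 | emit_mirror_rules
```

Printing instead of applying keeps the "how many and which addresses" question separate from the act of changing the firewall.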

Wouldn't only the ip addresses in the region be needed?
Depends on how robust you want your solution to be, I would say. What if none of the mirrors in the region have a desired package?
Ok. Wouldn't allowing access to all the mirrors listed in e.g.
http://mirrorcache-us.opensuse.org/tumbleweed/repo/oss/x86_64/accountsservice-22.08.8-2.1.x86_64.rpm.mirrorlist?json
reliably supply every currently needed package?
If not, how would you advise expanding this to include enough mirrors?
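
One way such a listing could be turned into candidate hosts, sketched under the assumption that the `?json` response carries the mirror URLs as plain `http(s)://` strings. The exact field layout is not confirmed here, so the sketch only pattern-matches URLs. `mirror_hosts` and `ipv4_only` are hypothetical helper names, and the sample JSON is invented for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the unique hostnames of all http(s) URLs on stdin.
mirror_hosts() {
    grep -oE 'https?://[^/":[:space:]]+' | sed -E 's#^https?://##' | sort -u
}

# Hypothetical helper: keep only IPv4 lines (dig +short may also print CNAMEs).
ipv4_only() {
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Invented sample standing in for a saved mirrorlist?json response.
sample='[{"url":"https://mirror-a.example.org/opensuse/"},{"url":"http://mirror-b.example.net/pub/"}]'
printf '%s\n' "$sample" | mirror_hosts

# Live use would look roughly like this (needs network, curl and dig):
#   curl -sL "$mirrorlist_url?json" | mirror_hosts |
#       while IFS= read -r h; do dig +short A "$h" | ipv4_only; done
```

Whether the hosts found this way cover every needed package is exactly the open question above; the sketch only shows the mechanics.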

Anyway, it seems that as described in approach 1, if the host repos use
download.opensuse.org/... and only certain mirror ip addresses are
allowed out by the firewall, and if none of these allowed mirrors have
the needed version of a certain package, then the download will fail and
zypper will quit before performing an inconsistent package installation,
especially if applying option --download-in-advance. Then, manual
intervention can be done, so this does not seem like a danger. And if it
only happens on infrequent occasions, then it's not much of an
inconvenience either.
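
The "download first, then stop for manual intervention" idea could be wrapped roughly like this. It is a sketch, assuming zypper returns a non-zero exit code when a download fails. `safe_dup` and the `ZYPPER` override variable are hypothetical names introduced here so the command line can be dry-run without touching the system:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a dist-upgrade that downloads everything first,
# and report a failure instead of carrying on. ZYPPER can be overridden
# (for example with "echo") to dry-run the command line.
safe_dup() {
    if "${ZYPPER:-zypper}" --non-interactive dist-upgrade --download-in-advance; then
        echo "dist-upgrade finished"
    else
        rc=$?
        echo "dist-upgrade stopped (zypper exit code $rc); check mirror access" >&2
        return "$rc"
    fi
}

# Dry run: print the command line instead of executing zypper.
ZYPPER=echo safe_dup
```

A calling script can test the return value of `safe_dup` and leave the firewall rules in place for inspection when it is non-zero.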

And what do you think of...

With this approach could it happen that...
1) a dist-upgrade failing during download because it cannot access the
target mirror for a certain package.
2) worse, the system is left in an inconsistent/broken state from a
mix of old and new packages?
3) dist-upgrade reaches a point where there is an unresolvable package
conflict because of all this?
Any thoughts on approach 2, or is approach 1 above, simply better?
No thoughts, they are all scenarios you might end up in, no matter what your firewall looks like.
Ok. For the numbered items above, which were intended to pertain to
approach 1, it sounds like you're saying these are possibilities
regardless of this whole subject/project, and that modding the host
repos and firewall rules as described doesn't particularly seem to
increase the risk.


