tickets #104844 (closed)
Very slow network connectivity with api.opensuse.org
Added by srinidhi almost 3 years ago. Updated over 1 year ago.
% Done: 100%
Description
Hello Admin Team,
Since the 5th or 6th of January, we have been facing extremely poor network
throughput when using the OBS Interconnect on our private OBS instance. We have
configured OBS Interconnect for both the openSUSE.org (api.opensuse.org) and SEBS
(api.suse.com) instances. The maximum speed that we get is approximately 50 kbps!
I would like to present some data before continuing further. This is what we get
on our backend server:
# curl 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/x86_64/_repository?view=cache' > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 14.5M 0 14.5M 0 0 33723 0 --:--:-- 0:07:31 --:--:-- 97338
That is 7 minutes 31 seconds to download 14.5 MB of data.
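curl's progress meter prints sizes with 1024-based suffixes, and its "Average Dload" column is in bytes per second, not bits. As a rough cross-check of the figures above, the following Python sketch converts the size and elapsed time into a rate. The helper names are illustrative and not part of curl, and because curl rounds "14.5M", the result is only approximate.

```python
# Illustrative helpers (not part of curl) for reading curl's progress meter.
# Assumes curl's 1024-based suffixes (k, M, G) and an H:MM:SS elapsed time;
# curl's "Average Dload" column is in bytes per second.

UNITS = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> float:
    """Convert a curl size such as '14.5M' or '33723' to bytes."""
    if text and text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def parse_elapsed(text: str) -> int:
    """Convert an elapsed time such as '0:07:31' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_rate(size: str, elapsed: str) -> float:
    """Average download rate in bytes per second."""
    return parse_size(size) / parse_elapsed(elapsed)


if __name__ == "__main__":
    rate = average_rate("14.5M", "0:07:31")
    # Roughly 33,700 bytes/s, in line with curl's own average of 33723.
    print(f"{rate:.0f} B/s = {rate / 1024:.1f} KiB/s = {rate * 8 / 1000:.0f} kbit/s")
```

Applied to the night-time run later in this thread (14.5M in 0:01:29), the same arithmetic gives about 167 KiB/s, which is in line with curl's "166k".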
The situation is the same with api.suse.com:
# curl 'https://srinidhi:XXXX@api.suse.com/build/SUSE:SLE-12-SP0:Update/standard/x86_64/_repository?view=cache' > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 12.1M 0 12.1M 0 0 30295 0 --:--:-- 0:07:01 --:--:-- 40555
Both servers show a similar average download speed.
Because of this slow network throughput, our private OBS instance is completely
unresponsive for most of the day. This is causing a huge loss of
productivity for our teams and the product.
We have reason to believe that our external IPv4 address range (192.31.114.0/24) is
being rate limited on the {open,}SUSE side. There are two main reasons for this belief:
We are able to access api.opensuse.org at high speed
- over IPv6
- from Provo (US) and other geographies
Could you please check this on your end and help us resolve this issue? Please
let us know if you need any additional information from us.
Regards,
Srinidhi.
Files
api-opensuse-org.dump.xz (15 MB): tcpdump -w api-opensuse-org.dump -ttt host 195.135.221.162 (srinidhi, 2022-01-15 08:59)
Updated by cboltz almost 3 years ago
- Category set to OBS
- Assignee set to opensuse-admin-obs
Updated by lrupp almost 3 years ago
- Assignee changed from opensuse-admin-obs to mdruvietis
Updated by lrupp almost 3 years ago
- Status changed from New to Workable
Opened a ticket with the SUSE network team. Copy'n'pasted your message from here into the ticket. Let's wait for their response.
Updated by mdruvietis almost 3 years ago
Hi,
I can confirm that we do not have any rate limit in place on the SUSE side. Also, ISP utilization has been way below 50% for the past weeks without any bursts, so I doubt the issue is here.
Could I get a traceroute from the affected subnet 192.31.114.0/24, and also from the IPv6 subnet of the same location that works fine?
Updated by srinidhi almost 3 years ago
mdruvietis wrote:
> Hi,
> I can confirm that we do not have any rate limit in place on the SUSE side. Also, ISP utilization has been way below 50% for the past weeks without any bursts, so I doubt the issue is here.
Thank you for confirming this!
> Could I get a traceroute from the affected subnet 192.31.114.0/24, and also from the IPv6 subnet of the same location that works fine?
I can definitely share the IPv4 details. I do not have the IPv6 setup with me anymore, and I would need some time to contact our network team to get that data.
For openSUSE.org
IPv4 traceroute to api.opensuse.org:
# traceroute -4 api.opensuse.org
traceroute to api.opensuse.org (195.135.221.162), 30 hops max, 60 byte packets
1 firewall.blr.build.net (192.168.10.1) 0.362 ms 0.318 ms 0.300 ms
2 * * *
3 blr-asa-2.gns.attachmategroup.com (164.99.203.246) 0.765 ms 0.772 ms 0.727 ms
4 192.31.114.253 (192.31.114.253) 0.906 ms 0.869 ms 0.923 ms
5 59.145.123.141 (59.145.123.141) 3.420 ms 3.303 ms 3.386 ms
6 116.119.106.116 (116.119.106.116) 151.297 ms 151.421 ms 151.369 ms
7 ae7-0.lon10.core-backbone.com (80.255.15.229) 298.856 ms 294.292 ms *
8 ae2-2001.nbg30.core-backbone.com (81.95.15.161) 279.612 ms 274.980 ms 274.771 ms
9 * core-backbone.microfocus.com (5.56.18.210) 273.133 ms 273.164 ms
10 195.135.221.26 (195.135.221.26) 279.693 ms 279.185 ms 279.290 ms
11 195.135.221.162 (195.135.221.162) 270.648 ms !X 271.377 ms !X 271.498 ms !X
Here is the output of mtr api.opensuse.org -4 -r 20:
# mtr api.opensuse.org -4 -r 20
Start: 2022-01-12T21:13:31+0530
HOST: blrcooker Loss% Snt Last Avg Best Wrst StDev
1.|-- firewall.blr.build.net 0.0% 10 0.2 0.2 0.2 0.3 0.0
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
3.|-- blr-asa-2.gns.attachmateg 0.0% 10 0.9 0.8 0.7 1.0 0.1
4.|-- 192.31.114.253 0.0% 10 1.2 1.0 0.8 1.2 0.1
5.|-- 59.145.123.141 0.0% 10 2.7 3.1 2.7 4.0 0.4
6.|-- 182.79.206.46 0.0% 10 146.7 148.9 146.4 169.1 7.1
7.|-- ae7-0.lon10.core-backbone 20.0% 10 268.3 284.9 268.3 297.2 8.5
8.|-- ae2-2001.nbg30.core-backb 20.0% 10 277.4 274.8 271.5 277.4 1.9
9.|-- core-backbone.microfocus. 20.0% 10 273.6 273.1 270.3 274.4 1.3
10.|-- 195.135.221.26 0.0% 10 279.2 279.0 277.4 282.1 1.3
11.|-- 195.135.221.162 10.0% 10 267.8 269.7 267.8 271.1 1.1
mtr: udp socket connect failed: Invalid argument
For api.suse.com
IPv4 traceroute to api.suse.com:
# traceroute -4 api.suse.com
traceroute to api.suse.com (195.135.221.169), 30 hops max, 60 byte packets
1 firewall.blr.build.net (192.168.10.1) 0.273 ms 0.245 ms 0.227 ms
2 * * *
3 blr-asa-2.gns.attachmategroup.com (164.99.203.246) 0.750 ms 0.754 ms 0.750 ms
4 192.31.114.253 (192.31.114.253) 0.952 ms 1.069 ms 1.060 ms
5 59.145.123.141 (59.145.123.141) 3.984 ms 4.006 ms 3.959 ms
6 116.119.106.116 (116.119.106.116) 151.410 ms 116.119.44.158 (116.119.44.158) 146.261 ms 116.119.106.116 (116.119.106.116) 156.200 ms
7 ae7-0.lon10.core-backbone.com (80.255.15.229) 291.323 ms 291.348 ms 291.144 ms
8 ae2-2001.nbg30.core-backbone.com (81.95.15.161) 274.687 ms 275.331 ms 274.529 ms
9 core-backbone.microfocus.com (5.56.18.210) 273.338 ms 274.345 ms 274.666 ms
10 195.135.221.26 (195.135.221.26) 279.670 ms 279.878 ms 279.291 ms
11 195.135.221.169 (195.135.221.169) 279.846 ms !X 277.696 ms !X 277.800 ms !X
Here is the output of mtr api.suse.com -4 -r 20:
# mtr api.suse.com -4 -r 20
Start: 2022-01-12T21:19:46+0530
HOST: blrcooker Loss% Snt Last Avg Best Wrst StDev
1.|-- firewall.blr.build.net 0.0% 10 0.2 0.2 0.2 0.4 0.1
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
3.|-- blr-asa-2.gns.attachmateg 0.0% 10 0.7 0.7 0.7 0.9 0.1
4.|-- 192.31.114.253 0.0% 10 1.1 1.0 0.9 1.2 0.1
5.|-- 59.145.123.141 0.0% 10 2.7 4.1 2.7 13.9 3.4
6.|-- 116.119.106.116 0.0% 10 151.5 154.6 151.4 175.3 7.5
7.|-- ae7-0.lon10.core-backbone 10.0% 10 266.1 270.3 266.1 275.5 3.2
8.|-- ae2-2001.nbg30.core-backb 10.0% 10 275.8 275.1 272.6 276.0 1.1
9.|-- core-backbone.microfocus. 10.0% 10 274.6 274.6 272.6 277.9 1.4
10.|-- 195.135.221.26 20.0% 10 279.4 279.3 276.1 281.2 1.7
11.|-- 195.135.221.169 0.0% 10 279.7 279.6 279.1 280.0 0.3
mtr: udp socket connect failed: Invalid argument
Hope this helps,
Srinidhi.
Updated by mdruvietis almost 3 years ago
Hi srinidhi,
thanks for your input. It does seem that routing is fine and nothing obvious comes up. I did some tests from various locations myself and could never replicate this slowness, always getting much higher speeds. It seems to be a strange issue, and it might be tricky to nail down.
Could you please tell me the following:
- Is the slowness constant, never going above 50 kbps since the 5th/6th of January?
- Is it the same from multiple machines?
- Is only the 192.31.114.0/24 subnet affected as a source?
- Are other downloads from the same machines unaffected?
Could you try another tool to test the download speed, like
wget 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/x86_64/_repository?view=cache' > /dev/null
or something similar?
Also, would it be possible to get a tcpdump of a full download while speeds are low? It is a last resort, just to check what this TCP session looks like.
Thanks,
Updated by srinidhi almost 3 years ago
Hello Mikelis,
Apologies for the delay in responding! I've been trying to post a reply here for a while, but I was not able to. (I have no idea why I kept being shown the "loading" spinner.)
mdruvietis wrote:
> Hi srinidhi,
> thanks for your input. It does seem that routing is fine and nothing obvious comes up. I did some tests from various locations myself and could never replicate this slowness, always getting much higher speeds. It seems to be a strange issue, and it might be tricky to nail down.
> Could you please tell me the following:
> - Is the slowness constant, never going above 50 kbps since the 5th/6th of January?
The slowness is seen only during the daytime in India (UTC+05:30). After 1:00 AM, the network speed returns to normal. Here is the output of the mtr command, taken at 1:30 AM today:
# mtr api.opensuse.org -4 -r 20
Start: 2022-01-13T01:27:44+0530
HOST: blrcooker Loss% Snt Last Avg Best Wrst StDev
1.|-- firewall.blr.build.net 0.0% 10 0.1 0.2 0.1 0.3 0.0
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
3.|-- blr-asa-2.gns.attachmateg 0.0% 10 0.7 0.7 0.6 0.8 0.0
4.|-- 192.31.114.253 0.0% 10 1.0 1.0 0.9 1.0 0.0
5.|-- 59.145.123.141 0.0% 10 2.7 2.8 2.6 3.4 0.2
6.|-- 182.79.206.46 0.0% 10 149.2 149.1 149.0 149.3 0.1
7.|-- ae7-0.lon10.core-backbone 0.0% 10 241.2 251.0 241.2 259.6 5.9
8.|-- ae2-2001.nbg30.core-backb 10.0% 10 260.9 267.9 260.9 275.7 5.6
9.|-- core-backbone.microfocus. 0.0% 10 264.5 265.8 256.5 275.9 7.3
10.|-- 195.135.221.26 0.0% 10 271.7 274.6 262.2 281.6 6.6
11.|-- 195.135.221.162 0.0% 10 266.4 264.6 253.1 270.5 6.4
mtr: udp socket connect failed: Invalid argument
The output of the traceroute command at the same time:
# traceroute -4 api.opensuse.org
traceroute to api.opensuse.org (195.135.221.162), 30 hops max, 60 byte packets
1 firewall.blr.build.net (192.168.10.1) 0.458 ms 0.395 ms 0.376 ms
2 * * *
3 blr-asa-2.gns.attachmategroup.com (164.99.203.246) 0.953 ms 0.940 ms 0.918 ms
4 192.31.114.253 (192.31.114.253) 0.904 ms 0.999 ms 0.985 ms
5 59.145.123.141 (59.145.123.141) 2.825 ms 2.820 ms 2.869 ms
6 182.79.206.46 (182.79.206.46) 146.176 ms 116.119.44.158 (116.119.44.158) 145.980 ms 116.119.106.116 (116.119.106.116) 156.154 ms
7 ae7-0.lon10.core-backbone.com (80.255.15.229) 251.013 ms 254.917 ms 251.059 ms
8 ae2-2001.nbg30.core-backbone.com (81.95.15.161) 269.663 ms 267.916 ms 267.095 ms
9 core-backbone.microfocus.com (5.56.18.210) 261.192 ms 261.864 ms 261.317 ms
10 195.135.221.26 (195.135.221.26) 271.263 ms 282.167 ms 277.410 ms
11 195.135.221.162 (195.135.221.162) 272.570 ms !X 265.867 ms !X 266.148 ms !X
The output of the curl command at the same time:
# curl 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/x86_64/_repository?view=cache' > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 14.5M 0 14.5M 0 0 166k 0 --:--:-- 0:01:29 --:--:-- 113k
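Because the slowness follows the time of day, periodic measurements from an affected machine can document the pattern for the ISPs. The following is a minimal Python sketch that assumes the same public _repository URL used above and a 30-minute interval (both arbitrary choices). The function names are illustrative, not an existing tool.

```python
# Hypothetical throughput logger: download a URL periodically and record how
# fast it arrived, to show how speed varies over the day. The URL and the
# interval are assumptions taken from / chosen for this ticket.
import time
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Iterable, Tuple

URL = "https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/x86_64/_repository?view=cache"


def stream_url(url: str, chunk_size: int = 65536) -> Iterable[bytes]:
    """Yield the response body of `url` in chunks; the caller discards them."""
    with urllib.request.urlopen(url, timeout=60) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk


def measure(fetch: Callable[[], Iterable[bytes]]) -> Tuple[int, float]:
    """Consume one download and return (bytes received, seconds elapsed)."""
    start = time.monotonic()
    total = sum(len(chunk) for chunk in fetch())
    return total, time.monotonic() - start


def log_forever(url: str = URL, interval_s: int = 1800) -> None:
    """Print one UTC-timestamped throughput sample every `interval_s` seconds."""
    while True:
        nbytes, seconds = measure(lambda: stream_url(url))
        rate = nbytes / seconds if seconds > 0 else float("inf")
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        print(f"{stamp} {nbytes} bytes in {seconds:.1f} s = {rate / 1024:.1f} KiB/s", flush=True)
        time.sleep(interval_s)
```

Calling log_forever() on a host behind 192.31.114.0/24 prints one line per sample. Passing a callable into measure() keeps the timing logic separate from the network code, so it can be checked without a connection.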
> - Is it the same from multiple machines?
Yes, from multiple machines that share the same external IP address.
> - Is only the 192.31.114.0/24 subnet affected as a source?
As far as I'm aware, yes.
> - Are other downloads from the same machines unaffected?
Yes. I tried downloading files from rsync.opensuse.org and other sites. Only data transfer between the Micro Focus Bangalore network and the SUSE network is affected.
$ curl 'https://rsync.opensuse.org/distribution/leap/15.3/repo/oss/ChangeLog' > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 91.7M 100 91.7M 0 0 1341k 0 0:01:09 0:01:09 --:--:-- 1168k
> Could you try another tool to test the download speed, like
> wget 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/x86_64/_repository?view=cache' > /dev/null
> or something similar?
# wget 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/x86_64/_repository?view=cache' > /dev/null
--2022-01-13 21:02:52-- https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/x86_64/_repository?view=cache
Resolving api.opensuse.org (api.opensuse.org)... 195.135.221.162, 2001:67c:2178:8::162
Connecting to api.opensuse.org (api.opensuse.org)|195.135.221.162|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-cpio]
Saving to: ‘_repository?view=cache’
[ <=> ] 15,231,300 25.6KB/s in 11m 18s
2022-01-13 21:14:11 (21.9 KB/s) - ‘_repository?view=cache’ saved [15231300]
> Also, would it be possible to get a tcpdump of a full download while speeds are low? It is a last resort, just to check what this TCP session looks like.
If you don't mind, could you please provide the full tcpdump command you need? Is it enough to run tcpdump -w out.dump -ttt host 195.135.221.162?
While looking at the output of mtr, we observed a delay between hop #6 and hop #7:
# mtr api.suse.com -4 -r 20
Start: 2022-01-13T14:06:37+0530
HOST: blrcooker Loss% Snt Last Avg Best Wrst StDev
1.|-- firewall.blr.build.net 0.0% 10 0.2 0.2 0.1 0.2 0.0
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
3.|-- blr-asa-1.gns.attachmateg 0.0% 10 0.7 0.7 0.6 0.7 0.0
4.|-- 192.31.114.253 0.0% 10 0.8 0.9 0.8 1.0 0.1
5.|-- 59.145.123.141 0.0% 10 2.6 3.3 2.6 6.4 1.2
6.|-- 116.119.106.116 0.0% 10 158.7 165.8 158.6 200.5 13.3
7.|-- ae7-0.lon10.core-backbone 10.0% 10 343.2 334.8 317.0 347.1 10.3
8.|-- ae2-2001.nbg30.core-backb 0.0% 10 283.2 286.3 282.7 290.3 3.2
9.|-- core-backbone.microfocus. 10.0% 10 280.3 282.4 280.3 285.4 1.6
10.|-- 195.135.221.26 10.0% 10 285.7 286.3 283.6 288.0 1.2
11.|-- 195.135.221.169 10.0% 10 284.4 286.9 284.4 289.7 1.5
Hop 6 is an IP address of our Bangalore ISP (Airtel), and hop 7 belongs to SUSE's ISP (Core-Backbone). We have asked our IT team to report this issue to our ISP (Airtel). Could you please also check with Core-Backbone whether this is something they could rectify?
Thanks,
Regards,
Srinidhi.
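Finding the hop with the largest latency jump can be automated. The following Python sketch parses an mtr report in the column layout shown above (hop, host, Loss%, Snt, Last, Avg, Best, Wrst, StDev). It skips hops that never answered, such as "???", and reports the biggest increase in the Avg column. The helper names are illustrative. Note that intermediate-hop latencies also reflect how quickly each router answers the probes themselves (hop 8 reports a lower average than hop 7), so a jump is a hint that points at an edge rather than proof of where the problem lies.

```python
# Parse an `mtr -r` report and find the largest jump in average latency between
# consecutive hops that answered. Column layout is assumed from the reports in
# this ticket: hop, host, Loss%, Snt, Last, Avg, Best, Wrst, StDev.
from typing import List, Optional, Tuple

Hop = Tuple[int, str, float, float]  # (hop number, host, loss %, avg ms)

REPORT = """\
HOST: blrcooker Loss% Snt Last Avg Best Wrst StDev
1.|-- firewall.blr.build.net 0.0% 10 0.2 0.2 0.1 0.2 0.0
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
3.|-- blr-asa-1.gns.attachmateg 0.0% 10 0.7 0.7 0.6 0.7 0.0
4.|-- 192.31.114.253 0.0% 10 0.8 0.9 0.8 1.0 0.1
5.|-- 59.145.123.141 0.0% 10 2.6 3.3 2.6 6.4 1.2
6.|-- 116.119.106.116 0.0% 10 158.7 165.8 158.6 200.5 13.3
7.|-- ae7-0.lon10.core-backbone 10.0% 10 343.2 334.8 317.0 347.1 10.3
8.|-- ae2-2001.nbg30.core-backb 0.0% 10 283.2 286.3 282.7 290.3 3.2
9.|-- core-backbone.microfocus. 10.0% 10 280.3 282.4 280.3 285.4 1.6
10.|-- 195.135.221.26 10.0% 10 285.7 286.3 283.6 288.0 1.2
11.|-- 195.135.221.169 10.0% 10 284.4 286.9 284.4 289.7 1.5
"""


def parse_hops(report: str) -> List[Hop]:
    """Return (hop, host, loss_percent, avg_ms) for each hop row of the report."""
    hops: List[Hop] = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 9 or not fields[0].endswith(".|--"):
            continue  # skip the HOST header and anything that is not a hop row
        number = int(fields[0].split(".")[0])
        loss = float(fields[2].rstrip("%"))  # '???' rows print '100.0' without '%'
        hops.append((number, fields[1], loss, float(fields[5])))
    return hops


def largest_jump(hops: List[Hop]) -> Optional[Tuple[int, int, float]]:
    """Return (from_hop, to_hop, delta_ms) for the biggest increase in Avg."""
    answered = [hop for hop in hops if hop[2] < 100.0]  # drop silent hops
    best: Optional[Tuple[int, int, float]] = None
    for prev, cur in zip(answered, answered[1:]):
        delta = cur[3] - prev[3]
        if best is None or delta > best[2]:
            best = (prev[0], cur[0], delta)
    return best


if __name__ == "__main__":
    jump = largest_jump(parse_hops(REPORT))
    if jump:
        print(f"largest jump: hop {jump[0]} -> hop {jump[1]}, +{jump[2]:.1f} ms")
```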
Updated by mdruvietis almost 3 years ago
Hi Srinidhi,
Thanks for your input. I have opened a case with CoreBB to take a look at this edge. This does seem to me like an ISP issue, especially if it occurs during your daytime, when the load is highest, but not at night. Our side is never overloaded, and no other customers have reported anything similar.
I'm still curious to see the tcpdump, even though I have very little hope of seeing anything besides TCP struggling with its window sizes. The command you provided should work; I'm only interested in an affected session.
I'm afraid there is not much we can do right now, besides waiting for a reply from our ISP. I will update here as soon as I get any news from CoreBB.
Thanks,
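The remark about TCP window sizes can be made concrete with the bandwidth-delay product. A single TCP connection delivers at most about one window of unacknowledged data per round trip, so throughput ≤ window / RTT. The following Python sketch applies this to the curl averages in this ticket. It assumes an RTT of roughly 270 ms read off the mtr reports, and it ignores slow start and loss recovery. It does not explain why the window stayed small; it only shows how little data was effectively in flight.

```python
# Back-of-the-envelope bandwidth-delay arithmetic (illustrative, not a model of
# the real session): a TCP connection delivers at most about one window of
# unacknowledged data per round trip, so throughput <= window / RTT.
# RTTs are approximate end-to-end averages from the mtr reports in this ticket.


def window_needed(rate_bytes_per_s: float, rtt_s: float) -> float:
    """Bytes in flight required to sustain `rate_bytes_per_s` at `rtt_s`."""
    return rate_bytes_per_s * rtt_s


def max_rate(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput for a window of `window_bytes` at `rtt_s`."""
    return window_bytes / rtt_s


slow = window_needed(33723, 0.27)        # daytime curl average, ~270 ms RTT
fast = window_needed(2047 * 1024, 0.26)  # curl "2047k" after the routing change, ~260 ms RTT
print(f"daytime: ~{slow / 1024:.1f} KiB in flight; after the fix: ~{fast / 1024:.0f} KiB in flight")
```

At the same RTT, the daytime transfers behaved as if only about 9 KB were in flight. The later measurement after the routing change implies roughly 60 times more.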
Updated by srinidhi almost 3 years ago
- File api-opensuse-org.dump.xz added
Hi,
mdruvietis wrote:
> Hi Srinidhi,
> Thanks for your input. I have opened a case with CoreBB to take a look at this edge. This does seem to me like an ISP issue, especially if it occurs during your daytime, when the load is highest, but not at night. Our side is never overloaded, and no other customers have reported anything similar.
Thank you so much for opening the ticket with CoreBB on this issue! I really appreciate it!
> I'm still curious to see the tcpdump, even though I have very little hope of seeing anything besides TCP struggling with its window sizes. The command you provided should work; I'm only interested in an affected session.
I thought that, today being Saturday, the link would be fast and I would have to wait until Monday to reproduce the issue. But it turns out I was wrong. It is worse today than on weekdays!
$ curl 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/x86_64/_repository?view=cache' > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 14.5M 0 14.5M 0 0 22735 0 --:--:-- 0:11:09 --:--:-- 22776
A full 11 minutes! At one point, I was getting less than 6 kbps!
Please find attached the output of tcpdump -w api-opensuse-org.dump -ttt host 195.135.221.162 for the above curl session.
> I'm afraid there is not much we can do right now, besides waiting for a reply from our ISP. I will update here as soon as I get any news from CoreBB.
> Thanks,
I really appreciate your support and patience here! We have been facing this problem for the last 10 days and we have, at least, found the root cause of the issue.
Regards,
Srinidhi.
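To get a first look at the attached capture without Wireshark, the following Python sketch reads the classic pcap format that tcpdump -w writes by default and sums on-wire bytes per second, which makes stalls like the sub-6 kbps periods easy to spot. It is an illustrative helper, not an existing tool. It assumes the file is classic pcap (pcapng is rejected), and it counts packets in both directions, since the capture filter was host 195.135.221.162.

```python
# Minimal reader for the classic libpcap file format written by `tcpdump -w`,
# summing original (on-wire) packet lengths per second of the capture. Handles
# plain or .xz-compressed files and both byte orders; pcapng is not supported.
import lzma
import struct
from collections import Counter
from typing import BinaryIO, Dict

MAGIC_USEC = 0xA1B2C3D4  # microsecond timestamps
MAGIC_NSEC = 0xA1B23C4D  # nanosecond timestamps


def bytes_per_second(stream: BinaryIO) -> Dict[int, int]:
    """Map whole seconds since the first packet to on-wire bytes seen."""
    header = stream.read(24)  # pcap global header
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", header[:4])[0]
        if magic in (MAGIC_USEC, MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    frac_div = 1_000_000 if magic == MAGIC_USEC else 1_000_000_000

    totals: Counter = Counter()
    start = None
    record = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
    while True:
        rec = stream.read(record.size)
        if len(rec) < record.size:
            break
        ts_sec, ts_frac, incl_len, orig_len = record.unpack(rec)
        stream.read(incl_len)  # skip the captured packet bytes
        ts = ts_sec + ts_frac / frac_div
        if start is None:
            start = ts
        totals[int(ts - start)] += orig_len
    return dict(totals)


def open_capture(path: str) -> BinaryIO:
    """Open a capture file, transparently decompressing .xz files."""
    return lzma.open(path, "rb") if path.endswith(".xz") else open(path, "rb")
```

For example, bytes_per_second(open_capture("api-opensuse-org.dump.xz")) returns a per-second byte count, and seconds with very low totals mark where the transfer stalled.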
Updated by mdruvietis almost 3 years ago
Hi Srinidhi,
I got an update from CoreBB that they changed the routing this morning. Could you please test and verify whether the slowness is still present?
Thanks a lot,
Updated by srinidhi almost 3 years ago
Hi,
I want to report that since this morning (Indian Standard Time), the issue seems to have been resolved!
# curl 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/x86_64/_repository?view=cache' > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 14.5M 0 14.5M 0 0 2047k 0 --:--:-- 0:00:07 --:--:-- 2961k
It looks like CoreBB and Airtel have changed some routing. The mtr output now shows:
# mtr api.opensuse.org -4 -r 20
Start: 2022-01-17T12:44:08+0530
HOST: blrcooker Loss% Snt Last Avg Best Wrst StDev
1.|-- firewall.blr.build.net 0.0% 10 0.2 0.3 0.2 0.4 0.1
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
3.|-- blr-asa-1.gns.attachmateg 0.0% 10 1.0 0.9 0.8 1.0 0.1
4.|-- 192.31.114.253 0.0% 10 1.2 1.2 0.9 1.3 0.1
5.|-- 59.145.123.141 0.0% 10 2.9 6.0 2.9 33.0 9.5
6.|-- 182.79.206.46 0.0% 10 147.7 147.8 147.4 149.3 0.6
7.|-- ae7-0.lon10.core-backbone 0.0% 10 238.8 248.2 238.5 256.2 6.4
8.|-- ae2-2001.nbg30.core-backb 0.0% 10 257.5 266.0 256.1 273.5 6.4
9.|-- core-backbone.microfocus. 0.0% 10 268.3 262.7 255.0 269.9 5.3
10.|-- 195.135.221.26 0.0% 10 268.1 269.0 260.3 279.8 6.9
11.|-- 195.135.221.162 0.0% 10 262.0 260.5 252.3 268.5 6.2
mtr: udp socket connect failed: Invalid argument
Thank you so much for your support and patience on this matter!
Regards,
Srinidhi.
Updated by srinidhi almost 3 years ago
Hello Mikelis,
mdruvietis wrote:
> Hi Srinidhi,
> I got an update from CoreBB that they changed the routing this morning. Could you please test and verify whether the slowness is still present?
I was slightly delayed in posting my update, and only saw yours after posting my comment.
The issue is resolved now.
Regards,
Srinidhi.
Updated by mdruvietis almost 3 years ago
Hi Srinidhi,
I'm glad that it got resolved. It's always tricky when the issue is out of our control.
And thanks for your always perfect input; it helped a lot to identify the issue much faster.
Let us know if the issue ever comes back.
Thanks,
Updated by mdruvietis almost 3 years ago
- Status changed from Workable to Resolved
- % Done changed from 0 to 100
Resolving case, thanks.
Updated by srinidhi over 2 years ago
Hi,
The problem seems to have returned (exactly a month later!), with similar issues between the ISPs Airtel and CoreBB:
# mtr api.opensuse.org -4 -r 20
Start: 2022-02-04T15:11:24+0530
HOST: blrcooker Loss% Snt Last Avg Best Wrst StDev
1.|-- firewall.blr.build.net 0.0% 10 0.3 0.2 0.2 0.3 0.1
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
3.|-- blr-asa-1.gns.attachmateg 0.0% 10 0.7 0.7 0.6 0.8 0.1
4.|-- 192.31.114.253 0.0% 10 0.9 1.0 0.9 1.1 0.0
5.|-- 59.145.123.141 0.0% 10 2.8 3.0 2.8 3.1 0.1
6.|-- 182.79.206.46 0.0% 10 149.3 149.8 149.2 153.3 1.2
7.|-- ae7-0.lon10.core-backbone 20.0% 10 336.0 335.8 335.6 336.0 0.1
8.|-- ae2-2001.nbg30.core-backb 10.0% 10 344.2 344.3 343.9 345.0 0.3
9.|-- core-backbone.microfocus. 0.0% 10 342.7 342.7 342.4 343.0 0.2
10.|-- 195.135.221.26 0.0% 10 346.5 347.4 346.4 348.7 1.0
11.|-- 195.135.221.162 0.0% 10 342.3 342.4 342.2 342.7 0.2
mtr: udp socket connect failed: Invalid argument
The download speeds are worse than those reported earlier in this ticket:
# curl 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/x86_64/_repository?view=cache' > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 14.5M 0 14.5M 0 0 19150 0 --:--:-- 0:13:16 --:--:-- 12770
We need your help once again! Our private OBS instance is once again completely unresponsive because of this slowness.
Regards,
Srinidhi.
Updated by srinidhi over 2 years ago
Hi,
Any updates? I'm following up on the same day because:
- our entire working day was spent waiting for api.opensuse.org to respond to queries, and
- today being Friday, we would like to avoid waiting until mid-Monday morning for any sort of update or resolution.
I hope you will understand.
Regards,
Srinidhi.