tickets #113021
Very slow network connectivity with api.opensuse.org (closed)
Description
Hello Team,
This ticket is similar to one I raised earlier this year (https://progress.opensuse.org/issues/104844). The same issue has been observed since this morning.
Here is the output of the traceroute and mtr commands:
# traceroute api.opensuse.org
traceroute to api.opensuse.org (195.135.221.162), 30 hops max, 60 byte packets
1 firewall.blr.build.net (192.168.10.1) 0.213 ms 0.171 ms 0.152 ms
2 * * *
3 blr-asa-2.gns.attachmategroup.com (164.99.203.246) 0.706 ms 0.778 ms 0.733 ms
4 192.31.114.253 (192.31.114.253) 0.883 ms 0.876 ms 0.900 ms
5 59.145.123.141 (59.145.123.141) 2.633 ms 2.649 ms 2.595 ms
6 116.119.68.57 (116.119.68.57) 139.240 ms 116.119.106.116 (116.119.106.116) 148.847 ms 182.79.206.46 (182.79.206.46) 141.584 ms
7 ae7-0.lon10.core-backbone.com (80.255.15.229) 331.742 ms 333.591 ms 332.362 ms
8 ae2-2001.nbg30.core-backbone.com (81.95.15.161) 344.253 ms 335.322 ms 341.735 ms
9 core-backbone.microfocus.com (5.56.18.210) 341.424 ms 341.404 ms 343.420 ms
10 195.135.221.26 (195.135.221.26) 345.098 ms 338.877 ms 338.660 ms
11 195.135.221.162 (195.135.221.162) 340.077 ms !X 334.949 ms !X 335.823 ms !X
# mtr api.opensuse.org -4 -r 20
Start: 2022-06-24T16:43:38+0530
HOST: blrcooker Loss% Snt Last Avg Best Wrst StDev
1.|-- firewall.blr.build.net 0.0% 10 0.2 0.3 0.2 0.3 0.0
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
3.|-- blr-asa-2.gns.attachmateg 0.0% 10 0.9 1.0 0.9 1.1 0.1
4.|-- 192.31.114.253 0.0% 10 1.1 1.1 1.0 1.3 0.1
5.|-- 59.145.123.141 0.0% 10 3.0 2.9 2.7 3.1 0.1
6.|-- 116.119.68.57 0.0% 10 139.1 141.1 139.1 156.5 5.4
7.|-- ae7-0.lon10.core-backbone 10.0% 10 326.2 325.9 325.8 326.2 0.1
8.|-- ae2-2001.nbg30.core-backb 20.0% 10 335.8 335.9 335.7 336.1 0.1
9.|-- core-backbone.microfocus. 20.0% 10 336.3 334.6 334.2 336.3 0.7
10.|-- 195.135.221.26 20.0% 10 339.1 339.9 338.6 342.8 1.7
11.|-- 195.135.221.162 10.0% 10 334.1 334.2 333.9 334.6 0.2
mtr: udp socket connect failed: Invalid argument
It looks like the routing problem with the Core BackBone switches is recurring. Would you be so kind as to report this issue to Core BackBone so that they can correct their routing table?
Regards,
Srinidhi.
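The mtr reports in this ticket are easier to compare with a small helper that pulls out each hop's loss and average RTT and finds the largest step in latency. This is a minimal Python sketch; the function names are hypothetical, and the regular expression assumes the `mtr -r` layout shown above (hostnames truncated, `Loss%` printed without the `%` at 100%). A large step only shows where the extra delay begins: it can simply be a long-haul link, and loss seen only at intermediate hops is often ICMP rate limiting rather than real packet loss.

```python
import re
from dataclasses import dataclass

# One hop line of `mtr -r` output, e.g.
#  7.|-- ae7-0.lon10.core-backbone 10.0% 10 326.2 325.9 325.8 326.2 0.1
# Columns: hop, host, Loss%, Snt, Last, Avg, Best, Wrst, StDev.
# The `%` is optional because mtr prints "100.0" without it.
HOP_RE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


@dataclass
class Hop:
    index: int
    host: str
    loss_pct: float
    avg_ms: float


def parse_mtr_report(text):
    """Extract hop number, host, loss and average RTT from an mtr report."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            # Group 3 is Loss%, group 6 is Avg.
            hops.append(Hop(int(m.group(1)), m.group(2),
                            float(m.group(3)), float(m.group(6))))
    return hops


def largest_latency_jump(hops):
    """Return (previous hop, hop, increase in ms) for the biggest rise in average RTT.

    Hops with 100% loss (shown as ???) never answered, so they are skipped.
    """
    answered = [h for h in hops if h.loss_pct < 100.0]
    best = None
    for prev, cur in zip(answered, answered[1:]):
        delta = cur.avg_ms - prev.avg_ms
        if best is None or delta > best[2]:
            best = (prev, cur, delta)
    return best
```

On the 2022 report above, the largest step is from hop 6 (116.119.68.57, about 141 ms) to hop 7 (ae7-0.lon10.core-backbone, about 326 ms). All reports in this ticket show Snt = 10, which is mtr's default number of report cycles; `-c 20` is the option that sets the probe count.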
Updated by cboltz over 2 years ago
- Category set to Core services and virtual infrastructure
- Assignee set to SUSE-Admins
Updated by srinidhi almost 2 years ago
crameleon wrote:
Hi,
Is this still an issue?
No. This is no longer an issue and can be closed. I don't know how to close tickets myself.
Regards,
Srinidhi.
Updated by crameleon almost 2 years ago
- Status changed from New to Closed
Alright! Thanks for getting back.
Updated by srinidhi over 1 year ago
crameleon wrote:
Alright! Thanks for getting back.
Hello @crameleon,
This issue is back and we are very badly impacted:
# mtr api.opensuse.org -4 -r 20
Start: 2023-04-06T12:08:48+0530
HOST: blrcooker Loss% Snt Last Avg Best Wrst StDev
1.|-- firewall.blr.build.net 0.0% 10 0.3 0.3 0.2 0.3 0.0
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
3.|-- 164.99.203.246 0.0% 10 0.9 0.9 0.7 1.1 0.1
4.|-- 192.31.114.253 0.0% 10 1.4 2.9 1.1 18.0 5.3
5.|-- 32.114.83.145 0.0% 10 9.9 10.1 9.9 10.4 0.2
6.|-- 32.109.34.101 0.0% 10 8.9 11.0 8.9 12.6 1.3
7.|-- 165.87.139.85 0.0% 10 12.6 11.2 9.1 12.6 1.1
8.|-- 32.119.109.109 0.0% 10 12.0 11.5 9.9 12.9 0.8
9.|-- 165.87.76.34 0.0% 10 8.4 8.5 8.3 8.7 0.1
10.|-- 32.109.50.94 0.0% 10 128.3 128.3 128.1 128.5 0.1
11.|-- ae10-1.fra30.core-backbon 0.0% 10 128.1 128.2 128.0 128.4 0.1
12.|-- ae2-2001.nbg30.core-backb 0.0% 10 131.3 131.3 131.1 131.4 0.1
13.|-- core-backbone.microfocus. 0.0% 10 131.1 131.2 131.1 131.3 0.1
14.|-- 195.135.221.26 0.0% 10 131.3 132.1 131.2 137.6 2.0
15.|-- 195.135.221.162 0.0% 10 131.5 131.5 131.5 131.7 0.1
mtr: udp socket connect failed: Invalid argument
# curl 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/i586?view=binaryversionscode' > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 45.8M 100 45.8M 0 0 47286 0 0:16:56 0:16:56 --:--:-- 44674
Looks like the problem is once again with the core-backbone servers. Could you please help us out here?
Regards,
Srinidhi.
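curl's progress meter already prints the average speed, but the arithmetic behind it is easy to check by hand: bytes transferred divided by elapsed seconds, where curl's `k`/`M`/`G` suffixes are powers of 1024. A minimal Python sketch; the helper names are hypothetical.

```python
# curl's progress meter uses 1024-based size suffixes.
UNITS = {"": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a curl size such as '45.8M' to bytes."""
    suffix = text[-1] if text[-1] in "kMG" else ""
    number = text[:-1] if suffix else text
    return float(number) * UNITS[suffix]


def parse_hms(text):
    """Convert curl's 'H:MM:SS' time column to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_speed(size, elapsed):
    """Average download speed in bytes per second."""
    return parse_size(size) / parse_hms(elapsed)
```

For this run, `average_speed("45.8M", "0:16:56")` comes out at about 47,270 bytes/s, which matches the 47286 that curl reports once the rounding of "45.8M" is taken into account. For comparison, the same 45.9M file fetched in 90 seconds, the slowest of the runs Robert reports later in the ticket, works out to about 535,000 bytes/s.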
Updated by crameleon over 1 year ago
Hi,
this is a tough one. The network is managed by SUSE, but there are currently no openSUSE volunteers in the SUSE network team. I will try to forward it and reference this and your last ticket.
Cheers,
Georg
Updated by crameleon over 1 year ago
Internal ticket: https://sd.suse.com/browse/SD-117744.
Updated by crameleon over 1 year ago
Hi,
I received feedback:
- your last MTR output looks normal
- can you provide MTR and tracepath output with both IPv4 and IPv6 to further investigate the issue?
Updated by srinidhi over 1 year ago
crameleon wrote:
Hi,
I received feedback:
- your last MTR output looks normal
Oh, okay! That is interesting!
- can you provide MTR and tracepath output with both IPv4 and IPv6 to further investigate the issue?
Unfortunately, I can only provide data for IPv4 because we do not have an IPv6 address assigned to our Build Service backend.
MTR
# mtr api.opensuse.org -4 -r 20
Start: 2023-04-06T20:53:33+0530
HOST: blrcooker Loss% Snt Last Avg Best Wrst StDev
1.|-- firewall.blr.build.net 0.0% 10 0.2 0.2 0.2 0.4 0.1
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
3.|-- 164.99.203.246 0.0% 10 1.0 1.0 0.7 1.2 0.1
4.|-- 192.31.114.253 0.0% 10 1.3 1.2 1.0 1.4 0.1
5.|-- 32.114.83.145 0.0% 10 10.3 10.1 9.9 10.3 0.1
6.|-- 32.109.34.101 0.0% 10 9.9 10.9 9.1 12.2 1.0
7.|-- 165.87.139.85 0.0% 10 12.7 11.2 9.3 12.7 1.1
8.|-- 32.119.109.109 10.0% 10 10.2 11.4 10.2 12.3 0.7
9.|-- 165.87.76.34 0.0% 10 8.5 8.5 8.3 8.6 0.1
10.|-- 32.109.50.94 0.0% 10 128.1 128.3 128.1 128.5 0.1
11.|-- ae10-1.fra30.core-backbon 0.0% 10 128.1 128.3 128.1 128.5 0.1
12.|-- ae2-2001.nbg30.core-backb 0.0% 10 131.3 131.3 131.1 131.4 0.1
13.|-- core-backbone.microfocus. 0.0% 10 132.4 131.3 131.1 132.4 0.4
14.|-- 195.135.221.26 0.0% 10 131.6 131.4 131.3 131.6 0.1
15.|-- 195.135.221.162 0.0% 10 131.6 131.6 131.4 131.7 0.1
mtr: udp socket connect failed: Invalid argument
Traceroute
# traceroute api.opensuse.org
traceroute to api.opensuse.org (195.135.221.162), 30 hops max, 60 byte packets
1 firewall.blr.build.net (192.168.10.1) 0.190 ms 0.264 ms 0.231 ms
2 * * *
3 164.99.203.246 (164.99.203.246) 0.819 ms 0.871 ms 0.756 ms
4 192.31.114.253 (192.31.114.253) 1.030 ms 1.017 ms 0.965 ms
5 32.114.83.145 (32.114.83.145) 9.714 ms 9.708 ms 9.695 ms
6 32.109.34.101 (32.109.34.101) 11.806 ms 11.739 ms 11.709 ms
7 165.87.139.85 (165.87.139.85) 8.683 ms 12.098 ms 12.104 ms
8 32.119.109.109 (32.119.109.109) 30.684 ms 30.423 ms 30.435 ms
9 165.87.76.34 (165.87.76.34) 8.325 ms 8.662 ms 8.308 ms
10 32.109.50.94 (32.109.50.94) 128.300 ms 128.308 ms 128.294 ms
11 ae3-1.fra20.core-backbone.com (80.81.192.187) 128.653 ms 128.631 ms 128.584 ms
12 ae2-2001.nbg30.core-backbone.com (81.95.15.161) 131.367 ms 131.360 ms 131.262 ms
13 core-backbone.microfocus.com (5.56.18.210) 131.280 ms 131.245 ms 131.239 ms
14 195.135.221.26 (195.135.221.26) 131.349 ms 131.728 ms 131.675 ms
15 195.135.221.162 (195.135.221.162) 131.549 ms !X 131.858 ms !X 131.861 ms !X
Updated by crameleon over 1 year ago
Thanks for the quick reply.
Could you provide tracepath output as well, to check for possible MTU problems?
Updated by srinidhi over 1 year ago
crameleon wrote:
Thanks for the quick reply.
I'm desperately refreshing this page every 2-5 minutes :-) as our Private Build Service instance has been almost unusable for the entire working day today! :-(
Could you provide tracepath output as well, to check for possible MTU problems?
# tracepath api.opensuse.org
1?: [LOCALHOST] pmtu 1500
1: firewall.blr.build.net 0.183ms
1: firewall.blr.build.net 0.109ms
2: no reply
3: 164.99.203.246 7.238ms
4: 192.31.114.253 1.028ms
5: 32.114.83.145 16.434ms
6: 32.109.34.101 15.548ms asymm 9
7: 165.87.139.85 18.080ms asymm 8
8: 32.119.109.109 18.462ms
9: 165.87.76.34 14.828ms asymm 8
10: 32.109.50.94 134.602ms asymm 11
11: ae3-1.fra20.core-backbone.com 134.942ms asymm 12
12: ae2-2001.nbg30.core-backbone.com 137.651ms asymm 14
13: core-backbone.microfocus.com 137.952ms asymm 14
14: 195.135.221.26 142.212ms asymm 15
15: 195.135.221.162 141.957ms !H
Resume: pmtu 1500
Regards,
Srinidhi.
Updated by crameleon over 1 year ago
- Assignee changed from SUSE-Admins to opensuse-admin-obs
Thanks, that unfortunately doesn't show an issue either. My network colleague suggests it is likely an issue on the destination server, as he sees flaky behavior when downloading from api.opensuse.org as well, and suggests capturing tcpdumps on both the source and the destination. Since the destination seems to be managed not by the regular openSUSE Heroes but by the team operating the Build Service, I will assign the ticket to them. I hope they can help you further. Keep in mind that many of them are in Germany and are likely already enjoying their Easter holidays. Sorry not to have better news for now.
Cheers
Georg
Updated by srinidhi over 1 year ago
Hi,
crameleon wrote:
Thanks, that unfortunately doesn't show an issue either. My network colleague suggests it is likely an issue on the destination server, as he sees flaky behavior when downloading from api.opensuse.org as well, and suggests capturing tcpdumps on both the source and the destination. Since the destination seems to be managed not by the regular openSUSE Heroes but by the team operating the Build Service, I will assign the ticket to them. I hope they can help you further. Keep in mind that many of them are in Germany and are likely already enjoying their Easter holidays. Sorry not to have better news for now.
Any updates for me, please?
The network throughput has worsened since Sunday!
# curl -4 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/i586?view=binaryversionscode' > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 45.9M 100 45.9M 0 0 35117 0 0:22:52 0:22:52 --:--:-- 36741
Regards,
Srinidhi.
Updated by adrianSuSE over 1 year ago
Thanks, that unfortunately doesn't show an issue either. My network colleague suggests it is likely an issue on the destination server, as he sees flaky behavior when downloading from api.opensuse.org as well, and suggests capturing tcpdumps on both the source and the destination. Since the destination seems to be managed not by the regular openSUSE Heroes but by the team operating the Build Service, I will assign the ticket to them. I hope they can help you further. Keep in mind that many of them are in Germany and are likely already enjoying their Easter holidays. Sorry not to have better news for now.
Sorry, but this really looks like a problem in front of the OBS-managed systems.
The referenced URL can be requested within 0.018 seconds on the login proxy, while it takes more than 6 seconds when requested via the public IP. The login proxy itself also has free slots.
Feel free to reach out to me if you want to debug this together.
Updated by adrianSuSE over 1 year ago
- Assignee changed from opensuse-admin-obs to crameleon
sorry, need to bounce back
Updated by crameleon over 1 year ago
Will get Adrian and my colleague in touch.
Updated by srinidhi over 1 year ago
Hi,
crameleon wrote:
Will get Adrian and my colleague in touch.
I wanted to know my options now. It has been more than 9 days since our private OBS instance has been impacted.
How can I help in debugging this problem further?
Regards,
Srinidhi.
Updated by crameleon over 1 year ago
Copying Robert's comment from the SD ticket (4/12/23, 20:52):
I've tried to add this to the opensuse.org ticket, but didn't find a way to update it, so I'll add it below.
I've run that curl command more than 20 times and got download speeds between 500k and 4000k (download times between 11 sec and 90 sec), mostly under 30 sec.
I've started the downloads from the Provo DC, on the SUSE internet line. The user seems to be in the same location, on the Microfocus internet line.
Is the user still experiencing the slow download? From the ticket it looks like 35k download speed. I didn't manage to reproduce that.
Are there specific times when this happens?
Does it happen for consecutive downloads (if the download is very slow, does stopping and restarting it make any difference)?
When this happens, please ask the user to:
- stop the download
- start a "tcpdump -i any host api.opensuse.org -w api-opensuse.pcap"
- run the curl command.
When it is over, stop the tcpdump and provide the pcap.
If Adrian is online at that time, he could take a tcpdump on the api.opensuse.org frontend, filtering for the user's public IP.
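Robert's capture steps can be wrapped so that tcpdump is stopped as soon as curl exits, which keeps the pcap limited to the test download. A minimal Python sketch, assuming it runs as root (or with CAP_NET_RAW) on the affected host with `tcpdump` and `curl` on the PATH; the function names are hypothetical, and the URL is the one from the curl commands earlier in the ticket.

```python
import signal
import subprocess
import time

HOST = "api.opensuse.org"
PCAP = "api-opensuse.pcap"
URL = ("https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/i586"
       "?view=binaryversionscode")


def capture_commands(host=HOST, pcap=PCAP, url=URL):
    """Return the tcpdump and curl argument lists for one capture run."""
    # Options first, then the capture filter expression.
    tcpdump = ["tcpdump", "-i", "any", "-w", pcap, "host", host]
    # -4 forces IPv4, as in the curl runs above; the body is discarded.
    curl = ["curl", "-4", "-sS", "-o", "/dev/null", url]
    return tcpdump, curl


def run_capture(host=HOST, pcap=PCAP, url=URL):
    """Capture traffic to `host` for exactly one curl download."""
    tcpdump_cmd, curl_cmd = capture_commands(host, pcap, url)
    dump = subprocess.Popen(tcpdump_cmd)
    try:
        time.sleep(2)  # assumption: enough time for tcpdump to start capturing
        subprocess.run(curl_cmd, check=True)
    finally:
        # On SIGINT tcpdump flushes the pcap file and exits cleanly.
        dump.send_signal(signal.SIGINT)
        dump.wait(timeout=30)
```

Calling `run_capture()` leaves `api-opensuse.pcap` in the working directory; the same tcpdump command with the user's public IP as the `host` filter would be the matching capture on the frontend side.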
Updated by crameleon over 1 year ago
- Assignee deleted (crameleon)
I wanted to know my options now. It has been more than 9 days since our private OBS instance has been impacted.
I don't know your options. I am just a volunteer who tried to help you by relaying between two people who would have otherwise not read this ticket. I cannot help you any further, sorry.
Updated by srinidhi over 1 year ago
crameleon wrote:
Copying Robert's comment from the SD ticket (4/12/23, 20:52):
I've tried to add this to the opensuse.org ticket, but didn't find a way to update it, so I'll add it below.
I've run that curl command more than 20 times and got download speeds between 500k and 4000k (download times between 11 sec and 90 sec), mostly under 30 sec.
I've started the downloads from the Provo DC, on the SUSE internet line. The user seems to be in the same location, on the Microfocus internet line.
Is the user still experiencing the slow download? From the ticket it looks like 35k download speed. I didn't manage to reproduce that.
Yes.
Are there specific times when this happens?
No. I can reproduce this issue at any given time.
Does it happen for consecutive downloads (if the download is very slow, does stopping and restarting it make any difference)?
It happens for every consecutive download. No, restarting the download does not improve the situation.
When this happens, please ask the user to:
- stop the download
- start a "tcpdump -i any host api.opensuse.org -w api-opensuse.pcap"
- run the curl command.
When it is over, stop the tcpdump and provide the pcap.
I did run this, but I forgot to stop the tcpdump soon after the curl command ended. The pcap file is 99 MB in size! Even gzipped, it is 49 MB, so I am not uploading it right now. I will try to capture it again during business hours rather than on the weekend.
But I'm attaching a screenshot of the error summary report from Wireshark. There are a lot of TCP retransmissions and duplicate ACKs.
If Adrian is online at that time, he could take a tcpdump on the api.opensuse.org frontend, filtering for the user's public IP.
I will be available for most of the day. Please let me know when, and I will be available to debug this further.
Regards,
Srinidhi.
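Instead of a screenshot or a 49 MB upload, the retransmission and duplicate-ACK counts from a capture like the one above can be shared as plain numbers. A minimal Python sketch, assuming Wireshark's command-line tool `tshark` is installed; the helper names are hypothetical. The display-filter fields are the ones Wireshark's TCP analysis sets, and the ratio is taken over all TCP frames in the file, so unrelated traffic to the same host dilutes it.

```python
import subprocess

# Wireshark display-filter fields set by its TCP analysis engine.
FILTERS = {
    "retransmissions": "tcp.analysis.retransmission",
    "duplicate_acks": "tcp.analysis.duplicate_ack",
    "all_tcp": "tcp",
}


def tshark_command(pcap, display_filter):
    """tshark invocation printing one frame number per matching packet."""
    return ["tshark", "-r", pcap, "-Y", display_filter,
            "-T", "fields", "-e", "frame.number"]


def count_lines(output):
    """Count non-empty output lines, i.e. matching packets."""
    return sum(1 for line in output.splitlines() if line.strip())


def tcp_problem_summary(pcap):
    """Count retransmissions and duplicate ACKs in a pcap file."""
    counts = {}
    for name, display_filter in FILTERS.items():
        result = subprocess.run(tshark_command(pcap, display_filter),
                                capture_output=True, text=True, check=True)
        counts[name] = count_lines(result.stdout)
    total = counts["all_tcp"] or 1  # avoid division by zero on an empty capture
    counts["retransmission_ratio"] = counts["retransmissions"] / total
    return counts
```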
Updated by crameleon over 1 year ago
- Category changed from Core services and virtual infrastructure to OBS
- Assignee set to adrianSuSE
Copying Adrian's comment from the SD ticket (04/17/23, 10:10):
Okay, I can reproduce the issue on the login proxy host now. When using SSL, the slowdown can be observed (6 seconds vs. 0.0x seconds). Looking deeper here, but it no longer seems to be a network issue in front of it.
Feel free to assign to me.
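Adrian's comparison (0.0x seconds vs. about 6 seconds once SSL is involved) can be broken down further with curl's `--write-out` timing variables, which report when DNS lookup, TCP connect, the TLS handshake, the first response byte and the end of the transfer completed. A minimal Python sketch; the helper names are hypothetical, and the phase boundaries are an approximation. For plain HTTP, `time_appconnect` stays 0, so the TLS phase is only meaningful for `https://` URLs.

```python
import subprocess

# curl --write-out variables; each value is seconds since the start of the request.
WRITE_OUT = (
    "dns=%{time_namelookup}\n"
    "connect=%{time_connect}\n"
    "tls=%{time_appconnect}\n"
    "ttfb=%{time_starttransfer}\n"
    "total=%{time_total}\n"
)


def timing_command(url):
    """curl invocation that discards the body and prints only the timings."""
    return ["curl", "-sS", "-o", "/dev/null", "-w", WRITE_OUT, url]


def parse_timings(text):
    """Parse 'key=value' lines into a dict of floats."""
    timings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            timings[key.strip()] = float(value)
    return timings


def phase_breakdown(t):
    """Turn curl's cumulative timestamps into per-phase durations."""
    # time_appconnect is 0 when no TLS handshake happened (plain HTTP).
    handshake_done = t["tls"] if t["tls"] > 0 else t["connect"]
    return {
        "tcp_connect": t["connect"] - t["dns"],
        "tls_handshake": handshake_done - t["connect"],
        "server_wait": t["ttfb"] - handshake_done,  # request sent + server processing
        "transfer": t["total"] - t["ttfb"],
    }


def measure(url):
    """Run curl once against `url` and return the phase durations."""
    result = subprocess.run(timing_command(url),
                            capture_output=True, text=True, check=True)
    return phase_breakdown(parse_timings(result.stdout))
```

Running `measure()` against the same path once over `https://` and once over plain HTTP from the same host would show whether the extra seconds sit in the handshake or in the wait for the first byte.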
Updated by crameleon about 1 year ago
- Status changed from New to Feedback
Hi,
has this been resolved in the meantime?
Updated by srinidhi about 1 year ago
crameleon wrote in #note-24:
Hi,
has this been resolved in the meantime?
Yes, but differently: I had to backport (read: cherry-pick) all of the IPv6 support code from master to our private OBS 2.10 and switch to IPv6 to avoid running into these problems every 4-5 months.
Thank you for checking! I can't close this ticket myself, so please feel free to close it.
Regards,
Srinidhi.
Updated by crameleon about 1 year ago
- Status changed from Feedback to Resolved
Interesting; glad you were able to "solve" it.
Thanks for confirming, best,
Georg