tickets #113021

closed

Very slow network connectivity with api.opensuse.org

Added by srinidhi over 2 years ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
OBS
Target version:
-
Start date:
2022-06-24
Due date:
% Done:

0%

Estimated time:

Description

Hello Team,

This ticket is similar to a previous ticket - https://progress.opensuse.org/issues/104844 - that I had raised earlier this year. The same issue has been observed since this morning.

Here is the output of the traceroute and mtr commands:

# traceroute api.opensuse.org
traceroute to api.opensuse.org (195.135.221.162), 30 hops max, 60 byte packets
 1  firewall.blr.build.net (192.168.10.1)  0.213 ms  0.171 ms  0.152 ms
 2  * * *
 3  blr-asa-2.gns.attachmategroup.com (164.99.203.246)  0.706 ms  0.778 ms  0.733 ms
 4  192.31.114.253 (192.31.114.253)  0.883 ms  0.876 ms  0.900 ms
 5  59.145.123.141 (59.145.123.141)  2.633 ms  2.649 ms  2.595 ms
 6  116.119.68.57 (116.119.68.57)  139.240 ms 116.119.106.116 (116.119.106.116)  148.847 ms 182.79.206.46 (182.79.206.46)  141.584 ms
 7  ae7-0.lon10.core-backbone.com (80.255.15.229)  331.742 ms  333.591 ms  332.362 ms
 8  ae2-2001.nbg30.core-backbone.com (81.95.15.161)  344.253 ms  335.322 ms  341.735 ms
 9  core-backbone.microfocus.com (5.56.18.210)  341.424 ms  341.404 ms  343.420 ms
10  195.135.221.26 (195.135.221.26)  345.098 ms  338.877 ms  338.660 ms
11  195.135.221.162 (195.135.221.162)  340.077 ms !X  334.949 ms !X  335.823 ms !X
# mtr api.opensuse.org -4 -r 20
Start: 2022-06-24T16:43:38+0530
HOST: blrcooker                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- firewall.blr.build.net     0.0%    10    0.2   0.3   0.2   0.3   0.0
  2.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
  3.|-- blr-asa-2.gns.attachmateg  0.0%    10    0.9   1.0   0.9   1.1   0.1
  4.|-- 192.31.114.253             0.0%    10    1.1   1.1   1.0   1.3   0.1
  5.|-- 59.145.123.141             0.0%    10    3.0   2.9   2.7   3.1   0.1
  6.|-- 116.119.68.57              0.0%    10  139.1 141.1 139.1 156.5   5.4
  7.|-- ae7-0.lon10.core-backbone 10.0%    10  326.2 325.9 325.8 326.2   0.1
  8.|-- ae2-2001.nbg30.core-backb 20.0%    10  335.8 335.9 335.7 336.1   0.1
  9.|-- core-backbone.microfocus. 20.0%    10  336.3 334.6 334.2 336.3   0.7
 10.|-- 195.135.221.26            20.0%    10  339.1 339.9 338.6 342.8   1.7
 11.|-- 195.135.221.162           10.0%    10  334.1 334.2 333.9 334.6   0.2
mtr: udp socket connect failed: Invalid argument

It looks like the routing problem with the Core BackBone switches is recurring. Would you be so kind as to report this issue to Core BackBone so that they can correct their routing table?

Regards,
Srinidhi.
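
A minimal sketch of how an mtr report like the one above can be checked for the hop where latency is added. The function names (`parse_mtr_report`, `largest_latency_step`) are hypothetical, and the sample lines are copied from the report in this ticket. A large step only shows where delay enters the path; it can simply be a long-haul link and does not by itself prove a routing fault.

```python
import re

# One hop line of an `mtr -r` report, for example:
#   6.|-- 116.119.68.57   0.0%   10  139.1 141.1 139.1 156.5   5.4
# The "%" is optional because the reports in this ticket print "100.0" without it.
HOP_RE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)

# Hops 5 to 8 of the 2022-06-24 report, copied from the ticket.
SAMPLE = """\
  5.|-- 59.145.123.141             0.0%    10    3.0   2.9   2.7   3.1   0.1
  6.|-- 116.119.68.57              0.0%    10  139.1 141.1 139.1 156.5   5.4
  7.|-- ae7-0.lon10.core-backbone 10.0%    10  326.2 325.9 325.8 326.2   0.1
  8.|-- ae2-2001.nbg30.core-backb 20.0%    10  335.8 335.9 335.7 336.1   0.1
"""


def parse_mtr_report(text):
    """Return a list of dicts (hop, host, loss, avg) for every hop line found."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append({
                "hop": int(match.group(1)),
                "host": match.group(2),
                "loss": float(match.group(3)),
                "avg": float(match.group(6)),
            })
    return hops


def largest_latency_step(hops):
    """Return (previous_hop, current_hop, delta_ms) for the biggest rise in Avg.

    Hops with 100% loss are skipped because their timings are all zero.
    Returns None if fewer than two hops answered.
    """
    answered = [h for h in hops if h["loss"] < 100.0]
    best = None
    for prev, cur in zip(answered, answered[1:]):
        delta = cur["avg"] - prev["avg"]
        if best is None or delta > best[2]:
            best = (prev, cur, delta)
    return best


if __name__ == "__main__":
    prev, cur, delta = largest_latency_step(parse_mtr_report(SAMPLE))
    print(f"largest step: hop {prev['hop']} -> hop {cur['hop']} (+{delta:.1f} ms avg)")
```

On the sample lines this picks the step from hop 6 to hop 7, which is where the path enters the core-backbone.com network.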


Files

api-opensuse-pcap-errors.png (127 KB), srinidhi, 2023-04-16 20:21
Actions #1

Updated by cboltz over 2 years ago

  • Category set to Core services and virtual infrastructure
  • Assignee set to SUSE-Admins
Actions #2

Updated by crameleon almost 2 years ago

Hi,

Is this still an issue?

Actions #3

Updated by srinidhi almost 2 years ago

crameleon wrote:

Hi,

Is this still an issue?

No. This is no longer an issue and can be closed. I don't know how to close tickets myself.

Regards,
Srinidhi.

Actions #4

Updated by crameleon almost 2 years ago

  • Status changed from New to Closed

Alright! Thanks for getting back.

Actions #5

Updated by srinidhi over 1 year ago

crameleon wrote:

Alright! Thanks for getting back.

Hello @crameleon,

This issue is back and we are very badly impacted:

# mtr api.opensuse.org -4 -r 20
Start: 2023-04-06T12:08:48+0530
HOST: blrcooker                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- firewall.blr.build.net     0.0%    10    0.3   0.3   0.2   0.3   0.0
  2.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
  3.|-- 164.99.203.246             0.0%    10    0.9   0.9   0.7   1.1   0.1
  4.|-- 192.31.114.253             0.0%    10    1.4   2.9   1.1  18.0   5.3
  5.|-- 32.114.83.145              0.0%    10    9.9  10.1   9.9  10.4   0.2
  6.|-- 32.109.34.101              0.0%    10    8.9  11.0   8.9  12.6   1.3
  7.|-- 165.87.139.85              0.0%    10   12.6  11.2   9.1  12.6   1.1
  8.|-- 32.119.109.109             0.0%    10   12.0  11.5   9.9  12.9   0.8
  9.|-- 165.87.76.34               0.0%    10    8.4   8.5   8.3   8.7   0.1
 10.|-- 32.109.50.94               0.0%    10  128.3 128.3 128.1 128.5   0.1
 11.|-- ae10-1.fra30.core-backbon  0.0%    10  128.1 128.2 128.0 128.4   0.1
 12.|-- ae2-2001.nbg30.core-backb  0.0%    10  131.3 131.3 131.1 131.4   0.1
 13.|-- core-backbone.microfocus.  0.0%    10  131.1 131.2 131.1 131.3   0.1
 14.|-- 195.135.221.26             0.0%    10  131.3 132.1 131.2 137.6   2.0
 15.|-- 195.135.221.162            0.0%    10  131.5 131.5 131.5 131.7   0.1
mtr: udp socket connect failed: Invalid argument

# curl 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/i586?view=binaryversionscode' > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 45.8M  100 45.8M    0     0  47286      0  0:16:56  0:16:56 --:--:-- 44674

Looks like the problem is once again with the core-backbone servers. Could you please help us out here?

Regards,
Srinidhi.

Actions #6

Updated by crameleon over 1 year ago

Hi,

this is a tough one. The network is managed by SUSE, but there are currently no openSUSE volunteers in the SUSE network team. I will try to forward it and reference this and your last ticket.

Cheers,
Georg

Actions #7

Updated by crameleon over 1 year ago

  • Private changed from Yes to No
Actions #8

Updated by crameleon over 1 year ago

Actions #9

Updated by crameleon over 1 year ago

  • Status changed from Closed to New
Actions #10

Updated by crameleon over 1 year ago

Hi,

I received feedback:

  • your last MTR output looks normal
  • can you provide MTR and tracepath output with both IPv4 and IPv6 to further investigate the issue?
Actions #11

Updated by srinidhi over 1 year ago

crameleon wrote:

Hi,

I received feedback:

  • your last MTR output looks normal

Oh, okay! That is interesting!

  • can you provide MTR and tracepath output with both IPv4 and IPv6 to further investigate the issue?

Unfortunately, I can only provide data for IPv4 because we do not have an IPv6 address assigned to our Build Service backend.

MTR

# mtr api.opensuse.org -4 -r 20
Start: 2023-04-06T20:53:33+0530
HOST: blrcooker                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- firewall.blr.build.net     0.0%    10    0.2   0.2   0.2   0.4   0.1
  2.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
  3.|-- 164.99.203.246             0.0%    10    1.0   1.0   0.7   1.2   0.1
  4.|-- 192.31.114.253             0.0%    10    1.3   1.2   1.0   1.4   0.1
  5.|-- 32.114.83.145              0.0%    10   10.3  10.1   9.9  10.3   0.1
  6.|-- 32.109.34.101              0.0%    10    9.9  10.9   9.1  12.2   1.0
  7.|-- 165.87.139.85              0.0%    10   12.7  11.2   9.3  12.7   1.1
  8.|-- 32.119.109.109            10.0%    10   10.2  11.4  10.2  12.3   0.7
  9.|-- 165.87.76.34               0.0%    10    8.5   8.5   8.3   8.6   0.1
 10.|-- 32.109.50.94               0.0%    10  128.1 128.3 128.1 128.5   0.1
 11.|-- ae10-1.fra30.core-backbon  0.0%    10  128.1 128.3 128.1 128.5   0.1
 12.|-- ae2-2001.nbg30.core-backb  0.0%    10  131.3 131.3 131.1 131.4   0.1
 13.|-- core-backbone.microfocus.  0.0%    10  132.4 131.3 131.1 132.4   0.4
 14.|-- 195.135.221.26             0.0%    10  131.6 131.4 131.3 131.6   0.1
 15.|-- 195.135.221.162            0.0%    10  131.6 131.6 131.4 131.7   0.1
mtr: udp socket connect failed: Invalid argument

Traceroute

# traceroute api.opensuse.org 
traceroute to api.opensuse.org (195.135.221.162), 30 hops max, 60 byte packets
 1  firewall.blr.build.net (192.168.10.1)  0.190 ms  0.264 ms  0.231 ms
 2  * * *
 3  164.99.203.246 (164.99.203.246)  0.819 ms  0.871 ms  0.756 ms
 4  192.31.114.253 (192.31.114.253)  1.030 ms  1.017 ms  0.965 ms
 5  32.114.83.145 (32.114.83.145)  9.714 ms  9.708 ms  9.695 ms
 6  32.109.34.101 (32.109.34.101)  11.806 ms  11.739 ms  11.709 ms
 7  165.87.139.85 (165.87.139.85)  8.683 ms  12.098 ms  12.104 ms
 8  32.119.109.109 (32.119.109.109)  30.684 ms  30.423 ms  30.435 ms
 9  165.87.76.34 (165.87.76.34)  8.325 ms  8.662 ms  8.308 ms
10  32.109.50.94 (32.109.50.94)  128.300 ms  128.308 ms  128.294 ms
11  ae3-1.fra20.core-backbone.com (80.81.192.187)  128.653 ms  128.631 ms  128.584 ms
12  ae2-2001.nbg30.core-backbone.com (81.95.15.161)  131.367 ms  131.360 ms  131.262 ms
13  core-backbone.microfocus.com (5.56.18.210)  131.280 ms  131.245 ms  131.239 ms
14  195.135.221.26 (195.135.221.26)  131.349 ms  131.728 ms  131.675 ms
15  195.135.221.162 (195.135.221.162)  131.549 ms !X  131.858 ms !X  131.861 ms !X
Actions #12

Updated by crameleon over 1 year ago

Thanks for the quick reply.
Could you provide tracepath output as well, to check for possible MTU problems?

Actions #13

Updated by srinidhi over 1 year ago

crameleon wrote:

Thanks for the quick reply.

I'm desperately refreshing this page every 2-5 minutes :-) as our Private Build Service instance has been almost unusable for the entire working day today! :-(

Could you provide tracepath output as well, to check for possible MTU problems?

# tracepath api.opensuse.org 
 1?: [LOCALHOST]                                         pmtu 1500
 1:  firewall.blr.build.net                                0.183ms 
 1:  firewall.blr.build.net                                0.109ms 
 2:  no reply
 3:  164.99.203.246                                        7.238ms 
 4:  192.31.114.253                                        1.028ms 
 5:  32.114.83.145                                        16.434ms 
 6:  32.109.34.101                                        15.548ms asymm  9 
 7:  165.87.139.85                                        18.080ms asymm  8 
 8:  32.119.109.109                                       18.462ms 
 9:  165.87.76.34                                         14.828ms asymm  8 
10:  32.109.50.94                                        134.602ms asymm 11 
11:  ae3-1.fra20.core-backbone.com                       134.942ms asymm 12 
12:  ae2-2001.nbg30.core-backbone.com                    137.651ms asymm 14 
13:  core-backbone.microfocus.com                        137.952ms asymm 14 
14:  195.135.221.26                                      142.212ms asymm 15 
15:  195.135.221.162                                     141.957ms !H
     Resume: pmtu 1500

Regards,
Srinidhi.

Actions #14

Updated by crameleon over 1 year ago

  • Assignee changed from SUSE-Admins to opensuse-admin-obs

Thanks, that unfortunately doesn't show an issue either. My network colleague suggests it is likely an issue on the destination server as he receives flaky behavior when downloading from api.opensuse.org as well and suggests one should capture tcpdumps from both the source and destination. Since the destination seems to not be managed by regular openSUSE Heroes but by the team operating the Build Service, I will assign the ticket to them. I hope they can help you further. Keep in mind that many are from Germany and are likely to be enjoying their easter holidays already. Sorry to not have any better news for now.

Cheers
Georg

Actions #15

Updated by srinidhi over 1 year ago

Hi,

crameleon wrote:

Thanks, that unfortunately doesn't show an issue either. My network colleague suggests it is likely an issue on the destination server as he receives flaky behavior when downloading from api.opensuse.org as well and suggests one should capture tcpdumps from both the source and destination. Since the destination seems to not be managed by regular openSUSE Heroes but by the team operating the Build Service, I will assign the ticket to them. I hope they can help you further. Keep in mind that many are from Germany and are likely to be enjoying their easter holidays already. Sorry to not have any better news for now.

Any updates for me, please?

The network throughput has worsened since Sunday!

# curl -4 'https://api.opensuse.org/public/build/SUSE:SLE-15:Update/pool/i586?view=binaryversionscode' > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 45.9M  100 45.9M    0     0  35117      0  0:22:52  0:22:52 --:--:-- 36741

Regards,
Srinidhi.
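
As a rough cross-check of the two curl runs in this thread (45.8M in 0:16:56 and 45.9M in 0:22:52), the average rate can be recomputed from the totals. This is a sketch with hypothetical helper names; it assumes that the "M" suffix in curl's progress meter means 1024 * 1024 bytes, and the sizes are rounded by curl, so the results only approximate the "Average Dload" column.

```python
def hms_to_seconds(hms):
    """Convert an 'H:MM:SS' string from curl's progress meter to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_rate(size_mib, elapsed_hms):
    """Average transfer rate in bytes per second.

    Assumption: size_mib is in units of 1024 * 1024 bytes.
    """
    return size_mib * 1024 * 1024 / hms_to_seconds(elapsed_hms)


if __name__ == "__main__":
    # Totals copied from the two curl outputs in this ticket.
    for size_mib, elapsed in ((45.8, "0:16:56"), (45.9, "0:22:52")):
        rate = average_rate(size_mib, elapsed)
        print(f"{size_mib}M in {elapsed}: about {rate / 1000:.1f} kB/s")
```

Both runs come out at roughly 47 kB/s and 35 kB/s, in line with the averages of 47286 and 36741-era readings curl printed only approximately; the point is the order of magnitude, tens of kilobytes per second for a file of about 46 MB.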

Actions #16

Updated by adrianSuSE over 1 year ago

Thanks, that unfortunately doesn't show an issue either. My network colleague suggests it is likely an issue on the destination server as he receives flaky behavior when downloading from api.opensuse.org as well and suggests one should capture tcpdumps from both the source and destination. Since the destination seems to not be managed by regular openSUSE Heroes but by the team operating the Build Service, I will assign the ticket to them. I hope they can help you further. Keep in mind that many are from Germany and are likely to be enjoying their easter holidays already. Sorry to not have any better news for now.

Sorry, but this really looks like a problem in front of any OBS managed systems.

The referenced URL can be requested within 0.018 seconds on the login proxy, while it takes more than 6 seconds when asking via the public IP. The login proxy itself also has free slots.

Feel free to reach out to me when you want to debug this together.
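
How the two timings were taken is not stated in the ticket. One way to make such a comparison repeatable is to time the same request through each path with the same code, as in the sketch below. All names here (`timed`, `fetch_url`, `compare`) are hypothetical, and the internal login-proxy address is not known from this thread, so it is left as a placeholder in the usage comment.

```python
import time
import urllib.request


def timed(fetch):
    """Call fetch() once and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fetch()
    return time.perf_counter() - start, result


def fetch_url(url, timeout=60):
    """Download url completely and return the number of bytes received."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(response.read())


def compare(fetchers):
    """Time each zero-argument callable in fetchers (name -> callable).

    Returns a dict mapping each name to the elapsed seconds.
    """
    return {name: timed(fetch)[0] for name, fetch in fetchers.items()}


# Usage (not run here because it needs network access; the internal URL is a
# placeholder, not a real address from this ticket):
#
#   timings = compare({
#       "public": lambda: fetch_url("https://api.opensuse.org/public/..."),
#       "internal": lambda: fetch_url("http://<login-proxy>/public/..."),
#   })
#   print(timings)
```

A single sample per path is noisy; repeating the comparison a few times and looking at the spread gives a fairer picture.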

Actions #17

Updated by adrianSuSE over 1 year ago

  • Assignee changed from opensuse-admin-obs to crameleon

Sorry, I need to bounce this back.

Actions #18

Updated by crameleon over 1 year ago

Will get Adrian and my colleague in touch.

Actions #19

Updated by srinidhi over 1 year ago

Hi,

crameleon wrote:

Will get Adrian and my colleague in touch.

I wanted to know my options now. It has been more than 9 days since our private OBS instance has been impacted.

How can I help in debugging this problem further?

Regards,
Srinidhi.

Actions #20

Updated by crameleon over 1 year ago

Copying Robert's comment from the SD ticket (4/12/23, 20:52):

I've tried to add this to the opensuse.org ticket, but didn't find a way to update it, so I'll add it below
I've run that curl command more than 20 times and got download speeds between 500k and 4000k (download times between 11 sec and 90 sec), mostly under 30 sec.
I've started the downloads from Provo DC, SUSE internet line. The user seems to be in the same location, Microfocus internet line.
Is the user still experiencing the slow download? From the ticket it looks like 35k download speed. I didn't manage to reproduce that.
Are there specific times when this happens?
Does it happen for consecutive downloads (if the download is very slow, does stopping and restarting it make any difference)?
When this happens please ask the user to
stop the download
start a "tcpdump -i any host api.opensuse.org -w api-opensuse.pcap"
run the curl command.
When it is over, stop the tcpdump and provide the pcap.
If Adrian is online at that time, he could take a tcpdump on api.opensuse.org frontend, filtering for user's public IP
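
The capture steps above can be scripted so that the tcpdump is stopped as soon as the download ends, which keeps the pcap small. This is a minimal sketch, not a tested procedure: the function names are hypothetical, the tcpdump arguments are the ones quoted in the comment, and the curl options (`-4 -s -o /dev/null`) are an assumption modelled on the curl command used earlier in this ticket. tcpdump usually needs root privileges.

```python
import signal
import subprocess
import time


def capture_commands(host, pcap_path, url):
    """Build the argument lists for tcpdump and curl."""
    tcpdump_cmd = ["tcpdump", "-i", "any", "host", host, "-w", pcap_path]
    curl_cmd = ["curl", "-4", "-s", "-o", "/dev/null", url]
    return tcpdump_cmd, curl_cmd


def capture_download(host, pcap_path, url, settle_seconds=2.0):
    """Start tcpdump, run one curl download, then stop tcpdump.

    Returns curl's exit code. The capture is stopped even if curl fails.
    """
    tcpdump_cmd, curl_cmd = capture_commands(host, pcap_path, url)
    capture = subprocess.Popen(tcpdump_cmd)
    try:
        # Give tcpdump a moment to open the interface before traffic starts.
        time.sleep(settle_seconds)
        return subprocess.run(curl_cmd).returncode
    finally:
        # Same effect as pressing Ctrl-C: tcpdump closes the pcap file and exits.
        capture.send_signal(signal.SIGINT)
        capture.wait()


# Usage (run as root, with the URL from the earlier curl command):
#
#   capture_download("api.opensuse.org", "api-opensuse.pcap",
#                    "https://api.opensuse.org/public/build/...")
```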

Actions #21

Updated by crameleon over 1 year ago

  • Assignee deleted (crameleon)

I wanted to know my options now. It has been more than 9 days since our private OBS instance has been impacted.

I don't know your options. I am just a volunteer who tried to help you by relaying between two people who would have otherwise not read this ticket. I cannot help you any further, sorry.

Actions #22

Updated by srinidhi over 1 year ago

crameleon wrote:

Copying Robert's comment from the SD ticket (4/12/23, 20:52):

I've tried to add this to the opensuse.org ticket, but didn't find a way to update it, so I'll add it below
I've run that curl command more than 20 times and got download speeds between 500k and 4000k (download times between 11 sec and 90 sec), mostly under 30 sec.
I've started the downloads from Provo DC, SUSE internet line. The user seems to be in the same location, Microfocus internet line.
Is the user still experiencing the slow download? From the ticket it looks like 35k download speed. I didn't manage to reproduce that.

Yes.

Are there specific times when this happens?

No. I can reproduce this issue at any given time.

Does it happen for consecutive downloads (if the download is very slow, does stopping and restarting it make any difference)?

It happens for every consecutive download. No, restarting the download does not improve the situation.

When this happens please ask the user to
stop the download
start a "tcpdump -i any host api.opensuse.org -w api-opensuse.pcap"
run the curl command.
When it is over, stop the tcpdump and provide the pcap.

I did run this, but I forgot to stop the download soon after the curl command ended. The pcap file is 99 MB in size! Even the gzipped file is 49 MB. Hence, I am not uploading it right now. I will try to capture it again during business hours rather than on the weekend.

But I'm attaching a screenshot of the error summary report from Wireshark. There are a lot of TCP retransmissions and duplicate ACKs.

If Adrian is online at that time, he could take a tcpdump on api.opensuse.org frontend, filtering for user's public IP

I will be available for the most part of the day. Please let me know when and I will be available for debugging this further.

Regards,
Srinidhi.

Actions #23

Updated by crameleon over 1 year ago

  • Category changed from Core services and virtual infrastructure to OBS
  • Assignee set to adrianSuSE

Copying Adrian's comment from the SD ticket (04/17/23, 10:10):

Okay, I can reproduce the issue on the login proxy host now. When using SSL, the slowdown can be observed (6 seconds vs 0.0x seconds). Looking deeper here, but it no longer seems to be a network issue in front of it.

Feel free to assign to me.

Actions #24

Updated by crameleon about 1 year ago

  • Status changed from New to Feedback

Hi,

has this been resolved in the meanwhile?

Actions #25

Updated by srinidhi about 1 year ago

crameleon wrote in #note-24:

Hi,

has this been resolved in the meanwhile?

Yes. But differently - I had to backport (read: cherry-pick) all of the IPv6 support code from the master over to our private OBS 2.10 and switch to using IPv6 to avoid getting into these problems every 4-5 months.

Thank you for checking! I couldn't close this ticket. Please feel free to close this ticket.

Regards,
Srinidhi.

Actions #26

Updated by crameleon about 1 year ago

  • Status changed from Feedback to Resolved

Interesting; glad you were able to "solve" it.

Thanks for confirming, best,
Georg
