coordination #122650

Updated by okurz almost 2 years ago

## Observation 

 openQA test in scenario sle-15-SP5-Online-s390x-xfstests_xfs-generic@s390x-kvm-sle15 fails in 
 [generate_report](http://openqa.suse.de/tests/10218783/modules/generate_report/steps/4) 
 All xfstests runs on sle-15-SP5 s390x fail with this issue. 

 In this specific case the failing curl connection attempt came from the following machine (values read from vars.json): 
   "SUT_IP" : "s390kvm082.suse.de", 
    "VIRSH_GUEST" : "10.161.145.82", 
    "VIRSH_HOSTNAME" : "s390zp18.suse.de", 


 At first I thought this was the same issue being debugged in #120261, but after that fix (https://github.com/os-autoinst/openQA/pull/4935/files) was merged our s390x tests still fail. Looking into the details, I don't know why these tests still use worker2.oqa.suse.de as the download host. The last good run used an IP address rather than an FQDN. We may need some help from the tools team. 

 okurz ran `time curl -O http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log` which reproduces the problem quite explicitly: 

 ``` 
 # time curl -O http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log 
   % Total      % Received % Xferd    Average Speed     Time      Time       Time    Current 
                                  Dload    Upload     Total     Spent      Left    Speed 
   0       0      0       0      0       0        0        0 --:--:--    0:02:10 --:--:--       0curl: (7) Failed to connect to worker2.oqa.suse.de port 20343: Connection timed out 
 real      2m11.316s 
 ``` 

 so very likely the firewall for the .oqa.suse.de zone just drops packets from 10.161.0.0 
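
 To tell a silent DROP apart from an active rejection when reproducing this, curl's exit code and the elapsed time can be looked at together. The following is only a sketch: the function name `classify_curl_failure` and the 60-second threshold are assumptions for illustration, not part of openQA or curl. It relies on curl's documented exit codes 7 (failed to connect) and 28 (operation timed out). In the run above curl reported code 7 only after about 2m10s, which matches packets being dropped rather than rejected.

 ```shell
 # Hypothetical helper: classify a curl result from its exit code and the
 # elapsed wall-clock seconds. The 60s threshold is an arbitrary assumption.
 classify_curl_failure() {
     code=$1
     elapsed=$2
     case "$code" in
         0)  echo "ok" ;;
         # 28: curl gave up itself, e.g. because --connect-timeout expired
         28) echo "timeout" ;;
         # 7: could not connect; fast means refused/unreachable,
         #    slow means the kernel's connect attempts timed out
         7)  if [ "$elapsed" -ge 60 ]; then
                 echo "timeout"
             else
                 echo "refused-or-unreachable"
             fi ;;
         *)  echo "other" ;;
     esac
 }

 # Example: the failure from this ticket (exit code 7 after 131 seconds)
 classify_curl_failure 7 131
 ```

 A result of `timeout` is what a DROP rule would be expected to produce, while `refused-or-unreachable` is what a REJECT rule would typically give.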

 ## Reproducible 

 Fails since (at least) Build [40.1](http://openqa.suse.de/tests/9918151#step/generate_report/4) 


 ## Expected result 

 Last good: Build 38.1 http://openqa.suse.de/tests/9886322#step/generate_report/2 


 ## Suggestions 
 1. Ask SUSE-IT network admins to REJECT packets instead of DROP so that we get clearer results #122653 
 2. Ask SUSE-IT network admins to *not* block this traffic which we need for tests #122656 
 3. The default connect timeout for curl appears to be about 2m10s (see above), which is above our default timeouts for `script_run` etc. Find a combination where curl has a chance to report a proper error earlier. 
 4. Consider using `upload_logs` in this specific example, although this does not completely help: `upload_logs` uses a default timeout of 90s, which is higher than the 30s default for `script_run` but still below the curl default of about 2m10s. Maybe we add the parameter `--connect-timeout 20` to curl or bump the timeout for `upload_logs` #122659 
 5. Ensure the original problem is fixed #122539 
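
 The timeout relationship in suggestions 3 and 4 could be sketched as follows. This is only an illustration, not the actual test code: the helper names `below_timeout` and `fetch_with_timeout` and the margins of 10s and 5s are assumptions, while the 30s and 90s values are the defaults quoted above. The idea is to give curl a connect timeout that is lower than the timeout of the surrounding test API call, so curl itself reports the error first.

 ```shell
 # Hypothetical helper: print a timeout that stays `margin` seconds below
 # the outer timeout (e.g. 30s for script_run), but never below 1 second.
 below_timeout() {
     outer=$1
     margin=${2:-10}
     if [ "$outer" -le "$margin" ]; then
         echo 1
     else
         echo $((outer - margin))
     fi
 }

 # Hypothetical wrapper: download a file and fail before the outer timeout
 # expires. --connect-timeout limits the connection phase, --max-time the
 # whole transfer; both are standard curl options.
 fetch_with_timeout() {
     url=$1
     outer=${2:-30}
     curl --fail --silent --show-error \
          --connect-timeout "$(below_timeout "$outer" 10)" \
          --max-time "$(below_timeout "$outer" 5)" \
          -O "$url"
 }
 ```

 With the 30s default of `script_run`, `below_timeout 30` gives the `--connect-timeout 20` proposed in suggestion 4. A call such as `fetch_with_timeout http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log 30` would then be expected to fail after about 20s instead of 2m10s when packets are dropped.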


 ## Further details 
 Link to [latest](https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Online&machine=s390x-kvm-sle15&test=xfstests_xfs-generic&version=15-SP5)
