action #122539

Updated by okurz almost 2 years ago

## Observation 

 openQA test in scenario sle-15-SP5-Online-s390x-xfstests_xfs-generic@s390x-kvm-sle15 fails in 
 [generate_report](http://openqa.suse.de/tests/10218783/modules/generate_report/steps/4) 
 All xfstests runs on sle-15-SP5 s390x fail with this issue. 

 In this specific case the failed curl connection attempt came from the following SUT (values read from vars.json): 
    "SUT_IP" : "s390kvm082.suse.de", 
    "VIRSH_GUEST" : "10.161.145.82", 
    "VIRSH_HOSTNAME" : "s390zp18.suse.de", 


 At first I thought this was the same issue that is being debugged in #120261, but after that fix (https://github.com/os-autoinst/openQA/pull/4935/files) was merged our s390x tests still fail. Looking into the details, I don't know why these tests still use worker2.oqa.suse.de as the download host. The last good run used an IP address, not an FQDN. We may need some help from the tools team. 

 okurz ran `time curl -O http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log` which reproduces the problem quite explicitly: 

 ``` 
 # time curl -O http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log 
   % Total      % Received % Xferd    Average Speed     Time      Time       Time    Current 
                                  Dload    Upload     Total     Spent      Left    Speed 
   0       0      0       0      0       0        0        0 --:--:--    0:02:10 --:--:--       0curl: (7) Failed to connect to worker2.oqa.suse.de port 20343: Connection timed out 
 real      2m11.316s 
 ``` 

 So very likely the firewall for the .oqa.suse.de zone silently drops packets coming from the 10.161.0.0 network. 
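
 A quick way to check this hypothesis from the SUT side might look like the sketch below. It probes a URL with a short connect timeout and maps curl's exit code to a coarse diagnosis. The helper names `classify_curl_exit` and `probe_url`, the labels and the 5s timeout are assumptions made for illustration; none of them are part of openQA. Only the curl options and exit codes 7 and 28 come from curl itself.

 ```shell
# Map a curl exit code to a coarse diagnosis.
# Exit codes 7 and 28 are documented in curl(1); the labels are our own.
classify_curl_exit() {
    case "$1" in
        0)  echo "reachable" ;;
        7)  echo "rejected-or-unreachable" ;;   # failed to connect
        28) echo "timed-out" ;;                 # consistent with dropped packets
        *)  echo "other" ;;
    esac
}

# Probe a URL with a short connect timeout (5s is an arbitrary choice)
# and print the diagnosis. Hypothetical helper, not part of openQA.
probe_url() {
    rc=0
    curl --silent --output /dev/null --connect-timeout 5 "$1" || rc=$?
    classify_curl_exit "$rc"
}
 ```

 For example, `probe_url http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log` would be expected to print `timed-out` if the packets are dropped. Note that without `--connect-timeout` the kernel eventually gives up and curl then reports exit code 7, as in the output above, so the short timeout is what makes the two cases distinguishable.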

 ## Reproducible 

 Fails since (at least) Build [40.1](http://openqa.suse.de/tests/9918151#step/generate_report/4) 


 ## Expected result 

 Last good: Build 38.1 http://openqa.suse.de/tests/9886322#step/generate_report/2 


 ## Suggestions 
 1. Ask SUSE-IT network admins to REJECT packets instead of DROP so that we get clearer results 
 2. Ask SUSE-IT network admins to *not* block this traffic, which we need for tests 
 3. The default connect timeout for curl appears to resolve to 2m10s (see above), which is above our default timeouts for `script_run` etc., so find a combination where curl has a chance to report a proper error earlier 
 4. Consider using `upload_logs` in this specific example, although this does not completely help: `upload_logs` uses a default timeout of 90s, which is higher than the 30s default for `script_run` but still below curl's effective default of 2m10s. Maybe add the parameter `--connect-timeout 20` to curl or bump the timeout for `upload_logs` 
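
 The timeout arithmetic behind suggestions 3 and 4 can be sketched as follows. The helper `curl_timeout_flags` is hypothetical, and the 10s safety margin is an assumption; only the 20s connect timeout and the 90s and 30s budgets are taken from the text above. The idea is to keep curl's own limits below the timeout of whatever test API call wraps it, so that curl fails first and reports its own error.

 ```shell
# Print curl timeout flags that fit inside a caller's time budget (seconds).
# The 20s connect timeout is the value floated in suggestion 4; the 10s
# safety margin is an assumption for illustration.
curl_timeout_flags() {
    budget=$1
    margin=10
    connect=20
    max_time=$((budget - margin))
    if [ "$max_time" -lt 1 ]; then
        echo "budget of ${budget}s is too small" >&2
        return 1
    fi
    # Never let the connect timeout exceed the overall limit.
    if [ "$connect" -gt "$max_time" ]; then
        connect=$max_time
    fi
    echo "--connect-timeout $connect --max-time $max_time"
}
 ```

 With a 90s budget this yields `--connect-timeout 20 --max-time 80`, so a call such as `curl -O $(curl_timeout_flags 90) http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log` would give up well before the surrounding 90s timeout expires. The command substitution is deliberately left unquoted so that the flags are split into separate arguments.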


 ## Further details 
 Link to [latest](https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Online&machine=s390x-kvm-sle15&test=xfstests_xfs-generic&version=15-SP5)
