coordination #122650

Updated by okurz 5 months ago

## Observation

The openQA test in scenario sle-15-SP5-Online-s390x-xfstests_xfs-generic@s390x-kvm-sle15 fails in
[generate_report](http://openqa.suse.de/tests/10218783/modules/generate_report/steps/4).
All xfstests runs on sle-15-SP5 s390x fail with this issue.

In this specific case, the failed curl connection attempt came from the following machines (read from vars.json):
"SUT_IP" : "s390kvm082.suse.de",
"VIRSH_GUEST" : "10.161.145.82",
"VIRSH_HOSTNAME" : "s390zp18.suse.de",
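The last good run reportedly used an IP address rather than an FQDN. A quick way to compare jobs is to pull these keys out of each job's vars.json and check which values are IP addresses and which are hostnames. This is a minimal sketch that assumes vars.json is a flat JSON object, as the excerpt suggests. `network_vars` and `address_kind` are hypothetical helpers, not part of openQA.

```python
import ipaddress
import json

# Keys quoted in this ticket; other openQA jobs may not set all of them.
KEYS = ("SUT_IP", "VIRSH_GUEST", "VIRSH_HOSTNAME")


def address_kind(value: str) -> str:
    """Return "ip" if value parses as an IPv4/IPv6 address, else "hostname"."""
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        return "hostname"


def network_vars(path: str) -> dict:
    """Read the selected keys from a job's vars.json and classify each value."""
    with open(path, encoding="utf-8") as fh:
        job_vars = json.load(fh)
    # Keys missing from this job's vars.json are skipped silently.
    return {key: (job_vars[key], address_kind(job_vars[key]))
            for key in KEYS if key in job_vars}
```

Run it against the vars.json of the failing job and the last good job. For the values above it would report SUT_IP and VIRSH_HOSTNAME as hostnames and VIRSH_GUEST as an IP address.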

At first, I thought this was the same issue as the one being debugged in #120261. However, after that fix (https://github.com/os-autoinst/openQA/pull/4935/files) was merged, our s390x tests still fail. Looking into the details, I don't know why these tests still use worker2.oqa.suse.de as the download host. The last good run used an IP address, not an FQDN. We may need some help from the tools team.

okurz ran `time curl -O http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log` which reproduces the problem quite explicitly:

```
# time curl -O http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:02:10 --:--:-- 0curl: (7) Failed to connect to worker2.oqa.suse.de port 20343: Connection timed out
real 2m11.316s
```

So very likely the firewall for the .oqa.suse.de zone silently drops packets from 10.161.0.0.
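You can tell a silent DROP from a REJECT without waiting the full 2m10s. Make a single TCP connect with a short timeout and look at how it fails. A timeout is consistent with DROP. A quick "connection refused" or "host unreachable" is consistent with REJECT, or with nothing listening on the port. This is a minimal sketch that assumes Python 3 is available on the machine you test from. `probe` is a hypothetical helper, not part of openQA or os-autoinst.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a single TCP connect attempt.

    "open"    - the TCP handshake completed.
    "refused" - the peer answered with a reset (RST); a firewall REJECT
                using tcp-reset looks the same.
    "timeout" - no answer within `timeout` seconds; consistent with DROP.
    "error: ..." - other failures, e.g. an ICMP host-unreachable reject
                or a DNS lookup error.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:  # covers EHOSTUNREACH, socket.gaierror, ...
        return f"error: {exc.strerror or exc}"
```

If the DROP theory holds, `probe("worker2.oqa.suse.de", 20343, timeout=20)` run from the SUT would be expected to return `"timeout"` after about 20 seconds.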

## Reproducible

Fails since (at least) Build [40.1](http://openqa.suse.de/tests/9918151#step/generate_report/4)

## Expected result

Last good: Build [38.1](http://openqa.suse.de/tests/9886322#step/generate_report/2)

## Suggestions
1. Ask SUSE-IT network admins to REJECT packets instead of DROP so that we get clearer results #122653
2. Ask SUSE-IT network admins to *not* block this traffic which we need for tests #122656
3. It looks like curl's effective connect timeout resolves to 2m10s (see above), which is above our default timeouts for script_run etc. Find a combination where curl has a chance to report a proper error earlier.
4. Consider using `upload_logs` in this specific example, but this does not completely help. `upload_logs` uses a default timeout of 90s, which is higher than the `script_run` default of 30s but still below curl's 2m10s. Maybe we add the parameter `--connect-timeout 20` to curl or bump the timeout for upload_logs #122659
5. Ensure the original problem is fixed #122539
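The timeout reasoning in suggestions 3 and 4 comes down to one rule: curl's connect timeout has to expire comfortably before the timeout of the wrapping test API call. Otherwise the wrapper aborts first and the real curl error is never seen. Below is a minimal sketch of that check, using the defaults quoted in this ticket (verify them against your os-autoinst version). `fails_in_time`, `curl_fetch_cmd` and the 5s margin are hypothetical and for illustration only.

```python
import shlex

# Defaults as quoted in this ticket (seconds); check your os-autoinst version.
SCRIPT_RUN_TIMEOUT = 30
UPLOAD_LOGS_TIMEOUT = 90
OBSERVED_CURL_CONNECT = 130  # 2m10s from the reproduction above


def fails_in_time(connect_timeout: float, wrapper_timeout: float,
                  margin: float = 5) -> bool:
    """True if curl gives up connecting at least `margin` s before the wrapper."""
    return connect_timeout + margin <= wrapper_timeout


def curl_fetch_cmd(url: str, connect_timeout: int = 20) -> str:
    """Build a shell-quoted curl command that fails fast if the connect hangs."""
    return shlex.join(["curl", "--connect-timeout", str(connect_timeout), "-O", url])
```

With these numbers, `fails_in_time(20, SCRIPT_RUN_TIMEOUT)` holds. The observed 130s fits neither the 30s `script_run` default nor the 90s `upload_logs` default. Note that `--connect-timeout` only bounds the connection phase. `--max-time` would bound the whole transfer.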


## Further details
Link to [latest](https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Online&machine=s390x-kvm-sle15&test=xfstests_xfs-generic&version=15-SP5)
