coordination #122650

closed

QA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

QA - coordination #116623: [epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones

[epic] Fix firewall block and improve error reporting when test fails in curl log upload

Added by okurz over 1 year ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2022-12-29
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

Observation

openQA test in scenario sle-15-SP5-Online-s390x-xfstests_xfs-generic@s390x-kvm-sle15 fails in
generate_report
All xfstests runs for sle-15-SP5 on s390x fail with this issue.

In this specific case the failed curl connection attempt originated from the following machine (values read from vars.json):
"SUT_IP" : "s390kvm082.suse.de",
"VIRSH_GUEST" : "10.161.145.82",
"VIRSH_HOSTNAME" : "s390zp18.suse.de",

At first I thought this was the same issue being debugged in #120261, but after that fix (https://github.com/os-autoinst/openQA/pull/4935/files) was merged, our s390x tests still fail. Looking into the details, I don't know why these tests still use worker2.oqa.suse.de as the download host. The last good run used an IP address rather than an FQDN. We may need some help from the tools team.

okurz ran time curl -O http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log which reproduces the problem quite explicitly:

# time curl -O http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:02:10 --:--:--     0curl: (7) Failed to connect to worker2.oqa.suse.de port 20343: Connection timed out
real    2m11.316s

So the firewall for the .oqa.suse.de zone very likely just drops packets from 10.161.0.0: the connection attempt hangs until it times out instead of being refused immediately.
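
Whether packets are dropped or rejected becomes visible in curl's exit code once a connect timeout shorter than the operating system's own is set. The following is only a sketch under that assumption: the helper names classify_curl_exit and probe and the 20-second value are illustrative and not part of openQA or os-autoinst. The exit codes themselves (6, 7, 28) are the ones curl documents.

```shell
#!/bin/sh
# Hypothetical helper: map a curl exit code to a likely cause.
# Only meaningful when curl ran with a --connect-timeout shorter than the
# OS-level TCP timeout; without it a dropped connection also ends in exit 7,
# as seen in the 2m10s run above.
classify_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "connect failed quickly: refused, rejected or unreachable" ;;
        28) echo "timed out: packets probably dropped" ;;
        *)  echo "other curl error $1" ;;
    esac
}

# Hypothetical probe, not called automatically: fetch a URL with a bounded
# connect timeout, discard the body and print the classification.
probe() {
    curl --silent --show-error --connect-timeout 20 -o /dev/null "$1"
    classify_curl_exit "$?"
}
```

Calling probe with the status.log URL above from the affected network would presumably report a timeout after about 20 seconds instead of hanging for more than two minutes.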

Reproducible

Fails since (at least) Build 40.1

Expected result

Last good: build38.1 http://openqa.suse.de/tests/9886322#step/generate_report/2

Suggestions

  1. Ask SUSE-IT network admins to REJECT packets instead of DROP so that we get more clear results #122653
  2. Ask SUSE-IT network admins to not block this traffic which we need for tests #122656
  3. The default connect timeout for curl appears to resolve to 2m10s (see above), which is above our default timeouts for script_run etc., so find a combination where curl has a chance to report a proper error earlier. Consider using upload_logs in this specific example, although that does not completely help: upload_logs uses a default timeout of 90s, which is higher than the script_run default of 30s but still below the 2m10s observed for curl. Maybe add the parameter --connect-timeout 20 to curl or bump the timeout for upload_logs #122659
  4. Ensure the original problem is fixed #122539
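
The timeout arithmetic behind suggestion 3 can be written down as a small check. This is a sketch only: the three numbers are the ones quoted in this ticket (30s for script_run, 90s for upload_logs, 2m10s observed for curl), while the function names fits_budget and curl_cmd are hypothetical and exist nowhere in openQA.

```shell
#!/bin/sh
# Timeouts in seconds, as quoted in this ticket.
SCRIPT_RUN_TIMEOUT=30
UPLOAD_LOGS_TIMEOUT=90
OBSERVED_CURL_CONNECT=130   # 2m10s from the reproduction above

# Hypothetical check: succeeds if curl would give up ($1) before the
# enclosing test API call ($2) does, so curl's own error gets reported.
fits_budget() {
    [ "$1" -lt "$2" ]
}

# Hypothetical helper: print the curl command line with an explicit
# connect timeout ($1) for a given URL ($2).
curl_cmd() {
    printf 'curl --connect-timeout %s -O %s\n' "$1" "$2"
}

if fits_budget 20 "$SCRIPT_RUN_TIMEOUT"; then
    curl_cmd 20 "http://worker2.oqa.suse.de:20343/rfhqRYw7W_g045X2/files/status.log"
fi
```

With a connect timeout of 20 seconds curl fails before either the script_run or the upload_logs default expires, whereas the observed 130 seconds fits neither budget.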

Further details

Link to latest


Subtasks 5 (0 open, 5 closed)

openQA Tests - action #122539: test fails in curl log from openqa and connect with FQDN worker2.oqa.suse.de always fails by time out size:M (Closed, 2022-12-29)
action #122608: exit code of shell command not received by script_run (Resolved, okurz, 2023-01-02)
openQA Infrastructure - action #122653: Ask SUSE-IT network admins to REJECT packets instead of DROP so that we get more clear results size:S (Rejected, okurz, 2023-01-03)
openQA Infrastructure - action #122656: Ask SUSE-IT network admins to *not* block this traffic which we need for tests regarding s390x within SUSE network size:M (Resolved, okurz, 2023-01-03)
action #122659: Improved error reporting in openQA tests when curl times out on connection attempts (Rejected, okurz, 2023-01-03)

Related issues 1 (1 open, 0 closed)

Related to openQA Project - coordination #122665: [epic] Improved PowerVM testing (New, 2023-01-26)
