action #133127

closed

Frankencampus network broken + GitlabCi failed --> uploading artefacts

Added by osukup 9 months ago. Updated 9 months ago.

Status: Resolved
Priority: Urgent
Assignee: okurz
Category: -
Target version: Ready
Start date: 2023-07-20
Due date: -
% Done: 0%
Estimated time: -

Description

Observation

Job https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/735816

In reality the job passed, but the upload of artifacts failed.

From the logs:

WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway  id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying...                                context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway  id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying...                                context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway  id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
FATAL: invalid argument                            

Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1
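Since the job itself succeeded and only the final artifact upload hit the 502, a full pipeline rerun is not strictly needed once the coordinator is reachable again; retrying just the affected job over the GitLab API would be enough. A minimal sketch, assuming a personal access token with api scope and a hypothetical numeric project ID for qa-maintenance/bot-ng (the job ID is the one from the log excerpt above):

import os
import requests

GITLAB_API = "https://gitlab.suse.de/api/v4"
PROJECT_ID = 1234      # hypothetical numeric ID of qa-maintenance/bot-ng
JOB_ID = 1702329       # job ID taken from the log excerpt above
TOKEN = os.environ["GITLAB_TOKEN"]  # assumed token with 'api' scope

# POST /projects/:id/jobs/:job_id/retry creates a new run of this job
resp = requests.post(
    f"{GITLAB_API}/projects/{PROJECT_ID}/jobs/{JOB_ID}/retry",
    headers={"PRIVATE-TOKEN": TOKEN},
    timeout=30,
)
resp.raise_for_status()
print("Retried as job", resp.json()["id"])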

Related issues 4 (1 open, 3 closed)

Related to openQA Infrastructure - action #133142: 4 baremetal SUTs in FC basement are unreachable (Resolved, okurz, 2023-07-21)

Related to openQA Infrastructure - action #133154: osd-deployment failed because unreachable workers (Resolved, okurz, 2023-07-21)

Blocks openQA Infrastructure - action #132827: [tools][qe-core]test fails in rsync_client/salt-master, DNS resolve issue with workers "sapworker*" on multi-machine tests size:M (Workable, 2023-07-17)

Copied to openQA Infrastructure - action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? (Resolved, nicksinger, 2023-07-20)
Actions #1

Updated by jbaier_cz 9 months ago

Might be related, see Slack: https://suse.slack.com/archives/C02AET1AAAD/p1689875420732159

Currently, we have Internet issues (high packet loss) in NUE2 Frankencampus which impact the IPSec tunnels between the NUE2 internal network and the other offices' internal networks. The issue has been acknowledged and is being worked on.

Actions #2

Updated by okurz 9 months ago

  • Tags set to infra, alert, network, gitlab
  • Subject changed from GitlabCi failled --> uploading artefacts to Frankencampus network broken + GitlabCi failed --> uploading artefacts
  • Priority changed from Normal to Urgent
  • Target version set to Ready
Actions #3

Updated by okurz 9 months ago

  • Description updated (diff)
Actions #4

Updated by okurz 9 months ago

  • Copied to action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? added
Actions #5

Updated by okurz 9 months ago

  • Related to action #133142: 4 baremetal SUTs in FC basement are unreachable added
Actions #6

Updated by okurz 9 months ago

  • Related to action #133154: osd-deployment failed because unreachable workers added
Actions #7

Updated by mkittler 9 months ago

  • Blocks action #132827: [tools][qe-core]test fails in rsync_client/salt-master, DNS resolve issue with workers "sapworker*" on multi-machine tests size:M added
Actions #8

Updated by okurz 9 months ago

  • Status changed from New to Feedback
  • Assignee set to okurz
Actions #9

Updated by livdywan 9 months ago

Still broken: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1708081 (I'm not investigating, just on alert duty and posting in case you need more confirmation of the current state)

Actions #10

Updated by okurz 9 months ago

Yes, confirmed still broken, although the GitLab CI workers are back at least. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1709414 also shows passed jobs.

Problems still persist, though, e.g. due to a network loop from the FC basement lab, see https://suse.slack.com/archives/C029APBKLGK/p1689927663122119. We also need to follow the IT announcements in https://suse.slack.com/archives/C02AET1AAAD/p1689944939307469?thread_ts=1689875420.732159&cid=C02AET1AAAD

Actions #11

Updated by okurz 9 months ago

Some fixes have been applied to the GitLab instance, so I retriggered the osd-deployment in https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/740680

Actions #12

Updated by okurz 9 months ago

OSD deployment failed. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=WAITING_FOR_RESOURCE shows 20 jobs waiting for resources, but many previous jobs passed: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=SUCCESS . The last failure (see https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=FAILED) was 8h ago. We should monitor whether the queue shrinks or builds up overnight.
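The same status filters are also exposed through the GitLab jobs API, which makes it easy to watch whether the WAITING_FOR_RESOURCE queue shrinks or grows overnight. A minimal sketch, again assuming a token and a hypothetical numeric project ID:

import os
import requests

GITLAB_API = "https://gitlab.suse.de/api/v4"
PROJECT_ID = 1234  # hypothetical numeric ID of qa-maintenance/bot-ng
TOKEN = os.environ["GITLAB_TOKEN"]  # assumed token with 'api' scope

# GET /projects/:id/jobs with scope[]=waiting_for_resource lists queued jobs
jobs = requests.get(
    f"{GITLAB_API}/projects/{PROJECT_ID}/jobs",
    headers={"PRIVATE-TOKEN": TOKEN},
    params={"scope[]": "waiting_for_resource", "per_page": 100},
    timeout=30,
).json()
print(f"{len(jobs)} job(s) currently waiting for a resource")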

Actions #14

Updated by xlai 9 months ago

Just adding a comment for awareness: this issue is impacting the SLE Micro 5.5 VT tests too.

Actions #15

Updated by okurz 9 months ago

  • Status changed from Feedback to Resolved

https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=FAILED shows the last failure 22h ago, so quite good. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=WAITING_FOR_RESOURCE shows only a single job, so GitLab CI jobs look good so far. I doubt I will get a proper resolution answer over Slack. If there are other specific issues left, we need to handle them in separate tickets, if not in the already existing related ones.
