action #133127

closed

Frankencampus network broken + GitlabCi failed --> uploading artefacts

Added by osukup 9 months ago. Updated 9 months ago.

Status: Resolved
Priority: Urgent
Assignee: okurz
Category: -
Target version: Ready
Start date: 2023-07-20
Due date: -
% Done: 0%
Estimated time: -

Description

Observation

Job https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/735816

In reality the job passed, but the upload of artifacts failed.

From the logs:

WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway  id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying...                                context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway  id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying...                                context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway  id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
FATAL: invalid argument                            

Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1
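Since the job itself succeeded and only the final artifact upload hit the 502, a full pipeline rerun is not strictly needed once the coordinator is reachable again; retrying just the affected job over the GitLab API would be enough. A minimal sketch, assuming a personal access token with api scope and a hypothetical numeric project ID for qa-maintenance/bot-ng (the job ID is the one from the log excerpt above):

import os
import requests

GITLAB_API = "https://gitlab.suse.de/api/v4"
PROJECT_ID = 1234      # hypothetical numeric ID of qa-maintenance/bot-ng
JOB_ID = 1702329       # job ID taken from the log excerpt above
TOKEN = os.environ["GITLAB_TOKEN"]  # assumed token with 'api' scope

# POST /projects/:id/jobs/:job_id/retry creates a new run of this job
resp = requests.post(
    f"{GITLAB_API}/projects/{PROJECT_ID}/jobs/{JOB_ID}/retry",
    headers={"PRIVATE-TOKEN": TOKEN},
    timeout=30,
)
resp.raise_for_status()
print("Retried as job", resp.json()["id"])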

Related issues 4 (1 open, 3 closed)

Related to openQA Infrastructure - action #133142: 4 baremetal SUTs in FC basement are unreachable (Resolved, okurz, 2023-07-21)

Related to openQA Infrastructure - action #133154: osd-deployment failed because unreachable workers (Resolved, okurz, 2023-07-21)

Blocks openQA Infrastructure - action #132827: [tools][qe-core]test fails in rsync_client/salt-master, DNS resolve issue with workers "sapworker*" on multi-machine tests size:M (Workable, 2023-07-17)

Copied to openQA Infrastructure - action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? (Resolved, nicksinger, 2023-07-20)
Actions #1

Updated by jbaier_cz 9 months ago

Might be related, see Slack: https://suse.slack.com/archives/C02AET1AAAD/p1689875420732159

Currently, we have Internet issues (high packet loss) in NUE2 Frankencampus which impact the IPSec tunnels between the NUE2 internal network and the other offices' internal networks. The issue has been acknowledged and is being worked on.

Actions #2

Updated by okurz 9 months ago

  • Tags set to infra, alert, network, gitlab
  • Subject changed from GitlabCi failled --> uploading artefacts to Frankencampus network broken + GitlabCi failed --> uploading artefacts
  • Priority changed from Normal to Urgent
  • Target version set to Ready
Actions #3

Updated by okurz 9 months ago

  • Description updated (diff)
Actions #4

Updated by okurz 9 months ago

  • Copied to action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? added
Actions #5

Updated by okurz 9 months ago

  • Related to action #133142: 4 baremetal SUTs in FC basement are unreachable added
Actions #6

Updated by okurz 9 months ago

  • Related to action #133154: osd-deployment failed because unreachable workers added
Actions #7

Updated by mkittler 9 months ago

  • Blocks action #132827: [tools][qe-core]test fails in rsync_client/salt-master, DNS resolve issue with workers "sapworker*" on multi-machine tests size:M added
Actions #8

Updated by okurz 9 months ago

  • Status changed from New to Feedback
  • Assignee set to okurz
Actions #9

Updated by livdywan 9 months ago

Still broken: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1708081 (I'm not investigating, just on alert duty and posting in case you need more confirmation of the current state)

Actions #10

Updated by okurz 9 months ago

Yes, confirmed still broken, although the GitLab CI workers are back at least. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1709414 also shows passed jobs.

Problems still persist, though, e.g. due to a network loop from the FC basement lab, see https://suse.slack.com/archives/C029APBKLGK/p1689927663122119. We also need to follow the IT announcements in https://suse.slack.com/archives/C02AET1AAAD/p1689944939307469?thread_ts=1689875420.732159&cid=C02AET1AAAD

Actions #11

Updated by okurz 9 months ago

Some fixes have been applied to the GitLab instance, so I retriggered the osd-deployment in https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/740680

Actions #12

Updated by okurz 9 months ago

OSD deployment failed. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=WAITING_FOR_RESOURCE shows 20 jobs waiting for resources, but many previous jobs passed: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=SUCCESS . The last failure (see https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=FAILED) was 8h ago. We should monitor whether the queue shrinks or builds up overnight.
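The same status filters are also exposed through the GitLab jobs API, which makes it easy to watch whether the WAITING_FOR_RESOURCE queue shrinks or grows overnight. A minimal sketch, again assuming a token and a hypothetical numeric project ID:

import os
import requests

GITLAB_API = "https://gitlab.suse.de/api/v4"
PROJECT_ID = 1234  # hypothetical numeric ID of qa-maintenance/bot-ng
TOKEN = os.environ["GITLAB_TOKEN"]  # assumed token with 'api' scope

# GET /projects/:id/jobs with scope[]=waiting_for_resource lists queued jobs
jobs = requests.get(
    f"{GITLAB_API}/projects/{PROJECT_ID}/jobs",
    headers={"PRIVATE-TOKEN": TOKEN},
    params={"scope[]": "waiting_for_resource", "per_page": 100},
    timeout=30,
).json()
print(f"{len(jobs)} job(s) currently waiting for a resource")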

Actions #14

Updated by xlai 9 months ago

Just adding a comment for awareness: this issue is impacting the SLE Micro 5.5 VT tests too.

Actions #15

Updated by okurz 9 months ago

  • Status changed from Feedback to Resolved

https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=FAILED shows the last failure 22h ago, so quite good. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=WAITING_FOR_RESOURCE shows only a single job, so GitLab CI jobs look good so far. I doubt I will get a proper resolution answer over Slack. If there are other specific issues left, we need to handle them in separate tickets, if not in the already existing related ones.
