action #133127
Status: closed
Frankencampus network broken + GitlabCi failed --> uploading artefacts
Description
Observation
Job https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/735816
In reality the job passed, but the upload of artifacts failed. From the logs:
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying... context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying... context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
FATAL: invalid argument
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1
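For reference, a job like this can be retriggered via the GitLab API once the network is stable again. A minimal sketch in Python, assuming the id= value in the runner log above is the job id and that a token with api scope is available (both are assumptions, not confirmed by this ticket):

#!/usr/bin/env python3
# Hypothetical sketch: retrigger a job whose artifact upload failed, via the
# GitLab job retry endpoint. The token is a placeholder.
import requests

GITLAB = "https://gitlab.suse.de/api/v4"
PROJECT = "qa-maintenance%2Fbot-ng"  # URL-encoded project path
JOB_ID = 1702329                     # id= value from the runner log above (assumed to be the job id)
TOKEN = "REDACTED"                   # access token with api scope

resp = requests.post(f"{GITLAB}/projects/{PROJECT}/jobs/{JOB_ID}/retry",
                     headers={"PRIVATE-TOKEN": TOKEN}, timeout=30)
resp.raise_for_status()
print("retried, new job id:", resp.json()["id"])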
Updated by jbaier_cz over 1 year ago
Might be related to this Slack thread: https://suse.slack.com/archives/C02AET1AAAD/p1689875420732159
Currently, we have Internet issues (high packet loss) in NUE2 Frankencampus which impact the IPSec tunnels between the NUE2 internal network and the other offices' internal networks. The issue has been acknowledged and is being worked on.
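For anyone who wants to confirm the packet loss from their side, a minimal sketch; the target host is an assumption, pick whichever host sits behind the affected tunnel:

#!/usr/bin/env python3
# Hypothetical sketch: measure packet loss towards a host behind the affected
# IPSec tunnel. The target host is a placeholder.
import re
import subprocess

TARGET = "gitlab.suse.de"  # placeholder, choose a host in the affected network

out = subprocess.run(["ping", "-c", "20", TARGET],
                     capture_output=True, text=True, check=False).stdout
match = re.search(r"([\d.]+)% packet loss", out)
print(f"{TARGET}: {match.group(1)}% packet loss" if match
      else "could not parse ping output")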
Updated by okurz over 1 year ago
- Tags set to infra, alert, network, gitlab
- Subject changed from GitlabCi failled --> uploading artefacts to Frankencampus network broken + GitlabCi failed --> uploading artefacts
- Priority changed from Normal to Urgent
- Target version set to Ready
Updated by okurz over 1 year ago
- Copied to action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? added
Updated by okurz over 1 year ago
- Related to action #133142: 4 baremetal SUTs in FC basement are unreachable added
Updated by okurz over 1 year ago
- Related to action #133154: osd-deployment failed because unreachable workers added
Updated by mkittler over 1 year ago
- Blocks action #132827: [tools][qe-core]test fails in rsync_client/salt-master, DNS resolve issue with workers "sapworker*" on multi-machine tests size:M added
Updated by okurz over 1 year ago
- Status changed from New to Feedback
- Assignee set to okurz
waiting for resolution message in https://suse.slack.com/archives/C02AET1AAAD/p1689875420732159
Updated by livdywan over 1 year ago
Still broken: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1708081 (I'm not investigating, just on alert duty and posting in case you need more confirmation of the current state)
Updated by okurz over 1 year ago
Yes, confirmed still broken, although GitLab CI workers are back at least; https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1709414 also shows passed jobs.
But problems still persist, e.g. a network loop from the FC Basement lab, see https://suse.slack.com/archives/C029APBKLGK/p1689927663122119. We also need to follow the IT announcements in https://suse.slack.com/archives/C02AET1AAAD/p1689944939307469?thread_ts=1689875420.732159&cid=C02AET1AAAD
Updated by okurz over 1 year ago
Some fixes have been applied to the GitLab instance, so I retriggered the osd-deployment in https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/740680
Updated by okurz over 1 year ago
OSD deployment failed. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=WAITING_FOR_RESOURCE shows 20 jobs waiting for a resource, but many previous jobs passed: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=SUCCESS . The last failure (https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=FAILED) was 8h ago. We should monitor whether the queue reduces or builds up overnight. A sketch of how such a queue check could be scripted is shown below.
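A minimal sketch of that queue check against the GitLab jobs API, assuming the scope values mirror the statuses linked above and a read-only token is available (the token is a placeholder):

#!/usr/bin/env python3
# Hypothetical sketch: count bot-ng jobs per status to see whether the
# waiting_for_resource queue shrinks or grows. The token is a placeholder.
import requests

GITLAB = "https://gitlab.suse.de/api/v4"
PROJECT = "qa-maintenance%2Fbot-ng"  # URL-encoded project path
TOKEN = "REDACTED"                   # access token with read_api scope

def count_jobs(scope: str) -> int:
    # one page of up to 100 jobs is enough for a rough trend check
    resp = requests.get(f"{GITLAB}/projects/{PROJECT}/jobs",
                        headers={"PRIVATE-TOKEN": TOKEN},
                        params={"scope[]": scope, "per_page": 100}, timeout=30)
    resp.raise_for_status()
    return len(resp.json())

for scope in ("waiting_for_resource", "failed", "success"):
    print(scope, count_jobs(scope))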
Updated by xlai over 1 year ago
Just adding a comment for awareness: this issue is impacting the SLE Micro 5.5 VT tests too.
Updated by okurz over 1 year ago
- Status changed from Feedback to Resolved
https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=FAILED shows the last failure from 22h ago, so quite good. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs?statuses=WAITING_FOR_RESOURCE shows only a single job. So GitLab CI jobs look good so far. I doubt I will get a proper resolution answer over Slack. If there are other specific issues left, we need to handle them in specific tickets, if not in the already existing related ones.