action #94940

closed

multiple network related problems, gitlab CI pipelines not working, workers not reachable, proxySCC not reachable

Added by okurz over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Start date: 2021-06-30
Due date: 2021-07-07
% Done: 0%
Estimated time:

Description

Observation

See the many reports by you and in many other channels about network-related problems. AFAIK the only reference to someone "working on this problem" is https://chat.suse.de/channel/suse-it-ama?msg=WA9YfB7CumBiPfupu but there has been no real update from anyone within EngInfra on who is working on it, what to do, which workarounds to apply, or any estimates. Does anyone have a proper information source that we can follow?


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #94919: All arm workers down 2021-06-30 , NUE SRV2 Rack A8 was switched off by EngInfra size:S (Resolved, dheidler, 2021-06-30)

Actions #1

Updated by okurz over 3 years ago

  • Related to action #94919: All arm workers down 2021-06-30 , NUE SRV2 Rack A8 was switched off by EngInfra size:S added
Actions #2

Updated by okurz over 3 years ago

If you like, you can also use this ticket as a label in failed openQA jobs. I hope someone can provide better insight, as so far I don't know what we should do about it. For now this ticket is just for tracking, not for solving anything.

Actions #4

Updated by okurz over 3 years ago

  • Status changed from Feedback to Blocked
  • Priority changed from Immediate to Urgent

GitLab CI pipelines seem to be able to access the network again. I assume proxySCC works again as well. All OSD workers are reachable with the exception of openqaworker-arm-1/2/3, see #94919. This ticket is blocked on that.

Posted an update in https://chat.suse.de/channel/testing?msg=FmSxdsSvkxgEzcLgA :

All OSD aarch64 workers are still offline and unreachable since 2021-06-30. There has been no update from EngInfra in the ticket https://infra.nue.suse.com/SelfService/Display.html?id=191475 since yesterday midday, so I can't tell you more than that.

Actions #5

Updated by okurz over 3 years ago

  • Due date set to 2021-07-07
  • Status changed from Blocked to Feedback

All aarch64 workers are online and working on jobs again. asmorodskyi wrote an email saying that there might be more network-related problems. Crosschecking before closing.

Actions #6

Updated by asmorodskyi over 3 years ago

okurz wrote:

All aarch64 workers are online and working on jobs again. asmorodskyi wrote an email saying that there might be more network-related problems. Crosschecking before closing.

I can confirm that all the problems I mentioned in my emails are gone now. We also increased some timeouts on our side:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/12848
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/12847

Actions #7

Updated by okurz over 3 years ago

  • Status changed from Feedback to Resolved

Thank you. All good then.
