action #123232

closed

[Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance size:S

Added by livdywan almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2022-11-20
Due date: 2023-02-24
% Done: 0%
Estimated time:

Description

Observation

I received a lot of emails about failing pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror. All links point to gitlab2.suse.de, which appears to be inaccessible or down, so I couldn't check what's going on there.

Pipelines on gitlab.suse.de seem to be fine.

Acceptance criteria

  • AC1: No waves of alert emails for an inaccessible gitlab instance

Suggestion


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #121816: Cannot access installation media on updates.suse.com - maintenance tests broken size:S, Resolved, livdywan, 2022-12-12, 2023-02-24

Actions #1

Updated by livdywan almost 2 years ago

  • Status changed from New to Blocked
  • Assignee set to livdywan
Actions #2

Updated by livdywan almost 2 years ago

  • Status changed from Blocked to Feedback

No more emails should be sent from what appears to be a staging instance, as I understand it. Also being discussed in Slack.

Just to be clear I added a comment asking that neither emails nor production code from pipelines is executed there, since I couldn't confirm what the pipelines were doing.

Actions #3

Updated by okurz almost 2 years ago

  • Status changed from Feedback to Resolved

cdywan wrote:

No more emails should be sent from what appears to be a staging instance, as I understand it. Also being discussed in Slack.

Just to be clear I added a comment asking that neither emails nor production code from pipelines is executed there, since I couldn't confirm what the pipelines were doing.

Sure, sounds good. Let's consider this resolved, as Moroni stated this really was a one-time accident.

Actions #4

Updated by livdywan almost 2 years ago

  • Subject changed from [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab2 to [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance
  • Status changed from Resolved to In Progress

I'm re-opening because we're now getting failing alerts from gitlab-stage.suse.de.

Actions #5

Updated by jbaier_cz almost 2 years ago

As a temporary workaround, I logged in to the staging instance and manually disabled the schedules in our projects (those we saw in the mailbox). Hopefully, this should lower the number of false-positive e-mails on our mailing list, at least until someone does a new DB synchronization, which will override the settings.
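
For reference, the same kind of workaround could probably be scripted against the GitLab REST API instead of clicking through the UI. The following is only a minimal sketch under assumptions: the API base path on gitlab-stage.suse.de, the project path passed in, and the access token are all hypothetical and not taken from this ticket. It uses the documented pipeline schedule endpoints to list schedules and set `active` to false.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the staging host from this ticket serves the standard GitLab API path.
API_BASE = "https://gitlab-stage.suse.de/api/v4"


def project_url(project_path):
    # GitLab accepts a URL-encoded "namespace/project" path in place of the numeric ID.
    return f"{API_BASE}/projects/{urllib.parse.quote(project_path, safe='')}"


def list_schedules_request(project_path, token):
    # GET /projects/:id/pipeline_schedules
    return urllib.request.Request(
        f"{project_url(project_path)}/pipeline_schedules",
        headers={"PRIVATE-TOKEN": token},
    )


def deactivate_schedule_request(project_path, schedule_id, token):
    # PUT /projects/:id/pipeline_schedules/:pipeline_schedule_id with active=false
    body = urllib.parse.urlencode({"active": "false"}).encode()
    return urllib.request.Request(
        f"{project_url(project_path)}/pipeline_schedules/{schedule_id}",
        data=body,
        method="PUT",
        headers={"PRIVATE-TOKEN": token},
    )


def disable_all_schedules(project_path, token):
    # Performs network calls; only the first page of schedules is handled here.
    with urllib.request.urlopen(list_schedules_request(project_path, token)) as resp:
        schedules = json.load(resp)
    for schedule in schedules:
        if schedule.get("active"):
            request = deactivate_schedule_request(project_path, schedule["id"], token)
            urllib.request.urlopen(request).close()
```

A call such as `disable_all_schedules("some-group/bot-ng", token)` would need a token with sufficient rights on the staging instance, and, like the manual change, the result would be overwritten by the next DB synchronization.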

Actions #6

Updated by jbaier_cz almost 2 years ago

Please also note that there are still around 150 jobs waiting in the queue for bot-ng (jobs are scheduled quite often for that project), so we might still get some e-mails. For this project, I disabled CI and notification e-mails completely, which might help a little.

Actions #7

Updated by mkittler almost 2 years ago

  • Subject changed from [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance to [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance size:S
  • Description updated (diff)
  • Status changed from In Progress to Feedback
Actions #8

Updated by livdywan almost 2 years ago

  • Due date set to 2023-02-03
  • Status changed from Feedback to Blocked

Waiting to get a response for now, using the due date as a reminder.

Actions #9

Updated by livdywan almost 2 years ago

  • Due date changed from 2023-02-03 to 2023-02-10

Bumping due date due to hackweek.

Actions #10

Updated by livdywan almost 2 years ago

Still waiting on clarification concerning SD-109568. We need to make sure our tickets are being taken seriously.

Actions #11

Updated by okurz almost 2 years ago

  • Related to action #121816: Cannot access installation media on updates.suse.com - maintenance tests broken size:S added
Actions #12

Updated by okurz almost 2 years ago

  • Due date changed from 2023-02-10 to 2023-02-24

There was no response in https://sd.suse.com/servicedesk/customer/portal/1/SD-109568 but an escalation follow-up is underway. cdywan will CC osd-admins@suse.de or a similar address so that others from the team are able to follow up. Now related to #121816, not because of the original issue but because of the shared considerations about process and process improvements.

Actions #13

Updated by livdywan almost 2 years ago

  • Status changed from Blocked to Resolved

I talked about this last week but didn't save the comment here. I'm concluding the ticket after having had a feedback session with Vit, Moroni and Matthias. We looked at this and other recent tickets to compare perspectives by example. The particular case was rooted in an upstream GitLab issue found after an upgrade which had a bigger impact than one would expect. We also discussed how limited capacity contributes to delays. This ticket was also seen as low impact, which I can relate to even if it was painful for us to deal with.

One takeaway is that we should try to clarify the impact on any ticket and focus on the most critical issues. Conversely, we can ask for clarification on whether and when someone can currently help us.

It is also worth making sure we're talking to the right person. We (the Tools team) can't see it in tickets, but assignments can be left open or set incorrectly. It is fine to ask "Is the right person able to help here?" or "When can we realistically get this done?", or even to ask ourselves whether we can find alternative approaches.
Reviving close collaboration with Matthias' team was also suggested.
