action #123232
closed
[Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance size:S
Added by livdywan almost 2 years ago.
Updated almost 2 years ago.
Description
Observation
I received a lot of emails about failing pipelines for openQABot, bot-ng and os-autoinst-needles-opensuse-mirror. All links point to gitlab2.suse.de, which appears to be inaccessible or down, so I couldn't check what's going on there.
Pipelines on gitlab.suse.de seem to be fine.
Acceptance criteria
- AC1: No waves of alert emails for an inaccessible gitlab instance
Suggestion
- Status changed from New to Blocked
- Assignee set to livdywan
- Status changed from Blocked to Feedback
No more emails should be sent from what appears to be a staging instance, as I understand it. Also being discussed in Slack.
Just to be clear I added a comment asking that neither emails nor production code from pipelines is executed there, since I couldn't confirm what the pipelines were doing.
- Status changed from Feedback to Resolved
cdywan wrote:
No more emails should be sent from what appears to be a staging instance, as I understand it. Also being discussed in Slack.
Just to be clear I added a comment asking that neither emails nor production code from pipelines is executed there, since I couldn't confirm what the pipelines were doing.
Sure, sounds good. Let's consider this resolved, as Moroni stated this really was a one-time accident.
- Subject changed from [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab2 to [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance
- Status changed from Resolved to In Progress
I'm re-opening because we're now getting alerts about failing pipelines from gitlab-stage.suse.de.
As a temporary workaround, I logged in to the staging instance and manually disabled the schedules in our projects (those we saw in the mailbox). Hopefully this will lower the number of false-positive e-mails on our mailing list, at least until someone does a new DB synchronization, which will override the settings.
Please also note that there are still around 150 jobs waiting in the queue for bot-ng (jobs are scheduled quite often for that project), so we might still get some e-mails. For this project I disabled CI and notification e-mails completely, which might help a little.
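For reference, the manual step of disabling schedules could probably be scripted so that it is easy to repeat after the next DB synchronization. The following is only a minimal sketch, assuming the staging instance exposes the standard GitLab v4 pipeline schedules API and that a personal access token with API scope is available. The project path and token shown in the usage note are hypothetical placeholders, and pagination of the schedule list is not handled.

```python
import json
import urllib.parse
import urllib.request


def _api_url(base_url, project, suffix=""):
    # GitLab accepts a URL-encoded "namespace/project" path as the project id.
    project_id = urllib.parse.quote(str(project), safe="")
    return f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/pipeline_schedules{suffix}"


def schedule_update_request(base_url, project, schedule_id, token):
    """Build a PUT request that sets active=false on one pipeline schedule."""
    data = urllib.parse.urlencode({"active": "false"}).encode()
    return urllib.request.Request(
        _api_url(base_url, project, f"/{schedule_id}"),
        data=data,
        method="PUT",
        headers={"PRIVATE-TOKEN": token},
    )


def disable_schedules(base_url, project, token, opener=urllib.request.urlopen):
    """Deactivate every active schedule of a project and return their ids.

    Only the first page of schedules is read (a simplification).
    """
    list_request = urllib.request.Request(
        _api_url(base_url, project), headers={"PRIVATE-TOKEN": token}
    )
    with opener(list_request) as response:
        schedules = json.load(response)

    disabled = []
    for schedule in schedules:
        if schedule.get("active"):
            with opener(schedule_update_request(base_url, project, schedule["id"], token)):
                pass
            disabled.append(schedule["id"])
    return disabled
```

A call would look like `disable_schedules("https://gitlab-stage.suse.de", "some-group/bot-ng", token)`, where `some-group/bot-ng` is a made-up path. Schedules that are already inactive are left untouched, so running it repeatedly should be harmless.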
- Subject changed from [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance to [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance size:S
- Description updated (diff)
- Status changed from In Progress to Feedback
- Due date set to 2023-02-03
- Status changed from Feedback to Blocked
Waiting to get a response for now, using the due date as a reminder.
- Due date changed from 2023-02-03 to 2023-02-10
Bumping due date due to hackweek.
Still waiting on clarification concerning SD-109568. We need to make sure our tickets are being taken seriously.
- Related to action #121816: Cannot access installation media on updates.suse.com - maintenance tests broken size:S added
- Due date changed from 2023-02-10 to 2023-02-24
- Status changed from Blocked to Resolved
I talked about this last week but didn't save the comment here. I'm concluding the ticket after having had a feedback session with Vit, Moroni and Matthias. We looked at this and other recent tickets to compare perspectives by example. The particular case was rooted in an upstream GitLab issue found after an upgrade which had a bigger impact than one would expect. We also discussed how limited capacity contributes to delays. This ticket was also seen as low impact, which I can relate to even if it was painful for us to deal with.
One takeaway is that we should try to clarify the impact on any ticket and focus on the most critical issues. Conversely, we can ask for clarification on if and when someone can currently help us.
It is also worth making sure we're talking to the right person. We (the Tools team) can't see it in tickets, but assignments can be left open or set incorrectly. It is fine to ask "Is the right person able to help here?" or "When can we realistically get this done?", or even to ask ourselves whether we can find alternative approaches.
Reviving close collaboration with Matthias' team was also suggested.