action #123232
closed [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance size:S
Description
Observation
I received a lot of emails about failing pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror. All links point to gitlab2.suse.de, which appears to be inaccessible or down, so I couldn't check what's going on there.
Pipelines on gitlab.suse.de seem to be fine.
Acceptance criteria
- AC1: No waves of alert emails for an inaccessible gitlab instance
Suggestion
- Find out how to disable notifications from gitlab2.suse.de
- Reach out to SUSE IT for help
- See opened SD ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-109568
Updated by livdywan almost 2 years ago
- Status changed from New to Blocked
- Assignee set to livdywan
Updated by livdywan almost 2 years ago
- Status changed from Blocked to Feedback
No more emails should be sent from what appears to be a staging instance, as I understand it. Also being discussed in Slack.
Just to be clear, I added a comment asking that no emails be sent and no production code from pipelines be executed there, since I couldn't confirm what the pipelines were doing.
Updated by okurz almost 2 years ago
- Status changed from Feedback to Resolved
cdywan wrote:
No more emails should be sent from what appears to be a staging instance, as I understand it. Also being discussed in Slack.
Just to be clear, I added a comment asking that no emails be sent and no production code from pipelines be executed there, since I couldn't confirm what the pipelines were doing.
Sure, sounds good. Let's consider this resolved, as Moroni stated that this really was a one-time accident.
Updated by livdywan almost 2 years ago
- Subject changed from [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab2 to [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance
- Status changed from Resolved to In Progress
I'm re-opening because we're now getting alerts about failing pipelines from gitlab-stage.suse.de.
Updated by jbaier_cz almost 2 years ago
As a temporary workaround, I logged in to the staging instance and manually disabled the schedules in our projects (those that we saw in the mailbox). Hopefully, this will lower the number of false-positive e-mails on our mailing list, at least until someone does a new DB synchronization, which will overwrite the settings.
Updated by jbaier_cz almost 2 years ago
Please also note that there are still around 150 jobs waiting in the queue for bot-ng (jobs are scheduled quite often for that project), so we might still get some e-mails. For this project, I disabled the CI and notification e-mails completely, which might help a little.
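For reference, the manual step of disabling schedules could also be scripted against the GitLab REST API. The following is only a minimal sketch, not what was actually run here: it assumes the staging instance exposes the standard `/api/v4` pipeline schedules endpoints, that a personal access token with API scope is available, and the project path and token in the usage note are hypothetical placeholders. It only looks at the first page of schedules.

```python
import json
import urllib.parse
import urllib.request

# Host taken from this ticket; "/api/v4" is the standard GitLab REST prefix
# and is assumed to be reachable on the staging instance.
API = "https://gitlab-stage.suse.de/api/v4"


def schedules_url(project: str) -> str:
    """Build the pipeline schedules URL for a numeric ID or a "group/name" path."""
    return f"{API}/projects/{urllib.parse.quote(project, safe='')}/pipeline_schedules"


def deactivate_request(project: str, schedule_id: int, token: str) -> urllib.request.Request:
    """Build a PUT request that sets active=false on one pipeline schedule."""
    data = urllib.parse.urlencode({"active": "false"}).encode()
    return urllib.request.Request(
        f"{schedules_url(project)}/{schedule_id}",
        data=data,
        method="PUT",
        headers={"PRIVATE-TOKEN": token},
    )


def disable_schedules(project: str, token: str, send=urllib.request.urlopen) -> list:
    """Deactivate every active schedule on the first result page; return their IDs.

    `send` is injectable so the logic can be exercised without network access.
    """
    list_req = urllib.request.Request(
        schedules_url(project), headers={"PRIVATE-TOKEN": token}
    )
    with send(list_req) as resp:
        schedules = json.load(resp)
    disabled = []
    for schedule in schedules:
        if schedule.get("active"):
            with send(deactivate_request(project, schedule["id"], token)):
                pass
            disabled.append(schedule["id"])
    return disabled
```

A call might look like `disable_schedules("some-group/bot-ng", token)`, where the group name is made up. As with the manual change, a new DB synchronization from production would re-enable the schedules, so this would need to be re-run afterwards.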
Updated by mkittler almost 2 years ago
- Subject changed from [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance to [Alerting] failed pipelines for openQABot/ bot-ng/ os-autoinst-needles-opensuse-mirror on gitlab staging instance size:S
- Description updated (diff)
- Status changed from In Progress to Feedback
Updated by livdywan almost 2 years ago
- Due date set to 2023-02-03
- Status changed from Feedback to Blocked
Waiting to get a response for now, using the due date as a reminder.
Updated by livdywan almost 2 years ago
- Due date changed from 2023-02-03 to 2023-02-10
Bumping due date due to hackweek.
Updated by livdywan almost 2 years ago
Still waiting on clarification concerning SD-109568. We need to make sure our tickets are being taken seriously.
Updated by okurz almost 2 years ago
- Related to action #121816: Cannot access installation media on updates.suse.com - maintenance tests broken size:S added
Updated by okurz almost 2 years ago
- Due date changed from 2023-02-10 to 2023-02-24
There was no response in https://sd.suse.com/servicedesk/customer/portal/1/SD-109568, but an escalation follow-up is underway. cdywan will CC osd-admins@suse.de or a similar address so that others from the team can follow up. This ticket is now related to #121816, not because of the original issue but because of the shared considerations about process and process improvements.
Updated by livdywan almost 2 years ago
- Status changed from Blocked to Resolved
I talked about this last week but didn't save the comment here. I'm concluding the ticket after having had a feedback session with Vit, Moroni and Matthias. We looked at this and other recent tickets to compare perspectives by example. The particular case was rooted in an upstream GitLab issue found after an upgrade which had a bigger impact than one would expect. We also discussed how limited capacity contributes to delays. This ticket was also seen as low impact, which I can relate to even if it was painful for us to deal with.
One takeaway is that we should try to clarify the impact on every ticket and focus on the most critical issues. Conversely, we can ask for clarification on whether and when someone can currently help us.
It is also worth making sure we're talking to the right person. We (the Tools team) can't see it in tickets, but assignments can be left open or set incorrectly. It is fine to ask "Is the right person able to help here?" or "When can we realistically get this done?", or even to ask ourselves whether we can find alternative approaches.
Reviving close collaboration with Matthias' team was also suggested.