action #97244
closed
openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M
0%
Description
Motivation
See http://mailman.suse.de/mlarch/SuSE/osd-admins/2021/osd-admins.2021.08/msg00490.html
Acceptance criteria
- AC1: gitlab CI in https://gitlab.suse.de/openqa/grafana-webhook-actions can automatically create tickets again if automatic recovery of openqaworker-arm-[123] failed
Suggestion
- Look into why the pipeline failed to create tickets
- Try creating tickets automatically in whatever new way EngInfra prefers (potentially ask them)
- We might need to send from an @suse.com address, which would require creating a new dedicated account. We should be able to just include osd-admins@suse.de in CC.
- Use the JIRA-SD API to create new tickets. This would give us better control over whether a ticket was created and which one, and let the pipeline fail accordingly if the system breaks for whatever reason, so that the failure is easy to spot and can be investigated manually
- If we cannot find an easy solution with EngInfra, we escalate to runger@suse.com, because he already offered that we bring such topics to him so he can discuss them with the EngInfra team lead
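The JIRA-SD API suggestion could look roughly like the following minimal Python sketch. It assumes the Jira Service Desk REST endpoint `POST /rest/servicedeskapi/request` and its `issueKey` response field as described in the Atlassian documentation. The base URL, `service_desk_id`, `request_type_id`, and the credentials are placeholders that would have to come from EngInfra, and `build_payload` / `create_ticket` are hypothetical helper names, not code from grafana-webhook-actions.

```python
import json
import urllib.request


def build_payload(machine, service_desk_id, request_type_id):
    """Build the JSON body for a Service Desk 'create customer request' call."""
    return {
        "serviceDeskId": service_desk_id,  # placeholder, to be provided by EngInfra
        "requestTypeId": request_type_id,  # placeholder, to be provided by EngInfra
        "requestFieldValues": {
            "summary": f"{machine} is offline, automatic recovery failed",
            "description": f"Automatic recovery of {machine} failed. "
                           "Please power-cycle the machine.",
        },
    }


def create_ticket(base_url, payload, auth_header, opener=urllib.request.urlopen):
    """POST the payload and return the issue key; raise if no ticket was created."""
    request = urllib.request.Request(
        f"{base_url}/rest/servicedeskapi/request",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )
    # urlopen raises on HTTP 4xx/5xx, so a rejected request already fails the job.
    with opener(request) as response:
        body = json.load(response)
    issue_key = body.get("issueKey")
    if not issue_key:
        # Fail loudly so the CI job turns red and someone investigates manually.
        raise RuntimeError(f"ticket creation returned no issue key: {body!r}")
    return issue_key
```

The point of returning the issue key and raising otherwise is exactly the "fail accordingly" part: a CI job calling this would either log which ticket was created or fail visibly.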
Out of scope
Fixing IPMI based recovery
Updated by okurz over 3 years ago
- Related to action #97364: openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered size:S added
Updated by okurz over 3 years ago
- Subject changed from openqaworker-arm-3 is offline and EngInfra wants to make our lives miserable by forcing us to create JiraSD tickets to openqaworker-arm-3 is offline and EngInfra wants to make our lives miserable by forcing us to create JiraSD tickets size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by nicksinger over 3 years ago
I researched the JIRA API a little bit: https://docs.atlassian.com/jira-servicedesk/REST/3.6.2/
It should be possible to use it. However, we need an "Application" in JIRA for this (for authentication). This can be achieved by writing to jira-admins@suse.de (https://chat.suse.de/channel/jira?msg=mur2imxxCLNEtxSPL). I didn't do this but wanted to share some first steps.
Updated by dheidler over 3 years ago
- Status changed from Workable to In Progress
- Assignee set to dheidler
Updated by dheidler over 3 years ago
Wrote to jira-admins requesting application access.
Updated by openqa_review over 3 years ago
- Due date set to 2021-09-08
Setting due date based on mean cycle time of SUSE QE Tools
Updated by dheidler over 3 years ago
- Status changed from In Progress to Feedback
Waiting for a response from jira admins
Updated by okurz over 3 years ago
- Subject changed from openqaworker-arm-3 is offline and EngInfra wants to make our lives miserable by forcing us to create JiraSD tickets size:M to openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M
Updated by dheidler over 3 years ago
- Related to action #97382: ARM automatic reboot pipeline does not fail if ipmitool fails size:S added
Updated by ilausuch over 3 years ago
When the worker is up again, this should be added to production. See #97502
Updated by dheidler over 3 years ago
ilausuch wrote:
When the worker is up again, this should be added to production. See #97502
Not sure what you mean but this ticket here is only about creating tickets automatically.
Updated by okurz over 3 years ago
- Related to action #97502: osd deployment failed due to openqaworker-arm-3 being down, needs to be worked around size:M added
Updated by ilausuch over 3 years ago
dheidler wrote:
ilausuch wrote:
When the worker is up again, this should be added to production. See #97502
Not sure what you mean but this ticket here is only about creating tickets automatically.
Yes, you are right. This is not the correct ticket.
Updated by okurz over 3 years ago
nicksinger wrote:
I researched the JIRA API a little bit: https://docs.atlassian.com/jira-servicedesk/REST/3.6.2/
Please don't overdo it. We should be able to rely on just ticket creation by email. If somebody denies us that possibility please escalate to runger.
Updated by dheidler over 3 years ago
Ok - As I didn't get any response from jira-admins yet, I'll send a mail to Ralf.
Updated by dheidler over 3 years ago
Meanwhile there was a less helpful response to my mail to jira-admins:
Hello Dominik,
due to the link you have provided you want to create tickets in
https://sd.suse.com. So, please get in contact with SUSE-IT or EngInfra
by following paragraph 1.) or 2.) from your link.
Best,
Robert
On Tue, 2021-08-24 at 11:48 +0200, Dominik Heidler wrote:
> Hi,
>
> due to
> https://mailman.suse.de/mlarch/SuSE/osd-admins/2021/osd-admins.2021.08/msg00490.html
> the qa-tools team needs an "application" (API user) to be able to
> create tickets using the jira-servicedesk API.
> The E-Mail address (if applicable) would be osd-admins@suse.de.
>
> Regards,
> Dominik
Updated by dheidler over 3 years ago
Reading between the lines of the latest mail I suspect that they want to say that jira-admins don't manage the jira-servicedesk application.
So I opened an infra ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-59028
Updated by dheidler over 3 years ago
... which got closed with the comment
Please send your tickets to address enginfra-system@suse.com
I tried that with a test ticket and, as expected, it doesn't seem to work (as already stated in that mail from infra).
So I opened a new ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-59046
Updated by livdywan over 3 years ago
- Due date changed from 2021-09-08 to 2021-09-10
@dheidler Did you look into the suggestion from Evženie from yesterday? That is, using enginfra-system@suse.com
Updated by dheidler over 3 years ago
I wanted to talk about this in the team but I can just as well write it here:
To make sure that all team members have access to the tickets,
we would need to send them to that email address and add a note for the L1 team.
Something like this:
"Please add all team members listed at https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket".
I'm not sure if they will do it, though.
WDYT?
Updated by okurz over 3 years ago
dheidler wrote:
"Please add all team members listed at https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket".
I suggest the following:
- Write the message: "Please add osd-admins@suse.de as CC (If this is not possible, please add all team members listed at https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket)".
- When sending an email to create a ticket automatically please CC "infra@suse.de" from which we actually do get a confirmation as long as that system still exists.
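A minimal Python sketch of such a mail, using only the standard library, could look like this. The recipient, CC, and the CC request text are taken from this thread; the subject, the first body sentence, the sender account, and the helper name `build_ticket_mail` are assumptions for illustration and not what merge request 18 necessarily does.

```python
from email.message import EmailMessage

# Wording suggested above; kept in one constant so it stays identical everywhere.
CC_REQUEST = (
    "Please add osd-admins@suse.de as CC (If this is not possible, please add "
    "all team members listed at "
    "https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket)"
)


def build_ticket_mail(machine, sender):
    """Compose the mail that should open an EngInfra ticket for an offline worker."""
    msg = EmailMessage()
    msg["From"] = sender                     # hypothetical sender account
    msg["To"] = "enginfra-system@suse.com"   # ticket intake address from this thread
    msg["Cc"] = "infra@suse.de"              # still sends a confirmation for now
    msg["Subject"] = f"{machine} is offline, automatic recovery failed"
    msg.set_content(
        f"Automatic recovery of {machine} failed, please power-cycle it.\n\n"
        f"{CC_REQUEST}\n"
    )
    return msg
```

Sending could then be done with `smtplib.SMTP(host).send_message(msg)`, where the relay host is another assumption.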
Updated by dheidler over 3 years ago
https://gitlab.suse.de/openqa/grafana-webhook-actions/-/merge_requests/18
I'm not so sure about the CC, though.
Updated by dheidler about 3 years ago
- Due date changed from 2021-09-10 to 2021-09-17
Updated by dheidler about 3 years ago
- Status changed from Feedback to Resolved
This should be covered now:
Hi Evženie,
Let me try to summarize things so that we don't do anything wrong.
You want me to create two accounts in Jira SD
and then an automation rule that checks for "Please add osd-admins@suse.de as CC" in the message body to add osd-admins@suse.de as a participant and add eng-infra@suse.de as a participant to any other Eng-Infra related ticket.
[…]
Regards,
Ömer
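To illustrate what this means for our side: the rule as summarized in the mail matches on a literal phrase in the message body. The actual rule lives in Jira SD and its configuration is not shown here, so the following Python snippet is only a hypothetical model of the first half of the rule (the osd-admins part), with made-up names.

```python
# Literal phrase the rule is said to look for in the message body.
TRIGGER = "Please add osd-admins@suse.de as CC"


def extra_participants(message_body):
    """Return the participants a rule like the one described would add."""
    if TRIGGER in message_body:
        return ["osd-admins@suse.de"]
    return []
```

Because the match is on literal text, the mail template used by the pipeline has to keep that phrase unchanged, otherwise osd-admins@suse.de would silently not be added.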
Updated by okurz about 3 years ago
dheidler wrote:
This should be covered now:
Sounds great! Does it work?
Updated by dheidler about 3 years ago
I did some manual tests with Ömer, sending mails from my own suse.de address, and that worked fine.
Updated by livdywan about 3 years ago
- Status changed from Resolved to Feedback
dheidler wrote:
I did some manual tests with Ömer, sending mails from my own suse.de address, and that worked fine.
Discussed briefly in the weekly. It seems like the pipeline should've triggered and we should've seen an email for arm3. But nobody could confirm getting emails for it.
Updated by okurz about 3 years ago
- Status changed from Feedback to Resolved
Hm, we do not automatically retrigger a CI pipeline after the initial one. openqaworker-arm-3 was still down, so I triggered https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/589516 with MACHINE=openqaworker-arm-3. Now IPMI could be reached and the machine was successfully recovered. This means we could not actually verify that email sending works, but the machine is up, and the next time the problem happens we can see whether it works.
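The behaviour described here (recovery succeeded, so no mail was sent) follows from a control flow that can be sketched in Python as below. This is not the actual pipeline code: `recover_or_escalate` and its two callbacks are hypothetical names, and in the real job the machine name would come from the `MACHINE` variable.

```python
import sys


def recover_or_escalate(machine, recover, create_ticket):
    """Try automatic recovery first and only open a ticket if it fails.

    Returns "recovered" or "ticket-created". Exceptions raised by
    create_ticket propagate, so a broken ticket path fails the CI job.
    """
    try:
        recover(machine)
        return "recovered"
    except Exception as error:  # e.g. the IPMI interface could not be reached
        print(f"recovery of {machine} failed: {error}", file=sys.stderr)
    create_ticket(machine)
    return "ticket-created"
```

With this structure the ticket path is only exercised when recovery fails, which is why a successful recovery leaves the email sending unverified.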