action #97244

closed

openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M

Added by okurz over 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Start date:
2021-08-19
Due date:
2021-09-17
% Done:

0%

Estimated time:

Description

Motivation

See http://mailman.suse.de/mlarch/SuSE/osd-admins/2021/osd-admins.2021.08/msg00490.html

Acceptance criteria

Suggestion

  • Look into why the pipeline failed to create tickets
  • Try creating tickets automatically in the new way EngInfra prefers (potentially ask them)
    • We might need to send from an @suse.com address, which would require creating a new dedicated account. We should be able to just include osd-admins@suse.de in CC.
    • Use the JIRA-SD API to create new tickets. This would give us better control over whether a ticket was created and which one, and let us fail accordingly if the system breaks for whatever reason, so that the failure is easy to spot and manual investigation is possible
    • If we cannot find an easy solution with EngInfra, we escalate to runger@suse.com, because he already offered that we bring such topics up to him so that he can discuss them with the EngInfra team lead
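
As a rough illustration of the JIRA-SD API idea in the list above, here is a minimal Python sketch, assuming the Jira Service Desk "create customer request" endpoint (POST /rest/servicedeskapi/request) is reachable on sd.suse.com. The service desk ID, the request type ID and the bearer-token authentication are assumptions and not taken from this ticket; the real values would have to come from EngInfra.

```python
import json
import urllib.request

# Assumed values for illustration only; the real IDs are not known from this ticket.
BASE_URL = "https://sd.suse.com"
SERVICE_DESK_ID = "1"   # hypothetical
REQUEST_TYPE_ID = "1"   # hypothetical


def build_ticket_request(summary, description, token):
    """Build (but do not send) a 'create customer request' call."""
    payload = {
        "serviceDeskId": SERVICE_DESK_ID,
        "requestTypeId": REQUEST_TYPE_ID,
        "requestFieldValues": {"summary": summary, "description": description},
    }
    return urllib.request.Request(
        BASE_URL + "/rest/servicedeskapi/request",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based auth; the actual scheme depends on the instance.
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


def create_ticket(summary, description, token):
    """Send the request and return the issue key; raise if no ticket was created."""
    req = build_ticket_request(summary, description, token)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, so a broken system
    # makes the calling pipeline job fail instead of passing silently.
    with urllib.request.urlopen(req, timeout=30) as resp:
        if resp.status != 201:
            raise RuntimeError(f"unexpected HTTP status {resp.status}")
        return json.load(resp)["issueKey"]
```

Because create_ticket either returns the key of the created ticket or raises, a CI job calling it would turn red whenever ticket creation breaks, which is the "fail accordingly" behaviour asked for above.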

Out of scope

Fixing IPMI based recovery


Related issues 3 (0 open, 3 closed)

Related to openQA Infrastructure (public) - action #97364: openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered size:S (Resolved, nicksinger, 2021-08-23)

Related to openQA Infrastructure (public) - action #97382: ARM automatic reboot pipeline does not fail if ipmitool fails size:S (Resolved, dheidler, 2021-08-23)

Related to openQA Infrastructure (public) - action #97502: osd deployment failed due to openqaworker-arm-3 being down, needs to be worked around size:M (Resolved, okurz, 2021-08-25)

Actions #1

Updated by okurz over 3 years ago

  • Related to action #97364: openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered size:S added
Actions #2

Updated by okurz over 3 years ago

  • Subject changed from openqaworker-arm-3 is offline and EngInfra wants to make our lives miserable by forcing us to create JiraSD tickets to openqaworker-arm-3 is offline and EngInfra wants to make our lives miserable by forcing us to create JiraSD tickets size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by nicksinger over 3 years ago

I researched the JIRA API a little bit: https://docs.atlassian.com/jira-servicedesk/REST/3.6.2/
It should be possible to use it. However, we need an "Application" in JIRA for this (for authentication). This can be achieved by writing to jira-admins@suse.de (https://chat.suse.de/channel/jira?msg=mur2imxxCLNEtxSPL). I didn't do this but wanted to share some first steps.

Actions #4

Updated by dheidler over 3 years ago

  • Status changed from Workable to In Progress
  • Assignee set to dheidler
Actions #5

Updated by dheidler over 3 years ago

Wrote to jira-admins requesting application access.

Actions #6

Updated by openqa_review over 3 years ago

  • Due date set to 2021-09-08

Setting due date based on mean cycle time of SUSE QE Tools

Actions #7

Updated by dheidler over 3 years ago

  • Status changed from In Progress to Feedback

Waiting for a response from jira admins

Actions #8

Updated by okurz over 3 years ago

  • Subject changed from openqaworker-arm-3 is offline and EngInfra wants to make our lives miserable by forcing us to create JiraSD tickets size:M to openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M
Actions #9

Updated by dheidler over 3 years ago

  • Related to action #97382: ARM automatic reboot pipeline does not fail if ipmitool fails size:S added
Actions #10

Updated by ilausuch over 3 years ago

When the worker is up again, this should be added to production. See #97502

Actions #11

Updated by dheidler over 3 years ago

ilausuch wrote:

When the worker is up again, this should be added to production. See #97502

Not sure what you mean but this ticket here is only about creating tickets automatically.

Actions #12

Updated by okurz over 3 years ago

  • Related to action #97502: osd deployment failed due to openqaworker-arm-3 being down, needs to be worked around size:M added
Actions #13

Updated by ilausuch over 3 years ago

dheidler wrote:

ilausuch wrote:

When the worker is up again, this should be added to production. See #97502

Not sure what you mean but this ticket here is only about creating tickets automatically.

Yes, you are right. This is not the correct ticket.

Actions #14

Updated by nicksinger over 3 years ago

  • Description updated (diff)
Actions #15

Updated by okurz over 3 years ago

nicksinger wrote:

I researched the JIRA API a little bit: https://docs.atlassian.com/jira-servicedesk/REST/3.6.2/

Please don't overdo it. We should be able to rely on just ticket creation by email. If somebody denies us that possibility please escalate to runger.

Actions #16

Updated by dheidler over 3 years ago

Ok - As I didn't get any response from jira-admins yet, I'll send a mail to Ralf.

Actions #17

Updated by dheidler over 3 years ago

Meanwhile there was a less helpful response to my mail to jira-admins:

Hello Dominik,
due to the link you have provided you want to create tickets in
https://sd.suse.com. So, please get in contact with SUSE-IT or Engefra
by following paragraph 1.) or 2.) from your link.

Best,
Robert

On Tue, 2021-08-24 at 11:48 +0200, Dominik Heidler wrote:
> Hi,
>
> due to
> https://mailman.suse.de/mlarch/SuSE/osd-admins/2021/osd-admins.2021.08/msg00490.html
> the qa-tools team needs an "application" (API user) to be able to
> create tickets using the jira-servicedesk API.
> The E-Mail address (if applicable) would be osd-admins@suse.de.
>
> Regards,
> Dominik
Actions #18

Updated by dheidler over 3 years ago

Reading between the lines of the latest mail I suspect that they want to say that jira-admins don't manage the jira-servicedesk application.
So I opened an infra ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-59028

Actions #19

Updated by dheidler over 3 years ago

... which got closed with the comment

Please send your tickets to address enginfra-system@suse.com

I tried that with a test ticket and, as expected, it doesn't seem to work (as already stated in that mail from infra).

So I opened a new ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-59046

Actions #20

Updated by livdywan over 3 years ago

  • Due date changed from 2021-09-08 to 2021-09-10

@dheidler Did you look into the suggestion from Evženie from yesterday? That is, using enginfra-system@suse.com

Actions #21

Updated by dheidler over 3 years ago

I wanted to talk about this in the team but I might as well write it here:

To make sure that all team members have access to the tickets,
we would need to send them to that email address and add a note for the L1 team.
Something like this:
"Please add all team members listed at https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket".
I'm not sure if they will do it, though.

WDYT?

Actions #22

Updated by okurz over 3 years ago

dheidler wrote:

"Please add all team members listed at https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket".

I suggest the following:

  1. Write the message: "Please add osd-admins@suse.de as CC (If this is not possible, please add all team members listed at https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket)".
  2. When sending an email to create a ticket automatically please CC "infra@suse.de" from which we actually do get a confirmation as long as that system still exists.
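
The two points above could be sketched in Python roughly as follows, assuming the pipeline creates tickets by mail to enginfra-system@suse.com (the address named earlier in this ticket). The helper name, the sender address, the subject and the problem text are placeholders for illustration; the snippet only builds the message and does not send it.

```python
from email.message import EmailMessage

# Wording taken from point 1 above.
CC_REQUEST = (
    "Please add osd-admins@suse.de as CC (If this is not possible, please add "
    "all team members listed at "
    "https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket)"
)


def build_ticket_mail(sender, subject, problem):
    """Compose the ticket-creation mail; sending is left to the caller."""
    msg = EmailMessage()
    msg["From"] = sender                      # placeholder, e.g. a dedicated account
    msg["To"] = "enginfra-system@suse.com"    # creates the JiraSD ticket
    msg["Cc"] = "infra@suse.de"               # point 2: still sends a confirmation
    msg["Subject"] = subject
    msg.set_content(problem + "\n\n" + CC_REQUEST + "\n")
    return msg
```

The resulting message could then be handed to smtplib.SMTP(...).send_message(msg) with whatever mail relay the pipeline has access to; the relay host is not specified in this ticket.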
Actions #24

Updated by dheidler about 3 years ago

  • Due date changed from 2021-09-10 to 2021-09-17
Actions #25

Updated by dheidler about 3 years ago

  • Status changed from Feedback to Resolved

This should be covered now:

Hi Evženie,

Let me try to summarize things to avoid that we do something wrong..

You want me to create two accounts in Jira SD

  1. osd-admins@suse.de
  2. eng-infra@suse.de

and then an automation rule who checks for "Please add osd-admins@suse.de as CC" in the message body to add osd-admins@suse.de as a participant and add eng-infra@suse.de as a participant to any other Eng-Infra related Ticket.

[…]

Regards,
Ömer

Actions #26

Updated by okurz about 3 years ago

dheidler wrote:

This should be covered now:

Sounds great! Does it work?

Actions #27

Updated by dheidler about 3 years ago

I did some manual tests with Ömer, sending mails from my own suse.de address, and that worked fine.

Actions #28

Updated by livdywan about 3 years ago

  • Status changed from Resolved to Feedback

dheidler wrote:

I did some manual tests with Ömer, sending mails from my own suse.de address, and that worked fine.

Discussed briefly in the weekly. It seems like the pipeline should've triggered and we should've seen an email for arm3. But nobody could confirm getting emails for it.

Actions #29

Updated by okurz about 3 years ago

  • Status changed from Feedback to Resolved

Hm, we do not automatically retrigger a CI pipeline after the initial one. openqaworker-arm-3 was still down, so I triggered https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/589516 with MACHINE=openqaworker-arm-3. This time IPMI could be reached and the machine was successfully recovered. This means we could not actually verify that email sending works, but the machine is up, and the next time the problem happens we can see whether it works.
