action #97244

openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M

Added by okurz 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version:
Start date: 2021-08-19
Due date: 2021-09-17
% Done: 0%
Estimated time:

Description

Motivation

See http://mailman.suse.de/mlarch/SuSE/osd-admins/2021/osd-admins.2021.08/msg00490.html

Acceptance criteria

Suggestion

  • Investigate why the pipeline failed to create tickets
  • Try creating tickets automatically in a way that EngInfra accepts (potentially ask them)
    • We might need to send from an @suse.com address, which would require creating a new dedicated account. We should be able to just include osd-admins@suse.de in CC.
    • Use the JIRA-SD API to create new tickets. This would give us better control over whether a ticket was created and which one, and let the pipeline fail accordingly if the system breaks for whatever reason, so that the need for manual investigation is easy to spot
    • If we cannot find an easy solution with EngInfra, we escalate to runger@suse.com because he already offered that we bring topics up to him, which he can discuss with the EngInfra team lead
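
For illustration, the API-based suggestion could look roughly like the sketch below. It assumes the `POST /rest/servicedeskapi/request` endpoint of the Jira Service Desk REST API; the service desk ID, the request type ID and the use of basic authentication are assumptions and not taken from this ticket.

```python
import base64
import json
import urllib.request

# Assumed values for illustration only, not confirmed anywhere in this ticket.
BASE_URL = "https://sd.suse.com"
SERVICE_DESK_ID = "1"   # hypothetical
REQUEST_TYPE_ID = "1"   # hypothetical


def build_create_request(summary, description, user, password):
    """Build (but do not send) the POST that creates a customer request."""
    payload = {
        "serviceDeskId": SERVICE_DESK_ID,
        "requestTypeId": REQUEST_TYPE_ID,
        "requestFieldValues": {"summary": summary, "description": description},
    }
    # Basic auth is an assumption; the real setup may need an API "application".
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{BASE_URL}/rest/servicedeskapi/request",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def create_ticket(request):
    """Send the request; urlopen raises HTTPError on an error status."""
    with urllib.request.urlopen(request, timeout=30) as response:
        issue = json.load(response)
    # Fail loudly if the answer does not name a ticket, so the pipeline turns red.
    if "issueKey" not in issue:
        raise RuntimeError(f"no ticket was created: {issue}")
    return issue["issueKey"]
```

Because `create_ticket` raises on any failure, a CI job calling it would fail visibly instead of silently creating no ticket.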

Out of scope

Fixing IPMI based recovery


Related issues

Related to openQA Infrastructure - action #97364: openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered size:S (Resolved, 2021-08-23)

Related to openQA Infrastructure - action #97382: ARM automatic reboot pipeline does not fail if ipmitool fails size:S (Resolved, 2021-08-23)

Related to openQA Infrastructure - action #97502: osd deployment failed due to openqaworker-arm-3 being down, needs to be worked around size:M (Resolved, 2021-08-25)

History

#1 Updated by okurz 2 months ago

  • Related to action #97364: openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered size:S added

#2 Updated by okurz 2 months ago

  • Subject changed from openqaworker-arm-3 is offline and EngInfra wants to make our lives miserable by forcing us to create JiraSD tickets to openqaworker-arm-3 is offline and EngInfra wants to make our lives miserable by forcing us to create JiraSD tickets size:M
  • Description updated (diff)
  • Status changed from New to Workable

#3 Updated by nicksinger 2 months ago

I researched the JIRA API a little bit: https://docs.atlassian.com/jira-servicedesk/REST/3.6.2/
It should be possible to use it. However, we need an "Application" for this in JIRA (for authentication). This can be achieved by writing to jira-admins@suse.de (https://chat.suse.de/channel/jira?msg=mur2imxxCLNEtxSPL). I didn't do this but wanted to share some first steps.

#4 Updated by dheidler 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to dheidler

#5 Updated by dheidler 2 months ago

Wrote to jira-admins requesting application access.

#6 Updated by openqa_review 2 months ago

  • Due date set to 2021-09-08

Setting due date based on mean cycle time of SUSE QE Tools

#7 Updated by dheidler 2 months ago

  • Status changed from In Progress to Feedback

Waiting for a response from jira admins

#8 Updated by okurz 2 months ago

  • Subject changed from openqaworker-arm-3 is offline and EngInfra wants to make our lives miserable by forcing us to create JiraSD tickets size:M to openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M

#9 Updated by dheidler 2 months ago

  • Related to action #97382: ARM automatic reboot pipeline does not fail if ipmitool fails size:S added

#10 Updated by ilausuch 2 months ago

When the worker is up again, this should be added to production. See #97502

#11 Updated by dheidler 2 months ago

ilausuch wrote:

When the worker is up again, this should be added to production. See #97502

Not sure what you mean but this ticket here is only about creating tickets automatically.

#12 Updated by okurz 2 months ago

  • Related to action #97502: osd deployment failed due to openqaworker-arm-3 being down, needs to be worked around size:M added

#13 Updated by ilausuch about 2 months ago

dheidler wrote:

ilausuch wrote:

When the worker is up again, this should be added to production. See #97502

Not sure what you mean but this ticket here is only about creating tickets automatically.

Yes, you are right. This is not the correct ticket.

#14 Updated by nicksinger about 2 months ago

  • Description updated (diff)

#15 Updated by okurz about 2 months ago

nicksinger wrote:

I researched the JIRA API a little bit: https://docs.atlassian.com/jira-servicedesk/REST/3.6.2/

Please don't overdo it. We should be able to rely on just ticket creation by email. If somebody denies us that possibility please escalate to runger.

#16 Updated by dheidler about 2 months ago

Ok - As I didn't get any response from jira-admins yet, I'll send a mail to Ralf.

#17 Updated by dheidler about 2 months ago

Meanwhile there was a less helpful response to my mail to jira-admins:

Hello Dominik,
due to the link you have provided you want to create tickets in
https://sd.suse.com. So, please get in contact with SUSE-IT or Engefra
by following paragraph 1.) or 2.) from your link.

Best,
Robert

On Tue, 2021-08-24 at 11:48 +0200, Dominik Heidler wrote:
> Hi,
>
> due to
> https://mailman.suse.de/mlarch/SuSE/osd-admins/2021/osd-admins.2021.08/msg00490.html
> the qa-tools team needs an "application" (API user) to be able to
> create tickets using the jira-servicedesk API.
> The E-Mail address (if applicable) would be osd-admins@suse.de.
>
> Regards,
> Dominik

#18 Updated by dheidler about 2 months ago

Reading between the lines of the latest mail I suspect that they want to say that jira-admins don't manage the jira-servicedesk application.
So I opened an infra ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-59028

#19 Updated by dheidler about 2 months ago

... which got closed with the comment

Please send your tickets to address enginfra-system@suse.com

I tried that with a test ticket and, as expected and as stated in that mail from infra, it doesn't seem to work.

So I opened a new ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-59046

#20 Updated by cdywan about 2 months ago

  • Due date changed from 2021-09-08 to 2021-09-10

dheidler, did you look into the suggestion from Evženie from yesterday? That is, using enginfra-system@suse.com

#21 Updated by dheidler about 2 months ago

I wanted to talk about this in the team but I can as well write it here:

To make sure that all team members have access to the tickets,
we would need to send them to that email address and add a note for the L1 team.
Something like this:
"Please add all team members listed at https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket".
I'm not sure if they will do it, though.

WDYT?

#22 Updated by okurz about 1 month ago

dheidler wrote:

"Please add all team members listed at https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket".

I suggest the following:

  1. Write the message: "Please add osd-admins@suse.de as CC (If this is not possible, please add all team members listed at https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket)".
  2. When sending an email to create a ticket automatically please CC "infra@suse.de" from which we actually do get a confirmation as long as that system still exists.
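
A minimal sketch of how such a ticket mail could be assembled with Python's standard `email` library. The addresses and the CC request text are taken from this ticket; the subject line, the body wording, the function name `build_ticket_mail` and the SMTP host in the usage note are hypothetical.

```python
from email.message import EmailMessage

TICKET_ADDRESS = "enginfra-system@suse.com"
CC_ADDRESSES = ["osd-admins@suse.de", "infra@suse.de"]
CC_REQUEST = (
    "Please add osd-admins@suse.de as CC (If this is not possible, please add "
    "all team members listed at "
    "https://progress.opensuse.org/projects/qa/wiki/Wiki#Team to this ticket)"
)


def build_ticket_mail(sender, machine):
    """Assemble the mail that should open a ticket for an unreachable machine."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = TICKET_ADDRESS
    msg["Cc"] = ", ".join(CC_ADDRESSES)
    # Subject and first body line are made-up wording for illustration.
    msg["Subject"] = f"{machine} is offline"
    msg.set_content(
        f"{machine} is offline and could not be recovered automatically.\n\n"
        f"{CC_REQUEST}\n"
    )
    return msg
```

Sending would then be something like `smtplib.SMTP("relay.example.org").send_message(build_ticket_mail(sender, machine))`, where the relay host is a placeholder.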

#24 Updated by dheidler about 1 month ago

  • Due date changed from 2021-09-10 to 2021-09-17

#25 Updated by dheidler about 1 month ago

  • Status changed from Feedback to Resolved

This should be covered now:

Hi Evženie,

Let me try to summarize things so that we don't do anything wrong.

You want me to create two accounts in Jira SD

  1. osd-admins@suse.de
  2. eng-infra@suse.de

and then an automation rule that checks for "Please add osd-admins@suse.de as CC" in the message body to add osd-admins@suse.de as a participant, and adds eng-infra@suse.de as a participant to any other Eng-Infra related ticket.

[…]

Regards,
Ömer

#26 Updated by okurz about 1 month ago

dheidler wrote:

This should be covered now:

Sounds great! Does it work?

#27 Updated by dheidler about 1 month ago

I did some manual tests with Ömer, sending mails from my own suse.de address, and that worked fine.

#28 Updated by cdywan about 1 month ago

  • Status changed from Resolved to Feedback

dheidler wrote:

I did some manual tests with Ömer, sending mails from my own suse.de address, and that worked fine.

Discussed briefly in the weekly. It seems like the pipeline should've triggered and we should've seen an email for arm3. But nobody could confirm getting emails for it.

#29 Updated by okurz about 1 month ago

  • Status changed from Feedback to Resolved

Hm, we do not automatically retrigger a CI pipeline after the initial one, and openqaworker-arm-3 is still down. So I triggered https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/589516 with MACHINE=openqaworker-arm-3. This time IPMI could be reached and the machine was successfully recovered. This means we could not actually verify that email sending works, but the machine is up, and the next time the problem happens we can see whether it works.
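
For reference, one scripted way to retrigger the pipeline for a given machine is GitLab's pipeline trigger API. The sketch below only builds the request. That a trigger token exists for this project and that the branch is called `master` are assumptions, and the job above may well have been started differently.

```python
import urllib.parse
import urllib.request

GITLAB_URL = "https://gitlab.suse.de"
PROJECT_PATH = "openqa/grafana-webhook-actions"


def build_trigger_request(machine, token, ref="master"):
    """Build (but do not send) a pipeline trigger request with MACHINE set."""
    # The project can be addressed by its URL-encoded path instead of a numeric ID.
    project = urllib.parse.quote(PROJECT_PATH, safe="")
    form = urllib.parse.urlencode(
        {
            "token": token,  # pipeline trigger token (assumed to exist)
            "ref": ref,      # branch name is an assumption
            "variables[MACHINE]": machine,
        }
    )
    return urllib.request.Request(
        f"{GITLAB_URL}/api/v4/projects/{project}/trigger/pipeline",
        data=form.encode(),
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would start the pipeline; an HTTP error raises an exception, so a failed trigger does not go unnoticed.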
