action #77089

[osd][retrospective] multiple unattended alerts, unattended gitlab CI pipeline fails, all osd aarch64 workers offline

Added by okurz 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2020-11-07
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

Observation

On 2020-11-06 I found multiple unattended alerts, unattended gitlab CI pipeline failures, and all osd aarch64 workers offline. What happened?

What I have seen failing:

Acceptance criteria

  • AC1: Alerts handled
  • AC2: gitlab CI jobs can find shared runners again
  • AC3: issue has been discussed with team, e.g. in retrospective

Subtasks

action #77101: fix selection of gitlab CI runners (Status: Resolved, Assignee: okurz)

History

#1 Updated by okurz 2 months ago

  • Description updated (diff)
  • Status changed from New to Workable
  • Priority changed from High to Urgent
  • Target version set to Ready

To have at least one aarch64 worker available, I have now run ipmi-openqaworker-arm-1-ipmi power reset.

Most likely all gitlab CI pipelines have been failing since the shared gitlab CI runners were changed. I have now added a comment about this on https://gitlab.suse.de/openqa/openqa-trigger-from-ibs-plugin/-/compare/94e15aeec52be49db86969aece69b4efd358d632...c525232ebbb658a21e0b4bb3303f432f520912ed.

#2 Updated by okurz 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

The missing gitlab CI runners are handled in #77101.

#3 Updated by okurz 2 months ago

  • Description updated (diff)
  • Status changed from In Progress to Feedback

Fixed openqaworker-arm-1 and openqaworker-arm-2. openqaworker-arm-3 once again cannot be controlled over IPMI, see the comment in #76876#note-5. The alerts for openqaworker-arm-1 and openqaworker-arm-2 are back to green.

Reviewed the minion jobs on osd. Most are obs_sync jobs, for which we already have ticket #70768. I commented there and increased its priority.

#4 Updated by okurz 2 months ago

  • Status changed from Feedback to Resolved

Acceptance criteria

  • AC1: Alerts handled

DONE

  • AC2: gitlab CI jobs can find shared runners again

DONE, see #77101

  • AC3: issue has been discussed with team, e.g. in retrospective

DONE, already discussed with at least cdywan in https://www.retrospected.com/game/kj6hv59UD, which led to suggestion #77317
