action #137231

closed

[tools] qem-bot and others can not execute scheduled jobs due to registry.opensuse.org outage

Added by okurz 7 months ago. Updated 7 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2023-09-29
Due date:
% Done: 0%
Estimated time:

Description

Observation

From https://suse.slack.com/archives/C02D16TCP99/p1695966428720309

(Timo Jyrinki) Did we get enough updates published on Wednesday, or should someone be giving a little nudge to openQA to get today's tests running, as there has been nothing running for now?
(Oliver Kurz) The reason is https://suse.slack.com/archives/C02B5UVEC94/p1695921343628859 . Do you have an idea how to work around that? One could run qem-bot from another environment but I don't plan for us to do that.
(Lars Vogdt) :alert: Our container registry registry.opensuse.org is currently down. After a reboot to get to the current kernel, it got stuck in xfs ... :alert:
:eyes_right: At the moment we are trying to run xfs_repair, but get a very low disk speed from the underlying storage device. So please expect the system to be unavailable for the next hours. :eyes:
Posted in engineering | Yesterday at 19:15 | View message
(Timo Jyrinki) Oh, ok, thank you for the information. I do not know how to operate the qem-bot, but maybe someone else knows? Especially if there's something urgent in the update queue @Heiko Rommel
(Oliver Kurz) It's not too hard to call locally, see https://github.com/openSUSE/qem-bot/#usage . However, it seems registry.opensuse.org is up again by now. So I retriggered the call for aggregates: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1865825, which is currently running

Problem

GitLab CI jobs rely on the container registry registry.opensuse.org, which was down, so scheduled jobs such as qem-bot could not be executed
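
To make this failure mode easier to spot, a job could probe the registry before doing any real work. The following is only a minimal sketch and not something that was done in this ticket: the helper names `probe_status` and `registry_is_up` are hypothetical, and it assumes the registry answers on the standard `/v2/` base endpoint of the container registry HTTP API, where 200 or 401 (authentication required) both mean the service is reachable.

```python
import urllib.error
import urllib.request


def probe_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if the host is unreachable.

    Hypothetical helper for illustration, not part of qem-bot or bot-ng.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code (e.g. 401 or 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar errors.
        return None


def registry_is_up(host, probe=probe_status):
    """Report whether the registry on host answers on its /v2/ base endpoint.

    Assumption: 200 and 401 both count as "up", because a registry that
    requires authentication replies with 401 on /v2/.
    The probe argument can be replaced to test this without network access.
    """
    status = probe(f"https://{host}/v2/")
    return status in (200, 401)
```

A CI job could call `registry_is_up("registry.opensuse.org")` as its first step and fail early with a clear message, instead of failing later while pulling the container image.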

Actions #1

Updated by okurz 7 months ago

I retriggered the call for aggregates: https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1865825, currently running

Actions #2

Updated by okurz 7 months ago

The last cycle of openQA aggregate tests has now been retriggered in https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1865825 and tests now show up in https://openqa.suse.de/tests/overview?build=20230929-1&distri=sle and other places

Actions #3

Updated by okurz 7 months ago

  • Subject changed from qem-bot and others can not execute scheduled jobs due to registry.opensuse.org outage to [tools] qem-bot and others can not execute scheduled jobs due to registry.opensuse.org outage
  • Target version set to Ready

Actions #4

Updated by openqa_review 7 months ago

  • Due date set to 2023-10-14

Setting due date based on mean cycle time of SUSE QE Tools

Actions #5

Updated by okurz 7 months ago

  • Due date deleted (2023-10-14)
  • Status changed from In Progress to Resolved

I checked the job schedule and execution of bot-ng and ensured that pipeline status emails are properly enabled again in all GitLab projects, following https://progress.opensuse.org/projects/qa/wiki/Tools#API-usage-for-handling-email-notification
