action #107671

closed

No aggregate maintenance runs scheduled today on osd size:M

Added by mgrifalconi almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: High
Assignee:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Observation

This seems to be a different issue from #106179, since the dashboard is accessible this time.

Link to list aggregate runs of the day:

https://openqa.suse.de/tests/overview?arch=&flavor=&machine=&test=&modules=&module_re=&groupid=366&groupid=308&groupid=232&groupid=165&groupid=280&groupid=218&groupid=108&groupid=54&groupid=405&groupid=412&groupid=411&groupid=369&groupid=352&groupid=353&groupid=357&groupid=355&groupid=354&groupid=358&groupid=370&groupid=348&groupid=349&groupid=351&groupid=356&groupid=375&groupid=376&groupid=397&groupid=414&build=20220228-1#
(This was showing an empty list at that point)

Impact: update approval blocked

Suggestions

  • caused by downtime of http://download.suse.de
  • read suggestions from #105603
  • Some GitLab CI steps are failing, but we allow them to fail so that other steps can continue. For example, in https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/886067 "sync smelt" fails but is allowed to fail so that "sync incidents" can continue. As a result we do not receive an alert about the failure, and there is not sufficient retrying. We could split the steps into separate pipelines, make each step fatal, and add a configurable number of retries and an interval between retries, customized for each step in https://gitlab.suse.de/qa-maintenance/bot-ng/-/blob/master/.gitlab-ci.yml: for example, retrying "sync smelt" long enough to cover the weekly SUSE IT maintenance window, and less for other critical steps
  • For retrying we do not even need to change qem-bot; we could just use a wrapper in the GitLab CI job itself, e.g. https://github.com/okurz/leaky_bucket_error_count
  • Also look into GitLab CI options to either abort a previous pipeline if a new one is triggered or to not start new ones as long as old ones are still running
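
The retry idea from the suggestions above could be sketched as a small wrapper around a CI step. This is only an illustration under stated assumptions: `run_with_retries` and its parameters are hypothetical names, not part of qem-bot, bot-ng, or leaky_bucket_error_count, and suitable retry counts and intervals for each step would still need to be chosen.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, retries=3, interval=60.0, sleep=time.sleep):
    """Run cmd until it exits with 0 or all attempts are used up.

    Hypothetical helper: `retries` is the number of additional attempts
    after the first one, `interval` is the pause in seconds between
    attempts. Returns the exit code of the last attempt.
    """
    attempts = retries + 1  # first try plus the configured retries
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return 0
        print(
            f"attempt {attempt}/{attempts} failed with exit code {returncode}",
            file=sys.stderr,
        )
        if attempt < attempts:
            # Wait before the next attempt; no wait after the last one.
            sleep(interval)
    # Propagate the failure so the CI step becomes fatal.
    return returncode
```

A CI job could call such a helper with step-specific values, for example many retries with a long interval for "sync smelt" and fewer for other steps, and exit with the returned code so that a persistent failure fails the job instead of being silently allowed.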

Related issues 3 (0 open, 3 closed)

Related to QA (public) - action #106179: No aggregate maintenance runs scheduled today on osd - dashboard.qem.suse.de down size:S | Resolved | osukup | 2022-02-08
Related to openQA Infrastructure (public) - action #105603: openQABot pipeline failed: "ERROR:root:Something bad happended during reading MR data from SMELT/IBS: Expecting value: line 4 column 1 (char 3)" size:M | Resolved | jbaier_cz | 2021-12-16
Related to openQA Project (public) - action #108824: Some of the daily aggregate tests are cancelled without a reason size:M | Resolved | okurz | 2022-03-24
