action #107671
Status: closed
No aggregate maintenance runs scheduled today on osd size:M
% Done: 0%
Description
Observation
This seems to be a different issue from #106179, since the dashboard is accessible this time.
Link to the list of aggregate runs of the day:
Impact: update approval blocked
Suggestions
- Caused by downtime of http://download.suse.de
- Read the suggestions from #105603
- Some GitLab CI steps fail, but we allow them to fail so that other steps can continue. For example, in https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/886067 "sync smelt" fails, which is allowed so that "sync incidents" can continue; however, we then receive no alert about the failure, and there is not sufficient retrying. We could split the steps into separate pipelines, make each step fatal, and add a configurable number of retries and interval between retries, customized for each step, in https://gitlab.suse.de/qa-maintenance/bot-ng/-/blob/master/.gitlab-ci.yml: e.g. for "sync smelt", retry long enough to cover the weekly SUSE IT maintenance window, and less for other critical steps
- For retrying we do not even need to change qem-bot; we could just use a wrapper in the GitLab CI job itself, e.g. https://github.com/okurz/leaky_bucket_error_count
- Also look into GitLab CI options to either abort a previous pipeline when a new one is triggered, or not start new pipelines as long as old ones are still running
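The per-step retry idea from the suggestions could look roughly like the following minimal Python sketch of a hypothetical `retry_step.py` wrapper that a CI job calls around each bot-ng step. It is not qem-bot code and not the interface of okurz/leaky_bucket_error_count; the step names, retry counts, and intervals in `STEP_POLICIES` are placeholder assumptions. As far as the GitLab CI keyword reference documents it, the built-in `retry` keyword allows at most two retries and offers no delay between attempts, which is why a wrapper with its own interval is worth considering.

```python
import subprocess
import sys
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-step retry policies: step name -> (retries, seconds between attempts).
# The numbers are placeholders: for "sync-smelt", retries x interval would have to be
# sized to outlast the weekly SUSE IT maintenance window.
STEP_POLICIES: Dict[str, Tuple[int, float]] = {
    "sync-smelt": (12, 1800.0),
    "sync-incidents": (3, 60.0),
}


def run_with_retries(
    cmd: Sequence[str],
    retries: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd, retrying up to `retries` extra times on a non-zero exit code.

    Waits `interval_s` seconds between attempts and returns the exit code of the
    last attempt, so a step that never succeeds still fails the CI job.
    """
    attempts = retries + 1
    rc = 1
    for attempt in range(1, attempts + 1):
        rc = subprocess.run(list(cmd)).returncode
        if rc == 0:
            return 0
        print(f"attempt {attempt}/{attempts} failed with exit code {rc}", file=sys.stderr)
        if attempt < attempts:
            sleep(interval_s)  # no sleep after the final failed attempt
    return rc


def main(argv: List[str]) -> int:
    """Entry point: retry_step STEP COMMAND [ARGS...]; returns 2 on bad usage."""
    if len(argv) < 3 or argv[1] not in STEP_POLICIES:
        steps = "|".join(STEP_POLICIES)
        print(f"usage: retry_step STEP COMMAND [ARGS...] (STEP: {steps})", file=sys.stderr)
        return 2
    retries, interval_s = STEP_POLICIES[argv[1]]
    return run_with_retries(argv[2:], retries, interval_s)
```

A job script would dispatch with `sys.exit(main(sys.argv))`, e.g. `python3 retry_step.py sync-smelt <bot command>`. Because the wrapper returns the exit code of the last attempt, a step that exhausts its retries still fails the job and can raise an alert, instead of being silently allowed to fail.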