action #108869
coordination #91646 (closed): [saga][epic] SUSE Maintenance QA workflows with fully automated testing, approval and release
Missing (re-)schedules of SLE maintenance tests size:M
Description
Motivation
See https://suse.slack.com/archives/C02D16TCP99/p1648110330160679
From Jozef Pupava
There are updates with no aggregates or record in http://dashboard.qam.suse.de/blocked S:M:23303:267916 S:M:23302:267917 S:M:23311:267930 S:M:23085:267929 ...
http://dashboard.qam.suse.de/incident/23302
http://dashboard.qam.suse.de/incident/23303
http://dashboard.qam.suse.de/incident/23311
http://dashboard.qam.suse.de/incident/23085
There is some issue: the bot is not running jobs on (I guess) resubmitted updates?
e.g. samba
https://openqa.suse.de/tests/overview?distri=sle&version=15-SP2&build=%3A23309%3Asamba&groupid=310
What is the staged status?
http://dashboard.qam.suse.de/incident/23309
The chrony update was rejected by @Paolo Stivanin, but looking at the HA test, the update was not added there, so the test didn't fail due to a regression: https://bugzilla.suse.com/show_bug.cgi?id=1194220#c32
https://openqa.suse.de/tests/8379382#settings
Below, @Paolo Stivanin mentioned kernel S:M:23280:268126
Another case where aggregates are failing because there is an update which was released 2 days ago!
https://openqa.suse.de/tests/8380595#step/zypper_ref/3
Today's run is worthless: it does not contain new updates and is running with released updates whose repos are deleted.
I guess the same holds for yesterday and maybe even the days before.
Acceptance criteria
- AC1: It is known what existing workflows require without needing any new features (Existing workflows to schedule incident and aggregate tests are ok again)
- AC2: Potential new feature requests have been identified and documented in new tickets
Suggestions
- Could be related to, or a regression from #103701 / https://gitlab.suse.de/qa-maintenance/bot-ng/-/merge_requests/46
- Talk to jpupava, e.g. in the Slack discussion mentioned above
- Try to find out what is actually broken
- Try to separate regressions from new feature requests which should go into separate tickets
- Try to separate "something is missing" cases from "something is failing" cases
Updated by okurz over 2 years ago
- Related to action #103701: Resubmited incident (ID) with new release request (RR) inherits incident test results from previous RR added
Updated by livdywan over 2 years ago
- Subject changed from Missing (re-)schedules of SLE maintenance tests to Missing (re-)schedules of SLE maintenance tests size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by jbaier_cz over 2 years ago
At least the "missing" part should be solved; it was related to https://gitlab.suse.de/qa-maintenance/bot-ng/-/merge_requests/50 after all. There was a leftover --dry parameter in the smelt-sync job.
Updated by osukup over 2 years ago
- Status changed from Workable to Resolved
- Assignee set to osukup
There was a missed --dry parameter in the Sync SMELT workflow, so the updated / real data needed for the rest of the dashboard was not available.
From the logs:
$ count=0 # collapsed multi-line command
++ count=0
++ ./qem-bot/bot-ng.py -c /etc/openqabot --token [MASKED] --debug --dry smelt-sync
++ tee bot_smelt-sync_0.log
INFO: Loaded 195 active incidents
and
'packages': ['sle-module-containers-release'],
'project': 'SUSE:Maintenance:23017',
'rr_number': 266265}]
INFO: Dry run, nothing synced
GitLab job parameters fixed by removing --dry from the BOT_PARAMS variable.
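As one possible safeguard against a repeat, a scheduled job could check its own parameters and log output for signs of a dry run. The following is only a minimal sketch and not part of bot-ng: the helper names `has_dry_flag` and `log_reports_dry_run` are hypothetical, and it assumes that BOT_PARAMS is a plain shell-style string and that the log line "Dry run, nothing synced" shown above is the marker to look for.

```python
import shlex

# Assumption: the flag and the log marker are exactly as seen in the logs above.
DRY_FLAG = "--dry"
DRY_RUN_MARKER = "Dry run, nothing synced"


def has_dry_flag(bot_params: str) -> bool:
    """Return True if the shell-style parameter string contains the --dry flag."""
    # shlex.split tokenizes like a POSIX shell, so only an exact token matches.
    return DRY_FLAG in shlex.split(bot_params)


def log_reports_dry_run(log_text: str) -> bool:
    """Return True if any log line says that nothing was synced due to a dry run."""
    return any(DRY_RUN_MARKER in line for line in log_text.splitlines())


if __name__ == "__main__":
    params = "-c /etc/openqabot --debug --dry smelt-sync"
    if has_dry_flag(params):
        print("WARNING: --dry is set, no data will be synced")
```

Such a check could be run before or after the sync step so that a leftover --dry is flagged instead of going unnoticed.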
Updated by okurz over 2 years ago
Awesome that you could fix it. I think we can still think of an improvement.
Updated by okurz over 2 years ago
- Status changed from Resolved to Feedback
As with other incidents that have a bigger impact, we should look for at least one improvement on top of the original problem resolution, see https://progress.opensuse.org/projects/qa/wiki/Tools#How-we-work-on-our-backlog . I recommend conducting a "Five Whys" session. Cleanup is also needed to ensure that all affected jobs are properly labeled, retriggered with correct parameters, etc.
Updated by osukup over 2 years ago
Probably the biggest delay in identifying the problem was that nobody checked all the related logs in GitLab.
Updated by dzedro over 2 years ago
osukup wrote:
Probably the biggest delay in identifying the problem was that nobody checked all the related logs in GitLab.
By nobody, do you mean you, jbaier or tools?
Updated by osukup over 2 years ago
dzedro wrote:
osukup wrote:
Probably the biggest delay in identifying the problem was that nobody checked all the related logs in GitLab.
By nobody, do you mean you, jbaier or tools?
Anybody with access to gitlab.suse.de :D I checked the logs within 5 minutes of starting my work and identified the problem.
Updated by livdywan over 2 years ago
- Copied to action #108944: 5 whys follow-up to Missing (re-)schedules of SLE maintenance tests size:M added
Updated by okurz over 2 years ago
osukup wrote:
dzedro wrote:
osukup wrote:
Probably the biggest delay in identifying the problem was that nobody checked all the related logs in GitLab.
By nobody, do you mean you, jbaier or tools?
Anybody with access to gitlab.suse.de :D I checked the logs within 5 minutes of starting my work and identified the problem.
I agree. I am sure we benefit much more from teaching each other how to resolve problems than from finger-pointing :)
Updated by osukup over 2 years ago
- Status changed from Feedback to Resolved
5 Whys conducted on 31.3.; follow-up actions are coming.