action #109974
openQA (public) - coordination #99303: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release
QA (public) - coordination #109644: [epic] Future improvements for qem-bot
qem-bot/dashboard - mixed old and new incidents - potential future ideas
Description
Observation
Maintenance sometimes re-uses old incidents for a package instead of creating new ones, which leads to mixed old and new results in the dashboard :(
see: https://suse.slack.com/archives/C02D16TCP99/p1648721562205869
So we need a workaround or solution for this corner case.
See also https://github.com/openSUSE/qem-dashboard/issues/61
Originally brought up by coolo in
https://suse.slack.com/archives/C02D16TCP99/p1638283633141300
I just noticed a rather alarming issue: http://dashboard.qam.suse.de/incident/20989 talks about 43 passed, 1 failed jobs for the incident
Problems
- http://dashboard.qam.suse.de/incident/20639 reports "208 passed, 4 failed, 12 stopped" and links to the openQA results at https://openqa.suse.de/tests/overview?build=%3A20639%3Aopensc, but the openQA test results only show 183 passed and 18 soft-failed
- -> The dashboard should not say "passed" when it means "passed + softfailed"; it should say "ok", see https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Jobs/Constants.pm#L76=
- -> Consider using time-fixed links, e.g. https://openqa.suse.de/tests/overview?build=%3A20639%3Aopensc&t=2022-04-01+08%3A53%3A19+%2B0000
- -> Ensure that the results are current and correspond to what openQA itself shows (the numbers should match)
- -> Exclude any results that are outside a "reasonable time range", e.g. http://dashboard.qam.suse.de/blocked for 20639 shows incident results from some months ago, build 2021…
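The "ok" counting and the time-fixed link from the list above could look roughly like the following Python sketch. It assumes that openQA's OK_RESULTS are "passed" and "softfailed" (as in the linked Constants.pm) and that the overview page accepts the `t` parameter used in the example link; `summarize` and `overview_url` are hypothetical helper names, not existing qem-bot or qem-dashboard functions.

```python
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import urlencode

# Assumption: mirrors OK_RESULTS in openQA's lib/OpenQA/Jobs/Constants.pm,
# where both "passed" and "softfailed" count as an ok result.
OK_RESULTS = ("passed", "softfailed")


def summarize(results: Iterable[str]) -> Dict[str, int]:
    """Count openQA job results, reporting passed + softfailed as "ok".

    Keeping "passed" and "softfailed" separate as well makes it possible to
    compare the numbers directly with the openQA overview page.
    """
    counts = Counter(results)  # missing keys count as 0
    summary = {"ok": sum(counts[r] for r in OK_RESULTS)}
    for result in ("passed", "softfailed", "failed"):
        summary[result] = counts[result]
    return summary


def overview_url(base: str, build: str, snapshot_time: str) -> str:
    """Return an openQA /tests/overview link pinned to a point in time via `t`."""
    # urlencode escapes ":" as %3A, spaces as "+" and "+" as %2B
    return f"{base}/tests/overview?" + urlencode({"build": build, "t": snapshot_time})
```

With the numbers from incident 20639 (183 passed, 18 soft-failed) this reports 201 "ok" jobs while still showing 183 "passed", and `overview_url("https://openqa.suse.de", ":20639:opensc", "2022-04-01 08:53:19 +0000")` reproduces the time-fixed example link above.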
Acceptance criteria
- AC1: It is possible to reuse incidents and qem-bot can still approve related release requests
Suggestions
- Read the qem-dashboard schema in https://github.com/openSUSE/qem-dashboard/ to understand where important settings are stored, in particular https://github.com/openSUSE/qem-dashboard/blob/main/migrations/dashboard.sql
- Try to document a proper manual process as a "Workaround", also so that we understand it ourselves
- As a first feature, just delete all aggregate openQA data in qem-dashboard older than a configurable age, by default 90 days
- Optional: add a GitLab CI pipeline that can be triggered manually to run that cleanup
- The dashboard could trigger that cleanup when it receives new smelt data and notices an update of the RR (release request)
- We might need to identify "outdated openQA jobs" by a low openQA job id or by a timestamp, which might require adding that information to the qem-dashboard database
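The cleanup criteria above (an age limit with a configurable 90-day default, optionally a cutoff on low openQA job ids) might be expressed like this minimal Python sketch. `JobRecord`, its `updated` field and `split_jobs` are hypothetical; the real rows and columns are defined in qem-dashboard's migrations/dashboard.sql and may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

# Default age limit suggested in this ticket; meant to be configurable.
DEFAULT_MAX_AGE = timedelta(days=90)


@dataclass
class JobRecord:
    """Hypothetical view of one aggregate openQA job row in qem-dashboard."""

    job_id: int        # openQA job id
    updated: datetime  # hypothetical "last updated" timestamp, timezone-aware


def is_outdated(
    job: JobRecord,
    now: datetime,
    max_age: timedelta = DEFAULT_MAX_AGE,
    min_job_id: Optional[int] = None,
) -> bool:
    """Outdated by timestamp, or (if a cutoff is given) by a low openQA job id."""
    if now - job.updated > max_age:
        return True
    return min_job_id is not None and job.job_id < min_job_id


def split_jobs(
    jobs: Iterable[JobRecord],
    max_age: timedelta = DEFAULT_MAX_AGE,
    min_job_id: Optional[int] = None,
    now: Optional[datetime] = None,
) -> Tuple[List[JobRecord], List[JobRecord]]:
    """Partition jobs into (keep, drop); `drop` is what a cleanup would delete."""
    now = now or datetime.now(timezone.utc)
    keep: List[JobRecord] = []
    drop: List[JobRecord] = []
    for job in jobs:
        target = drop if is_outdated(job, now, max_age, min_job_id) else keep
        target.append(job)
    return keep, drop
```

The job id cutoff is optional because, as noted above, a usable timestamp may first have to be added to the database; until then a "low openQA job id" is the only available signal.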
Workarounds
- Ask maintenance to create a new, fresh incident, e.g. by a comment in IBS
- Detect invalid requests, e.g. those with outdated results, and reject them
- Manually delete the outdated data from the dashboard database, something along the lines of:
ssh root@qam2.suse.de
machinectl shell postgresql
sudo -u postgres psql dashboard_db
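-- inside psql: this is production data, wreak havoc carefully
-- "updated" is an assumed column name, check migrations/dashboard.sql
BEGIN;
-- store the update_settings ids of the outdated jobs
CREATE TEMP TABLE stale_settings AS SELECT DISTINCT update_settings FROM openqa_jobs
  WHERE update_settings IS NOT NULL AND updated < NOW() - INTERVAL '90 days';
DELETE FROM openqa_jobs WHERE update_settings IS NOT NULL AND updated < NOW() - INTERVAL '90 days';
-- drop only the stored settings that no remaining job still references
DELETE FROM update_openqa_settings s WHERE s.id IN (SELECT update_settings FROM stale_settings)
  AND NOT EXISTS (SELECT 1 FROM openqa_jobs j WHERE j.update_settings = s.id);
COMMIT;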
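For the workaround of detecting invalid requests, one possible check, assuming that both the creation time of the release request and the finish time of each openQA job are available, is to flag any request whose incident still carries results that predate the request. This is a minimal Python sketch; `stale_job_ids` and `is_invalid_request` are hypothetical names, not existing qem-bot functions.

```python
from datetime import datetime
from typing import Dict, List


def stale_job_ids(request_created: datetime, job_finished: Dict[int, datetime]) -> List[int]:
    """Return ids of jobs that finished before the release request existed.

    job_finished maps openQA job id -> finish time (hypothetical input; the
    dashboard may store this differently). Such results come from an earlier
    use of the re-used incident and say nothing about the current request.
    """
    return sorted(
        job_id for job_id, finished in job_finished.items() if finished < request_created
    )


def is_invalid_request(request_created: datetime, job_finished: Dict[int, datetime]) -> bool:
    """Flag a request whose incident still carries results older than itself."""
    return bool(stale_job_ids(request_created, job_finished))
```

Returning the stale job ids, not just a boolean, would let a rejection comment name the offending jobs.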
Updated by okurz over 2 years ago
- Copied from action #109310: qem-bot/dashboard - mixed old and new incidents size:M added
Updated by jbaier_cz 10 months ago
- Related to action #155206: [qem-bot] re-release update can miss repo and thus not schedule updates added