action #164988
coordination #92323: [saga][epic] Scale up: Fine-grained control over use and removal of results, assets, test data
Better accounting for openqa-investigation jobs size:S
0%
Description
Motivation
#164979 alerted us that /results was nearly full. We found that groupless jobs are now the biggest contributor: heavy jobs that fail often also trigger heavy openqa-investigate jobs.
Acceptance criteria
- AC1: Big investigation jobs do not fill up our disk space; instead, we just keep fewer of them.
Suggestions
- Count investigation jobs towards the group of the original job
  - Investigation jobs are deliberately groupless so that they are not considered for the result of the corresponding group.
  - It is probably also not what users want: investigation jobs should not cause normal jobs to be kept for less time, but they should still be kept for a short time.
  - The way the cleanup algorithm currently works also makes this hard to implement. It goes through jobs group by group, and factoring in groupless jobs there without good relations in the database is neither straightforward nor efficient.
- Use a dedicated group for all investigation jobs
  - Sounds most promising: just create a new group and schedule investigation jobs to be part of it.
  - There is a caveat: having all investigation jobs in one group does not solve the problem that investigation jobs for a particular scenario become very big. If everything is in one group, one scenario might cause the investigation jobs of other scenarios to be kept only very briefly.
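The caveat of the dedicated-group option can be illustrated with a toy model. It assumes, hypothetically, that the shared group is cleaned up by a size limit that deletes the oldest jobs until the rest fits. The job sizes, IDs, scenario names and quotas are made up, and this is not openQA's actual cleanup code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    job_id: int
    scenario: str
    size_mb: int  # disk space used by the job's results and logs (made up)


def cleanup_by_quota(jobs: List[Job], quota_mb: int) -> List[Job]:
    """Return the jobs that survive a size-quota cleanup.

    `jobs` is ordered oldest first. Simplified, hypothetical model: the
    oldest jobs are deleted until the rest fits into `quota_mb`.
    """
    kept: List[Job] = []
    total = 0
    # Walk from newest to oldest; the first job that no longer fits marks
    # the point from which all older jobs are deleted.
    for job in reversed(jobs):
        if total + job.size_mb > quota_mb:
            break
        kept.append(job)
        total += job.size_mb
    kept.reverse()  # restore oldest-first order
    return kept


def scenarios(jobs: List[Job]) -> List[str]:
    """Sorted list of the distinct scenarios among the given jobs."""
    return sorted({job.scenario for job in jobs})


# Two small scenarios were investigated first, then a heavy one five times.
small = [Job(1, "small-a", 50), Job(2, "small-b", 50)]
heavy = [Job(10 + i, "heavy", 400) for i in range(5)]

# One shared investigation group with a 1000 MB quota:
print(scenarios(cleanup_by_quota(small + heavy, quota_mb=1000)))  # ['heavy']

# Separate groups with 500 MB each:
print(scenarios(cleanup_by_quota(small, quota_mb=500)))  # ['small-a', 'small-b']
print(scenarios(cleanup_by_quota(heavy, quota_mb=500)))  # ['heavy']
```

In the shared group, the newest jobs of the heavy scenario use up the whole quota and the small scenarios are deleted entirely; with separate groups, each keeps its own share.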
Updated by okurz 8 months ago
- Copied from action #164979: [alert][grafana] File systems alert for WebUI /results size:S added
Updated by okurz about 2 months ago
- Target version changed from Tools - Next to Ready
Updated by ybonatakis 13 days ago
- Status changed from Workable to In Progress
- Assignee set to ybonatakis
Updated by ybonatakis 13 days ago · Edited
A new "Investigations" group has been created under "Others" for both OSD[0] and O3[1].
The settings keep the defaults, but the defaults differ between the two instances. For instance, "Keep results for" is 21 days on OSD as opposed to 40 days on O3.
[0] https://openqa.suse.de/group_overview/637
[1] https://openqa.opensuse.org/group_overview/132
Updated by ybonatakis 12 days ago
https://github.com/os-autoinst/scripts/pull/381
Struggling with the test; I submitted only the change to the investigation script.
Updated by ybonatakis 12 days ago
- Status changed from In Progress to Feedback
ybonatakis wrote in #note-11:
https://github.com/os-autoinst/scripts/pull/381
Struggling with the test; I submitted only the change to the investigation script.
I did add a test, but I decided to go with what works, as I couldn't make the new test case work by modifying the host explicitly. I guess no real request is made to O3, but it would be nice to understand why the test breaks in different ways with my other attempts to inject the host.
Updated by ybonatakis 5 days ago
I guess this is still open due to https://github.com/os-autoinst/scripts/pull/381#discussion_r2007995030
Updated by ybonatakis 1 day ago
- Status changed from Workable to Feedback
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1424.
But I am not sure how to handle O3. Is it something we set manually?
Updated by tinita about 23 hours ago
Yet another idea: How about a new "Investigation" parent group, which can have subgroups per original group? If an investigation subgroup is defined for a group, investigation jobs go there; otherwise, as a fallback, they go into the main Investigation (sub)group.
E.g. investigation jobs from the openQA group would go into "Investigation - openQA", and those from "Development / Agama Devel" would go into "Investigation - Development - Agama Devel". All others go into "Investigation / Misc".
And then such individual investigation subgroups can be configured to keep results/logs for a shorter time.
This way we don't have to define an extra group for every group, just for the big ones.
I can't see another way of doing this automatically if we have such different cases where some groups create a lot of investigation jobs and others don't.
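The fallback lookup proposed above could be sketched as follows. The group names are the examples from the comment; the mapping table and the names FALLBACK_GROUP and investigation_group_for are hypothetical, and how the chosen group would actually be passed when openqa-investigate clones a job is not covered here.

```python
from typing import Dict, Optional

# Hypothetical mapping from original job group to a dedicated investigation
# subgroup. Only groups that produce many investigation jobs get an entry.
INVESTIGATION_SUBGROUPS: Dict[str, str] = {
    "openQA": "Investigation - openQA",
    "Development / Agama Devel": "Investigation - Development - Agama Devel",
}

# Hypothetical catch-all subgroup for everything else.
FALLBACK_GROUP = "Investigation / Misc"


def investigation_group_for(original_group: Optional[str]) -> str:
    """Pick the group an investigation job should be scheduled into.

    Groupless original jobs and groups without a dedicated subgroup
    fall back to the shared catch-all group.
    """
    if original_group is None:
        return FALLBACK_GROUP
    return INVESTIGATION_SUBGROUPS.get(original_group, FALLBACK_GROUP)


print(investigation_group_for("openQA"))            # Investigation - openQA
print(investigation_group_for("Some Other Group"))  # Investigation / Misc
```

Only the big offenders need an entry in the table; each such subgroup could then get its own shorter retention settings, while everything else shares the fallback group.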
Updated by ybonatakis about 12 hours ago
tinita wrote in #note-16:
Yet another idea: How about a new "Investigation" parent group, which can have subgroups per original group? If an investigation subgroup is defined for a group, investigation jobs go there; otherwise, as a fallback, they go into the main Investigation (sub)group.
E.g. investigation jobs from the openQA group would go into "Investigation - openQA", and those from "Development / Agama Devel" would go into "Investigation - Development - Agama Devel". All others go into "Investigation / Misc".
And then such individual investigation subgroups can be configured to keep results/logs for a shorter time.
This way we don't have to define an extra group for every group, just for the big ones. I can't see another way of doing this automatically if we have such different cases where some groups create a lot of investigation jobs and others don't.
I kind of liked the idea on first read, but we may not need to go that far. The main concern is to keep investigation jobs from consuming disk space with logs and results, right?
If we have different groups, we have to keep track of their settings. I would say that is unnecessary and doesn't give us much benefit in the end.