action #129412
closed
Verify cleanup behavior of groupless job results
Description
Motivation
Judging by https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&viewPanel=19 the "group" of jobs with the biggest result size consists of jobs that are not in any group. We had a brief look at the retention behavior of job results of groupless jobs but were not sure how it works.
Acceptance criteria
- AC1: We know whether and how the cleanup of job results of groupless jobs works.
- AC2: The cleanup behavior¹ is configurable (and not using some hard-coded time intervals).
- AC3: There is a unit test ensuring the cleanup behavior¹ actually works as intended.
- AC4: Documentation about the cleanup behavior¹ exists and is in accordance with what is actually happening.
¹ The "cleanup behavior" of job results of groupless jobs specifically.
Suggestions
- Have a look at lib/OpenQA/Task/Job/Limit.pm (the code under "create temporary job group outside of DB to collect") and lib/OpenQA/Schema/Result/JobGroups.pm. Remove the mentioned code and check whether tests fail. No tests fail after the removal, so we need to add a new test.
Updated by okurz over 1 year ago
- Related to action #129244: [alert][grafana] File systems alert for WebUI /results size:M added
Updated by mkittler over 1 year ago
- Status changed from New to In Progress
- Assignee set to mkittler
Updated by mkittler over 1 year ago
Before extending the documentation to cover AC4 it makes sense to generally restructure the cleanup-related documentation a bit: https://github.com/os-autoinst/openQA/pull/5139
Updated by openqa_review over 1 year ago
- Due date set to 2023-05-31
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler over 1 year ago
The cleanup of groupless jobs basically works and can also be configured as https://github.com/os-autoinst/openQA/pull/5142 shows.
That means on OSD the following retention is configured for groupless jobs:
[default_group_limits]
asset_size_limit = 5
log_storage_duration = 10
important_log_storage_duration = 90
result_storage_duration = 21
important_result_storage_duration = 0
This config is shared with job groups that have no limits configured otherwise¹. So if we wanted to change it independently of those groups, we would first need to implement that possibility in openQA.
¹ According to select id, name from job_groups where keep_logs_in_days is null or keep_important_logs_in_days is null or keep_results_in_days is null or keep_important_results_in_days is null order by id asc; there are 70 such groups at this point.
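To illustrate the fallback described above, here is a minimal sketch in Python (not openQA's actual Perl code). It parses the OSD section quoted above and fills in any limit a job group leaves unset. The mapping between the job_groups columns from the query and the config keys is my reading of the names, and the helper names are made up for this example.

```python
import configparser

# The section quoted above, as configured on OSD at the time of the comment.
OSD_CONFIG = """
[default_group_limits]
asset_size_limit = 5
log_storage_duration = 10
important_log_storage_duration = 90
result_storage_duration = 21
important_result_storage_duration = 0
"""

# Assumed mapping from job_groups columns (see the SQL query) to config keys.
COLUMN_TO_KEY = {
    "keep_logs_in_days": "log_storage_duration",
    "keep_important_logs_in_days": "important_log_storage_duration",
    "keep_results_in_days": "result_storage_duration",
    "keep_important_results_in_days": "important_result_storage_duration",
}


def load_defaults(text):
    """Read the retention defaults from the [default_group_limits] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        key: parser.getint("default_group_limits", key)
        for key in COLUMN_TO_KEY.values()
    }


def effective_limits(group_row, defaults):
    """Use the group's own value where set, otherwise the shared default."""
    return {
        column: group_row[column]
        if group_row.get(column) is not None
        else defaults[key]
        for column, key in COLUMN_TO_KEY.items()
    }


def uses_defaults(group_row):
    """Mirror the SQL query: true if at least one limit column is NULL."""
    return any(group_row.get(column) is None for column in COLUMN_TO_KEY)
```

A group row with only some columns set would pick up the remaining values from the shared section, which is why changing that section affects all such groups as well as groupless jobs.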
Updated by mkittler over 1 year ago
So the question is: Do we want to reduce the retention of groupless jobs? It would make sense considering it is the largest "group" of jobs.
If the answer is yes then the next question is: Do we want to configure this without affecting 70 groups? If the answer is yes I could either implement an openQA feature to be able to configure the retention of groupless jobs independently or I could just set the retention of those groups explicitly.
Updated by okurz over 1 year ago
mkittler wrote:
So the question is: Do we want to reduce the retention of groupless jobs? It would make sense considering it is the largest "group" of jobs.
Yes
If the answer is yes then the next question is: Do we want to configure this without affecting 70 groups? If the answer is yes I could either implement an openQA feature to be able to configure the retention of groupless jobs independently or I could just set the retention of those groups explicitly.
Yes, make it independent. I assume that's basically creating separate config settings, nothing else
Updated by mkittler over 1 year ago
- Status changed from Feedback to In Progress
Updated by mkittler over 1 year ago
- Status changed from In Progress to Feedback
Yes, make it independent.
Good, I've added a further commit to my existing PR to implement that: https://github.com/os-autoinst/openQA/pull/5142/commits/d9bb87ec82dcef6e1e77e390a42ade2e22a11c36
I assume that's basically creating separate config settings, nothing else
And yes, it is mainly just introducing a distinct set of config settings. Apart from that, only one condition had to be added to the code ($self->in_storage, to decide which set of settings/limits to use).
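The decision described here can be sketched roughly as follows, in Python rather than openQA's Perl. The temporary job group created outside of the DB for groupless jobs is modelled by an in_storage flag. The section name groupless_job_limits and its value of 7 days are placeholders invented for this example; the PR defines the real setting names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobGroup:
    # False for the temporary group that only exists outside of the DB
    # to collect groupless jobs.
    in_storage: bool
    # None means the group has no explicit limit configured.
    keep_results_in_days: Optional[int] = None


CONFIG = {
    "default_group_limits": {"result_storage_duration": 21},
    # Hypothetical section name and value, for illustration only.
    "groupless_job_limits": {"result_storage_duration": 7},
}


def result_retention_days(group, config):
    """Pick the result retention (in days) that applies to the given group."""
    if not group.in_storage:
        # Groupless jobs: use the distinct set of settings.
        return config["groupless_job_limits"]["result_storage_duration"]
    if group.keep_results_in_days is not None:
        # Regular group with its own limit.
        return group.keep_results_in_days
    # Regular group without explicit limit: shared defaults.
    return config["default_group_limits"]["result_storage_duration"]
```

With this split, lowering the value for groupless jobs leaves the groups that rely on default_group_limits untouched.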
Updated by livdywan over 1 year ago
One thing I only realized now is that there's no way to see limits for groupless jobs in the web UI. For a job in a group one can check how the group is configured or if it's using defaults.
Maybe we could add a note like "This job is not part of a group and ..." within a popover?
Updated by mkittler over 1 year ago
- Status changed from Feedback to Resolved
Where would we add this popover? There's no relevant group page for groupless jobs and I don't think it makes sense to mention the cleanup-specific settings on the test details page. Besides, I don't think we should bother users with popovers explaining a configuration that only the openQA admin can change anyway.
What we could do in general is show how long a job has yet to live (regardless of whether it is a grouped job or not). However, I wouldn't do this as part of this ticket.
The PR has been merged so I'm resolving the ticket. I think mentioning this in the documentation as it is done by my PR is good enough.
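For reference, the "how long a job has yet to live" idea mentioned above could be computed roughly like this. This is only a sketch under assumptions: that retention is counted in days from the time the job finished, and that a value of 0 means the results are not removed based on age. Neither is confirmed in this ticket, and the function name is made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def remaining_lifetime(
    finished_at: datetime, retention_days: int, now: datetime
) -> Optional[timedelta]:
    """Return the time left until cleanup, or None if there is no age limit."""
    if retention_days == 0:
        # Assumption: 0 disables age-based cleanup.
        return None
    expires_at = finished_at + timedelta(days=retention_days)
    # Never report a negative lifetime for jobs that are already due.
    return max(expires_at - now, timedelta(0))


finished = datetime(2023, 5, 1, tzinfo=timezone.utc)
today = datetime(2023, 5, 11, tzinfo=timezone.utc)
print(remaining_lifetime(finished, 21, today))
```

The same function would work for grouped and groupless jobs, as long as the caller passes in whichever retention value applies to the job.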