action #129412

closed

Verify cleanup behavior of groupless job results

Added by mkittler over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2023-05-16
Due date: 2023-05-31
% Done: 0%
Estimated time:

Description

Motivation

Judging by https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&viewPanel=19 the "group" of jobs with the biggest result size are jobs not in any group. We had a brief look at the retention behavior of job results of groupless jobs but were not sure how it works.

Acceptance criteria

  • AC1: We know whether and how the cleanup of job results of groupless jobs works.
  • AC2: The cleanup behavior¹ is configurable (and not using some hard-coded time intervals).
  • AC3: There is a unit test ensuring the cleanup behavior¹ actually works as intended.
  • AC4: Documentation about the cleanup behavior¹ exists and is in accordance with what actually happens.

¹ The "cleanup behavior" of job results of groupless jobs specifically.

Suggestions

  • Have a look at lib/OpenQA/Task/Job/Limit.pm (the code under "create temporary job group outside of DB to collect") and lib/OpenQA/Schema/Result/JobGroups.pm.
  • Remove the mentioned code and check whether tests fail. No tests fail after the removal, so we need to add a new test.

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #129244: [alert][grafana] File systems alert for WebUI /results size:M (Status: Resolved, Assignee: mkittler, Start date: 2023-05-12, Due date: 2023-05-30)

Actions #1

Updated by okurz over 1 year ago

  • Related to action #129244: [alert][grafana] File systems alert for WebUI /results size:M added
Actions #2

Updated by okurz over 1 year ago

  • Target version set to Ready
Actions #3

Updated by mkittler over 1 year ago

  • Status changed from New to In Progress
  • Assignee set to mkittler
Actions #4

Updated by mkittler over 1 year ago

  • Description updated (diff)
Actions #5

Updated by mkittler over 1 year ago

Before extending the documentation to cover AC4, it makes sense to restructure the cleanup-related documentation a bit in general: https://github.com/os-autoinst/openQA/pull/5139

Actions #6

Updated by openqa_review over 1 year ago

  • Due date set to 2023-05-31

Setting due date based on mean cycle time of SUSE QE Tools

Actions #7

Updated by mkittler over 1 year ago

The cleanup of groupless jobs basically works and can also be configured, as https://github.com/os-autoinst/openQA/pull/5142 shows.

That means on OSD the following retention is configured for groupless jobs:

[default_group_limits]
asset_size_limit = 5
log_storage_duration = 10
important_log_storage_duration = 90
result_storage_duration = 21
important_result_storage_duration = 0

This config is shared with job groups that have no limits configured otherwise¹. So if we wanted to change it independently, we would first need to implement that possibility in openQA.


¹ According to the query select id, name from job_groups where keep_logs_in_days is null or keep_important_logs_in_days is null or keep_results_in_days is null or keep_important_results_in_days is null order by id asc; this applies to 70 groups at this point.
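To illustrate the fallback described above, here is a minimal Python sketch (openQA itself is written in Perl, so this is not its actual code). It assumes that each NULL keep_* column of a job group individually falls back to the matching [default_group_limits] key, as the OR-combined query suggests; the column-to-key mapping, the helper names and the toy rows are hypothetical. The INI values and the SQL filter are copied from this comment.

```python
import configparser
import sqlite3

# INI fragment copied from the OSD settings quoted above
OSD_CONFIG = """
[default_group_limits]
asset_size_limit = 5
log_storage_duration = 10
important_log_storage_duration = 90
result_storage_duration = 21
important_result_storage_duration = 0
"""

# Assumed mapping between job_groups columns and config keys (illustrative only)
COLUMN_TO_KEY = {
    "keep_logs_in_days": "log_storage_duration",
    "keep_important_logs_in_days": "important_log_storage_duration",
    "keep_results_in_days": "result_storage_duration",
    "keep_important_results_in_days": "important_result_storage_duration",
}


def load_defaults(text):
    """Read the retention-related keys of [default_group_limits] as integers."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["default_group_limits"]
    return {key: section.getint(key) for key in COLUMN_TO_KEY.values()}


def effective_limits(row, defaults):
    """Use the group's own value where set, otherwise the shared default."""
    return {
        column: row[column] if row[column] is not None else defaults[key]
        for column, key in COLUMN_TO_KEY.items()
    }


def groups_using_defaults(conn):
    """Same filter as the query in the footnote: any limit column left NULL."""
    rows = conn.execute(
        "select id, name from job_groups"
        " where keep_logs_in_days is null or keep_important_logs_in_days is null"
        " or keep_results_in_days is null or keep_important_results_in_days is null"
        " order by id asc"
    ).fetchall()
    return [tuple(r) for r in rows]


def make_demo_db():
    """In-memory table with hypothetical toy data, not taken from OSD."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "create table job_groups (id integer primary key, name text,"
        " keep_logs_in_days integer, keep_important_logs_in_days integer,"
        " keep_results_in_days integer, keep_important_results_in_days integer)"
    )
    conn.executemany(
        "insert into job_groups values (?, ?, ?, ?, ?, ?)",
        [(1, "fully configured", 30, 120, 60, 365),
         (2, "partly configured", None, 120, None, 365)],
    )
    return conn


if __name__ == "__main__":
    defaults = load_defaults(OSD_CONFIG)
    conn = make_demo_db()
    print(groups_using_defaults(conn))
    row = conn.execute("select * from job_groups where id = 2").fetchone()
    print(effective_limits(row, defaults))
```

With the toy data only the "partly configured" group is reported, and its two NULL columns resolve to 10 and 21 days, which is why changing [default_group_limits] alone would also affect such groups.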

Actions #8

Updated by mkittler over 1 year ago

So the question is: Do we want to reduce the retention of groupless jobs? It would make sense considering it is the largest "group" of jobs.

If the answer is yes then the next question is: Do we want to configure this without affecting 70 groups? If the answer is yes I could either implement an openQA feature to be able to configure the retention of groupless jobs independently or I could just set the retention of those groups explicitly.

Actions #9

Updated by mkittler over 1 year ago

  • Status changed from In Progress to Feedback
Actions #10

Updated by okurz over 1 year ago

mkittler wrote:

So the question is: Do we want to reduce the retention of groupless jobs? It would make sense considering it is the largest "group" of jobs.

Yes

If the answer is yes then the next question is: Do we want to configure this without affecting 70 groups? If the answer is yes I could either implement an openQA feature to be able to configure the retention of groupless jobs independently or I could just set the retention of those groups explicitly.

Yes, make it independent. I assume that's basically creating separate config settings, nothing else

Actions #11

Updated by mkittler over 1 year ago

  • Status changed from Feedback to In Progress
Actions #12

Updated by mkittler over 1 year ago

  • Status changed from In Progress to Feedback

Yes, make it independent.

Good, I've added a further commit to my existing PR to implement that: https://github.com/os-autoinst/openQA/pull/5142/commits/d9bb87ec82dcef6e1e77e390a42ade2e22a11c36

I assume that's basically creating separate config settings, nothing else

And yes, it is mainly just introducing a distinct set of config settings. Apart from that, only one condition had to be added to the code ($self->in_storage, to decide which set of settings/limits to use).
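The single in_storage condition can be sketched in Python as follows. This is not openQA's Perl code: the section name groupless_limits and its values (7 and 5 days) are hypothetical placeholders, since the actual setting names are defined in the PR. Only the idea comes from this ticket: the temporary group that is created outside of the DB to collect groupless jobs is not stored, so it selects its own set of limits, while stored groups keep using the shared defaults.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical config: "groupless_limits" and its values are placeholders,
# the real section name is whatever the PR introduced.
CONFIG = {
    "default_group_limits": {"result_storage_duration": 21, "log_storage_duration": 10},
    "groupless_limits": {"result_storage_duration": 7, "log_storage_duration": 5},
}


@dataclass
class JobGroup:
    name: str
    in_storage: bool  # False for the temporary group built outside of the DB
    keep_results_in_days: Optional[int] = None

    def limits_section(self) -> str:
        # The one extra condition: stored groups fall back to the shared
        # defaults, the temporary groupless "group" gets its own settings.
        return "default_group_limits" if self.in_storage else "groupless_limits"

    def result_storage_duration(self) -> int:
        # An explicitly configured group value always wins.
        if self.keep_results_in_days is not None:
            return self.keep_results_in_days
        return CONFIG[self.limits_section()]["result_storage_duration"]


if __name__ == "__main__":
    print(JobGroup("groupless", in_storage=False).result_storage_duration())
    print(JobGroup("some group", in_storage=True).result_storage_duration())
```

Run as a script, this prints 7 for the groupless pseudo-group and 21 for a stored group without own limits, so the retention of groupless jobs can be tuned without touching the 70 groups that rely on the defaults.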

Actions #13

Updated by livdywan over 1 year ago

One thing I only realized now is that there's no way to see the limits for groupless jobs in the web UI. For a job in a group, one can check how the group is configured or whether it uses the defaults.
Maybe we could add a note like "This job is not part of a group and ..." within a popover?

Actions #14

Updated by mkittler over 1 year ago

  • Status changed from Feedback to Resolved

Where would we add this popover? There's no relevant group page for groupless jobs, and I don't think it makes sense to mention the cleanup-specific settings on the test details page. Besides, I don't think we should bother users with popovers explaining a configuration that only the openQA admin can change anyway.

What we could do in general is show how long a job has left to live (regardless of whether it is a grouped job or not). However, I wouldn't do this as part of this ticket.
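Showing how long a job has left to live boils down to simple date arithmetic. A minimal Python sketch, assuming the remaining lifetime is the job's finish time plus the applicable storage duration (for example result_storage_duration) minus the current time; the function name is hypothetical, and because this ticket does not say whether a value of 0 has a special meaning, the sketch simply rejects non-positive durations.

```python
from datetime import datetime, timedelta, timezone


def remaining_lifetime(finished: datetime, storage_duration_days: int, now: datetime) -> timedelta:
    """Time until the results become eligible for cleanup (negative if overdue)."""
    if storage_duration_days <= 0:
        # Special values such as 0 are deliberately not modeled here.
        raise ValueError("only positive storage durations are modeled in this sketch")
    expires = finished + timedelta(days=storage_duration_days)
    return expires - now


if __name__ == "__main__":
    finished = datetime(2023, 5, 16, tzinfo=timezone.utc)
    now = datetime(2023, 5, 20, tzinfo=timezone.utc)
    print(remaining_lifetime(finished, 21, now).days)
```

For example, results of a job finished on 2023-05-16 with a 21-day duration have 17 days left as of 2023-05-20.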

The PR has been merged, so I'm resolving the ticket. I think mentioning this in the documentation, as my PR does, is good enough.
