action #121594

Extend OSD storage space for "results" to make bug investigation and failure archeology easier - 2022

Added by okurz 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

More and more products are tested and more and more tests are running on OSD, which increases the volume of results we need to store per time period. This in turn means that we need to shorten the period for which we can keep results. That makes it harder to investigate product bugs that were reported based on openQA test results, as well as test failures. #121588 showed that keeping openQA test results for longer would have helped with handling pending SLE maintenance updates. Additionally, ALP development is bringing in new requirements, including for storing results.

Acceptance criteria

  • AC1: Significant increase of storage space for "results" on OSD
  • AC2: Job group result retention periods have been increased to make efficient use of the available storage space
  • AC3: "results" on OSD still has enough headroom, e.g. only used up to 80-85%

Suggestions

  • Create an EngInfra ticket with a business justification, similar to #77890, and point out that the last request for an increase was in 2021-03 and that only +0.5TB was added.

Related issues

Copied from openQA Infrastructure - action #77890: [easy] Extend OSD storage space for "results" to make bug investigation and failure archeology easier (Resolved, 2020-11-14)

History

#1 Updated by okurz 2 months ago

  • Copied from action #77890: [easy] Extend OSD storage space for "results" to make bug investigation and failure archeology easier added

#2 Updated by okurz 2 months ago

  • Status changed from New to Blocked
  • Assignee set to okurz

#3 Updated by okurz about 2 months ago

SD ticket was "resolved" but I don't see the increased space yet. Added a message in SD:

I don’t see the change yet. I wonder about the disk path you mentioned. Are you sure we are talking about “openqa.suse.de” https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=9456 on morla-cluster as you mentioned a path with “atreju”? There is ariel-opensuse.suse.de aka. openqa.opensuse.org https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=3158 which runs on atreju. We don’t mind the extra space there but it does not help us for the request on OSD though :)

#4 Updated by okurz about 2 months ago

  • Status changed from Blocked to Resolved

There was some confusion, but in the end bmwiedemann and I concluded the work for https://sd.suse.com/servicedesk/customer/portal/1/SD-106581 and the disk was resized to 7.0TB. I ran xfs_growfs /dev/vdd and we now have more space.
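For reference, a rough way to check the AC3 headroom (at most 80-85% used) after such a resize could look like the sketch below. The helper names and the 85% default are only illustrative, and the path is a placeholder: the actual mount point of "results" on OSD is not stated in this ticket, so it has to be filled in.

```python
import shutil


def used_percent(total_bytes, used_bytes):
    """Return the used share of a filesystem in percent."""
    return 100.0 * used_bytes / total_bytes


def has_headroom(path, limit_percent=85.0):
    """True if the filesystem holding `path` is used up to at most `limit_percent`."""
    usage = shutil.disk_usage(path)
    return used_percent(usage.total, usage.used) <= limit_percent


if __name__ == "__main__":
    # Placeholder path (assumption): replace with the mount point of "results".
    path = "/"
    usage = shutil.disk_usage(path)
    percent = used_percent(usage.total, usage.used)
    print(f"{path}: {percent:.1f}% used, headroom ok: {has_headroom(path)}")
```

The number can differ slightly from what df reports, because df excludes blocks reserved for root from its percentage.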

In the OSD database I did:

openqa=> select name,keep_results_in_days from job_groups where name ~ '^Maintenance: ' and not name ~'before reorg' order by keep_results_in_days;
                          name                           | keep_results_in_days 
---------------------------------------------------------+----------------------
 Maintenance: SLE 15 SP2 Updates                         |                   12
 Maintenance: Leap 15.3 Incidents                        |                   12
 Maintenance: Leap 15.4 Incidents                        |                   12
 Maintenance: SLE 12 SP1 Updates                         |                   20
 Maintenance: SLE 15 GA Incidents                        |                   20
 Maintenance: KOTD - 12 SP4                              |                   21
 Maintenance: KOTD - 12 SP3                              |                   21
 Maintenance: KOTD - 15 SP2                              |                   21
 Maintenance: KOTD - 15 SP1                              |                   21
 Maintenance: KOTD - 15 GA                               |                   21
 Maintenance: SLE 12 GA Updates                          |                   30
 Maintenance: Kiwi 12-SP4                                |                   30
 Maintenance: SLE 12 SP3 Incidents                       |                   30
 Maintenance: Kiwi 15GA                                  |                   30
 Maintenance: CaaSP 2.0 Incidents                        |                   30
 Maintenance: Kiwi 15-SP1                                |                   30
 Maintenance: Kiwi 12-SP3                                |                   30
 Maintenance: SLE 15 SP3 Incidents                       |                   36
 Maintenance: SLE 15 SP2 Incidents                       |                   40
 Maintenance: SLE 15 SP1 Incidents                       |                   40
 Maintenance: SLE 15 GA HA Incidents                     |                   40
 Maintenance: SLE 12 SP4 Incidents                       |                   40
 Maintenance: SLE 12 SP2 Incidents                       |                   40
 Maintenance: SLE 15 SP2 HPC Incidents                   |                   60
 Maintenance: SLE 15 GA HPC Incidents                    |                   60
 Maintenance: SLE 15 SP1 HPC Incidents                   |                   60
 Maintenance: SLE 15 SP3 HPC Incidents                   |                   60
 Maintenance: Kiwi 15-SP2                                |                   70
 Maintenance: SLE 12 SP4 SAP Incidents                   |                   70
 Maintenance: SLE 15 SP3 SAP Incidents                   |                   70
 Maintenance: SLE 12 SP5 Incidents                       |                   70
 Maintenance: KOTD - 12 SP5                              |                   70
 Maintenance: SLE 15 SP4 Incidents                       |                   70
 Maintenance: SLE 12 SP5 HA Incidents                    |                   70
 Maintenance: SLE 15 SP4 SAP Incidents                   |                   70
 Maintenance: SLE 15 SP4 HA Incidents                    |                   70
 Maintenance: SLE 12 SP5 SAP Incidents                   |                   70
 Maintenance: SLE 15 SP3 HA Incidents                    |                   70
 Maintenance: JeOS 15 SP2 Incidents                      |                   70
 Maintenance: Kiwi 12-SP5                                |                   70
 Maintenance: SLE 15 SP2 HA Incidents                    |                   70
 Maintenance: Public Cloud 15-SP2 Incidents [deprecated] |                   70
 Maintenance: SLE 15 SP4 HPC Incidents                   |                   70
 Maintenance: SLE 15 SP2 SAP Incidents                   |                   70
 Maintenance: SLE 12 SP5 HPC Incidents                   |                   70
 Maintenance: Public Cloud 15-SP3 Incidents [deprecated] |                   70
 Maintenance: 15-SP2 Staging Images                      |                   70
 Maintenance: SLE 15 SP1 HA Incidents                    |                   70
 Maintenance: SLE 12 SP2 SAP Incidents                   |                   70
 Maintenance: SLE 15 GA SAP Incidents                    |                   70
 Maintenance: SLE 15 SP1 SAP Incidents                   |                   70
 Maintenance: SLE 12 SP3 SES 5                           |                   90
 Maintenance: SLE 12 SP1 Kernel Incidents                |                   90
 Maintenance: SLE 12 SP3 Kernel Incidents                |                   90
 Maintenance: SLE 12 SP3 HA Incidents                    |                   90
 Maintenance: SLE 12 GA Incidents                        |                   90
 Maintenance: SLE 12-SP4 Update Install                  |                   90
 Maintenance: SLE 12 SP4 HA Incidents                    |                   90
 Maintenance: CaaSP 3.0 Incidents                        |                   90
 Maintenance: SLE 12 SP5 Kernel Incidents                |                   90
 Maintenance: 12-SP3-TERADATA Kernel Incidents           |                   90
 Maintenance: SLE 15 SP3 Kernel Incidents                |                   90
 Maintenance: SLE 12 GA Kernel Incidents                 |                   90
 Maintenance: SLE 12 SP3 SAP Incidents                   |                   90
 Maintenance: SLE 15 SP2 Kernel Incidents                |                   90
 Maintenance: SLE 12 GA Update Install                   |                   90
 Maintenance: SLE 12 SP2 HPC Incidents                   |                   90
 Maintenance: SLE 15 SP1 Kernel Incidents                |                   90
 Maintenance: 12-SP3-TERADATA Incidents                  |                   90
 Maintenance: SLE 12 SP2 Kernel Incidents                |                   90
 Maintenance: SLE 12 SP3 HPC Incidents                   |                   90
 Maintenance: SLE 12 SP4 Kernel Incidents                |                   90
 Maintenance: SLE 15 SP4 Kernel Incidents                |                   90
 Maintenance: SLE 15 GA Kernel Incidents                 |                   90
 Maintenance: SLE 12 SP2 HA Incidents                    |                   90
 Maintenance: SLE 12 SP1 Incidents                       |                   90
 Maintenance: SLE 11 SP1 Kernel Incidents                |                   90
 Maintenance: SLE 12 SP4 HPC Incidents                   |                   90
 Maintenance: SLE 11 SP3 Kernel Incidents                |                  270
 Maintenance: Public Cloud Incidents                     |                  365
 Maintenance: Virtualization - VMware + Hyper-V          |                 1095
 Maintenance: SLE 12-SP1 Update Install                  |                     
 Maintenance: SLE 15 GA Update Install                   |                     
 Maintenance: Public Cloud 15-SP3 HA Updates             |                     
 Maintenance: SLE 12-SP3 Update Install                  |                     
 Maintenance: SLE 12-SP2 Update Install                  |                     

and then executed

update job_groups set keep_results_in_days = 60 where name ~ '^Maintenance: ' and not name ~'before reorg' and keep_results_in_days < 60;

with result "UPDATE 23".
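As a cross-check of that count, here is a small sketch that replays the update on an in-memory SQLite table filled with the retention values from the query output above. SQLite only stands in for the PostgreSQL database here, and the name filter is left out because every row in the output already matched it. The function name is made up for this example.

```python
import sqlite3

# (retention in days, number of job groups), tallied from the query output above.
# None stands for SQL NULL, i.e. the five rows with an empty value.
RETENTION_COUNTS = [
    (12, 3), (20, 2), (21, 5), (30, 7), (36, 1), (40, 5),
    (60, 4), (70, 24), (90, 27), (270, 1), (365, 1), (1095, 1),
    (None, 5),
]


def simulate_update(counts, minimum=60):
    """Replay the update and return (rows updated, rows still NULL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table job_groups (keep_results_in_days integer)")
    for days, number in counts:
        conn.executemany("insert into job_groups values (?)", [(days,)] * number)
    cursor = conn.execute(
        "update job_groups set keep_results_in_days = ? "
        "where keep_results_in_days < ?",
        (minimum, minimum),
    )
    updated = cursor.rowcount
    still_null = conn.execute(
        "select count(*) from job_groups where keep_results_in_days is null"
    ).fetchone()[0]
    conn.close()
    return updated, still_null


if __name__ == "__main__":
    updated, still_null = simulate_update(RETENTION_COUNTS)
    print(f"updated: {updated}, still NULL: {still_null}")
```

The replay gives 23 updated rows, matching "UPDATE 23". The five groups with an empty keep_results_in_days are not touched, because a comparison of NULL with 60 never evaluates to true.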
