action #127523

closed

[qe-core][s390x][kvm] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker resources

Added by okurz about 1 year ago. Updated 5 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Sprint: QE-Core: November Sprint 23 (Nov 15 - Dec 13)

Description

Motivation

In #125993 a generic "s390-kvm" worker class was introduced to prevent overly long waits for s390x worker resources. Some jobs then failed for reasons not yet known, so QAM reverted to using s390-kvm-sle12 for the time being. The job failures should be investigated and the underlying problems fixed so that the "s390-kvm" worker class can be used wherever the specific s390x hypervisor OS version does not matter.

Acceptance criteria

  • AC1: Most s390x kvm tests use an openQA "machine definition" that uses the generic "s390-kvm" worker class
  • AC2: Current OSD openQA s390-kvm workers all have the generic class "s390-kvm"
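
AC2 could be checked mechanically once the worker classes are known. The following is only a minimal sketch, assuming each worker's WORKER_CLASS is available as a comma-separated string; the function name, worker names and class strings are hypothetical and not taken from OSD.

```python
def workers_missing_class(worker_classes, required="s390-kvm"):
    """Return the names of workers whose class list lacks `required`.

    `worker_classes` maps a worker name to its comma-separated
    WORKER_CLASS string (assumed input shape, for illustration only).
    """
    missing = []
    for name, classes in sorted(worker_classes.items()):
        # Compare whole class names so that "s390-kvm-sle12" alone
        # does not count as the generic "s390-kvm" class.
        tokens = {c.strip() for c in classes.split(",")}
        if required not in tokens:
            missing.append(name)
    return missing


# Hypothetical worker names and class strings.
example = {
    "worker-a:1": "s390-kvm,s390-kvm-sle15",
    "worker-b:1": "s390-kvm-sle12",
}
print(workers_missing_class(example))
```

Matching on whole class names rather than substrings matters here, because every version-specific class name also contains the string "s390-kvm".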

Suggestions

  • Understand and fix test failures mentioned in #125993 or #127337
  • Verify common openQA tests can work on "s390-kvm" worker class
  • Use "s390-kvm" in production job templates


Related issues 6 (0 open, 6 closed)

  • Related to openQA Infrastructure - action #135578: Long job age and jobs not executed for long size:M (Resolved, nicksinger)
  • Related to openQA Infrastructure - action #135329: s390x work demand exceeds available workers (Resolved, okurz, 2023-09-07)
  • Related to openQA Infrastructure - action #127337: Some s390x workers have been failing for all jobs since 11 months ago (Resolved, okurz, 2023-04-06)
  • Related to openQA Tests - action #137255: [s390x][kvm][qe-core] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker resources - core squad (Resolved, mgrifalconi, 2023-09-29)
  • Copied to qe-yam - action #137012: [s390x][kvm][security] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker ressources - security squad (Resolved, pstivanin)
  • Copied to openQA Infrastructure - action #151331: [qe-core][s390x][zvm] Make use of generic "s390x-zVM" class instead of s390x-zVM-vswitch-l2+l3+etc. (Resolved, mgrifalconi)

Actions #2

Updated by JERiveraMoya 11 months ago

  • Status changed from New to Workable
  • Priority changed from Normal to Low
  • Target version set to Current
  • Parent task set to #130072
Actions #3

Updated by okurz 7 months ago

  • Related to action #135578: Long job age and jobs not executed for long size:M added
Actions #4

Updated by okurz 7 months ago

  • Description updated (diff)

Coming back to this as I currently (again) see a longer job queue for s390-kvm jobs. https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/518 removed some machines from the generic s390x kvm worker class, but no action has been taken yet to bring those machines back, so they are underused while other s390-kvm openQA jobs wait in a long queue. I suggest ensuring that all current OSD openQA s390-kvm workers have the generic class "s390-kvm".

Please reconsider the priority for this one given the impact on blocked SLE maintenance updates.

Actions #5

Updated by okurz 7 months ago

  • Related to action #135329: s390x work demand exceeds available workers added
Actions #6

Updated by okurz 7 months ago

  • Copied to action #137012: [s390x][kvm][security] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker ressources - security squad added
Actions #7

Updated by szarate 7 months ago

  • Tags set to qe-core-september-sprint
  • Project changed from qe-yam to openQA Infrastructure
  • Priority changed from Low to High
  • Target version changed from Current to QE-Core: Ready
Actions #8

Updated by JERiveraMoya 7 months ago

  • Parent task deleted (#130072)
Actions #9

Updated by okurz 7 months ago

  • Related to action #127337: Some s390x workers have been failing for all jobs since 11 months ago added
Actions #10

Updated by szarate 7 months ago

  • Related to action #137255: [s390x][kvm][qe-core] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker resources - core squad added
Actions #11

Updated by szarate 7 months ago

  • Subject changed from [s390x][kvm] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker ressources to [qe-core][s390x][kvm] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker ressources
  • Status changed from Workable to Feedback
  • Assignee set to mgrifalconi

Michael to check on single incidents

Actions #12

Updated by szarate 7 months ago

  • Sprint set to QE-Core: October Sprint 23 (Oct 11 - Nov 08)
Actions #13

Updated by mgrifalconi 7 months ago

Confirmed that, from the QE-Core perspective, all job groups were changed (https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/commit/775d5dbe5fee5f68566e85969fdcd467cba54dd1). I am not sure whether we own some other job group that I don't recall, though. @szarate

Actions #14

Updated by okurz 6 months ago

There are still jobs for the non-existent s390x-kvm-sle12 stuck in the schedule, e.g. https://openqa.suse.de/tests/12635826 which is in the group "Functional", so certainly your domain.

Actions #16

Updated by okurz 6 months ago

https://openqa.suse.de/admin/machines still mentions 5 machines which reference variants of "s390-kvm-sle12". I suggest ensuring that all those machine definitions are removed: first adjust all job templates that reference them, then delete the machines.
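
The first step of that cleanup, finding the machine definitions that still reference the old class, could be sketched as follows. This is only an illustration: the input shape (a list of dicts with a `name` and a `settings` dict), the function name and the machine names are assumptions, not the actual openQA data format.

```python
def machines_referencing(machines, needle="s390-kvm-sle12"):
    """Return names of machine definitions whose WORKER_CLASS contains `needle`."""
    hits = []
    for machine in machines:
        worker_class = machine.get("settings", {}).get("WORKER_CLASS", "")
        # Substring match on purpose: this also catches variants
        # of the old class name, not only the exact string.
        if needle in worker_class:
            hits.append(machine["name"])
    return hits


# Hypothetical machine definitions.
machines = [
    {"name": "machine-old", "settings": {"WORKER_CLASS": "s390-kvm-sle12-variant"}},
    {"name": "machine-new", "settings": {"WORKER_CLASS": "s390-kvm"}},
    {"name": "machine-other", "settings": {}},
]
print(machines_referencing(machines))
```

An empty result would indicate that no machine definition in the input references the old class anymore, which is the precondition for deleting the old machines.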

Actions #17

Updated by szarate 5 months ago

It seems there are still remnants here, so we cannot change or remove the old machine settings yet:

Bad Request: Groups BCI [deprecated], Containers: Development, HA Development, Maintenance - QR - SLE15GA, Maintenance - QR - SLE15SP1, Maintenance - QR - SLE15SP2, Maintenance - QR - SLE15SP3, Maintenance - QR - SLE15SP3-SAP, Maintenance - QR - SLE15SP4, Maintenance - QR - SLE15SP4-SAP, Maintenance - QR - SLE15SP5, Maintenance - QR - SLE15SP5-SAP, Maintenance: SLE 12 SP1 Kernel Incidents, Maintenance: SLE 12 SP2 Kernel Incidents, Maintenance: SLE 12 SP3 Kernel Incidents, Migration : SLE15GA Milestone, Migration: HA, Migration: SLE15GA, Migration:Continuous Upgrade SLE12SP5, SLE 12 Kernel, SLE 12 Migration: SLES, SLE 12 Migration:Milestone SLES, SLE 12 SP5 Functional: Server, SLE 12 SP5 YaST, SLE 12 Security, SLE 15 (Development) - Userspace, SLE Micro: Development, Security-QR-Staging, Test Development: SLE 12, Test QAM HA-SAP, Test development: SLE15 lemon, Virtualization must be updated through the YAML template

Actions #18

Updated by szarate 5 months ago

  • Tags changed from qe-core-october-sprint to qe-core-october-sprint, qe-core-november-sprint
Actions #19

Updated by okurz 5 months ago

  • Copied to action #151331: [qe-core][s390x][zvm] Make use of generic "s390x-zVM" class instead of s390x-zVM-vswitch-l2+l3+etc. added
Actions #20

Updated by szarate 5 months ago

  • Sprint changed from QE-Core: October Sprint 23 (Oct 11 - Nov 08) to QE-Core: November Sprint 23 (Nov 15 - Dec 13)
Actions #21

Updated by mgrifalconi 5 months ago

All 3 machine types are not used anymore and were renamed.
Should I delete them now?

Actions #22

Updated by mgrifalconi 5 months ago

  • Status changed from Feedback to Resolved

Machine types deleted. I would say we can close this topic!
