action #127523
closed: [qe-core][s390x][kvm] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker ressources
Description
Motivation
In #125993 a generic "s390-kvm" worker class was introduced to prevent overly long waits for s390x worker resources. Some jobs then failed for reasons that are still unknown, so qam reverted to using s390-kvm-sle12 for the time being. The job failures should be investigated and the underlying problems fixed so that the "s390-kvm" worker class can be used wherever the specific s390x hypervisor OS version does not matter.
Acceptance criteria
- AC1: Most s390x kvm tests use an openQA "machine definition" that uses the generic "s390-kvm" worker class
- AC2: Current OSD openQA s390-kvm workers all have the generic class "s390-kvm"
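AC2 could be checked with a small script along the following lines. This is only a sketch: it assumes each worker's classes are available as a comma-separated `WORKER_CLASS` string, and the worker names and class strings in `example_workers` are made up for illustration, not taken from OSD.

```python
def parse_worker_classes(value):
    """Split a comma-separated WORKER_CLASS string into a set of class names."""
    return {item.strip() for item in value.split(",") if item.strip()}


def workers_missing_class(workers, required="s390-kvm"):
    """Return the sorted names of workers whose classes lack `required`.

    Classes are compared as whole names, so "s390-kvm-sle12" does not
    count as having the generic "s390-kvm" class.
    """
    return sorted(
        name
        for name, classes in workers.items()
        if required not in parse_worker_classes(classes)
    )


# Hypothetical worker names and class strings, for illustration only.
example_workers = {
    "worker-a:1": "s390-kvm,s390-kvm-sle15",
    "worker-b:1": "s390-kvm-sle12",
}
print(workers_missing_class(example_workers))
```

Matching on whole class names rather than substrings matters here, because every version-specific class name starts with the generic one.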
Suggestions
- Understand and fix test failures mentioned in #125993 or #127337
- Verify common openQA tests can work on "s390-kvm" worker class
- Use "s390-kvm" in production job templates
Updated by JERiveraMoya over 1 year ago
- Status changed from New to Workable
- Priority changed from Normal to Low
- Target version set to Current
- Parent task set to #130072
Updated by okurz over 1 year ago
- Related to action #135578: Long job age and jobs not executed for long size:M added
Updated by okurz about 1 year ago
- Description updated (diff)
Coming back to this as I currently (again) see a longer job queue for s390-kvm jobs. https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/518 removed some machines from the generic s390x kvm worker class, but no action has been taken yet to bring those machines back, so they are underused while elsewhere there is a long job queue for s390-kvm openQA jobs. I suggest ensuring that all current OSD openQA s390-kvm workers have the generic class "s390-kvm".
Please reconsider the priority for this one given the impact on blocked SLE maintenance updates.
Updated by okurz about 1 year ago
- Related to action #135329: s390x work demand exceeds available workers added
Updated by okurz about 1 year ago
- Copied to action #137012: [s390x][kvm][security] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker ressources - security squad added
Updated by szarate about 1 year ago
- Tags set to qe-core-september-sprint
- Project changed from qe-yam to openQA Infrastructure (public)
- Priority changed from Low to High
- Target version changed from Current to QE-Core: Ready
Updated by okurz about 1 year ago
- Related to action #127337: Some s390x workers have been failing for all jobs since 11 months ago added
Updated by szarate about 1 year ago
- Related to action #137255: [s390x][kvm][qe-core] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker resources - core squad added
Updated by szarate about 1 year ago
- Subject changed from [s390x][kvm] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker ressources to [qe-core][s390x][kvm] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker ressources
- Status changed from Workable to Feedback
- Assignee set to mgrifalconi
Michael to check on single incidents
Updated by szarate about 1 year ago
- Sprint set to QE-Core: October Sprint 23 (Oct 11 - Nov 08)
Updated by mgrifalconi about 1 year ago
Confirmed that, from the QE-Core perspective, all job groups were changed: https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/commit/775d5dbe5fee5f68566e85969fdcd467cba54dd1 Not sure whether we own some other job group that I don't recall, though @szarate
Updated by okurz about 1 year ago
There are still jobs for the nonexistent s390x-kvm-sle12 stuck in the schedule, e.g. https://openqa.suse.de/tests/12635826. It's in group "Functional", so certainly your domain.
Updated by mgrifalconi about 1 year ago
Thanks Oliver! I missed that! https://gitlab.suse.de/qe-core/qa-sle-functional-userspace/-/merge_requests/182
Updated by okurz about 1 year ago
https://openqa.suse.de/admin/machines still mentions 5 machines which reference variants of "s390-kvm-sle12". I suggest you ensure that all those machine definitions are gone, meaning that first all referencing job templates must be adjusted accordingly and then the machines removed.
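The ordering suggested here (adjust the referencing job templates first, then remove the machines) can be sketched as a dry-run check. The sketch assumes machine definitions are available as a mapping from machine name to a comma-separated `WORKER_CLASS` string and job templates as `(group, machine)` pairs. All names in the example data are hypothetical, and the functions only compute a plan; they do not talk to openQA.

```python
def legacy_machines(machines, prefix="s390-kvm-sle12"):
    """Return the names of machines with a worker class starting with `prefix`."""
    return {
        name
        for name, worker_class in machines.items()
        if any(c.strip().startswith(prefix) for c in worker_class.split(","))
    }


def removal_plan(machines, templates, prefix="s390-kvm-sle12"):
    """Split legacy machines into removable ones and ones still referenced.

    Returns (removable, blocked): `removable` is a sorted list of legacy
    machines that no job template uses, `blocked` maps each remaining legacy
    machine to the sorted job groups that must be adjusted first.
    """
    legacy = legacy_machines(machines, prefix)
    blocked = {}
    for group, machine in templates:
        if machine in legacy:
            blocked.setdefault(machine, set()).add(group)
    removable = sorted(legacy - set(blocked))
    return removable, {m: sorted(groups) for m, groups in blocked.items()}


# Hypothetical machine definitions and job templates, for illustration only.
example_machines = {
    "legacy-machine-1": "s390-kvm-sle12",
    "legacy-machine-2": "s390-kvm-sle12-extra",
    "generic-machine": "s390-kvm",
}
example_templates = [
    ("Functional", "legacy-machine-1"),
    ("Functional", "generic-machine"),
]
removable, blocked = removal_plan(example_machines, example_templates)
print("removable now:", removable)
print("blocked by job groups:", blocked)
```

A machine listed under `blocked` should only be removed once none of its job groups references it anymore.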
Updated by szarate about 1 year ago
Seems like there are still remnants here, so we cannot change/remove the old machine settings:
Bad Request: Groups BCI [deprecated], Containers: Development, HA Development, Maintenance - QR - SLE15GA, Maintenance - QR - SLE15SP1, Maintenance - QR - SLE15SP2, Maintenance - QR - SLE15SP3, Maintenance - QR - SLE15SP3-SAP, Maintenance - QR - SLE15SP4, Maintenance - QR - SLE15SP4-SAP, Maintenance - QR - SLE15SP5, Maintenance - QR - SLE15SP5-SAP, Maintenance: SLE 12 SP1 Kernel Incidents, Maintenance: SLE 12 SP2 Kernel Incidents, Maintenance: SLE 12 SP3 Kernel Incidents, Migration : SLE15GA Milestone, Migration: HA, Migration: SLE15GA, Migration:Continuous Upgrade SLE12SP5, SLE 12 Kernel, SLE 12 Migration: SLES, SLE 12 Migration:Milestone SLES, SLE 12 SP5 Functional: Server, SLE 12 SP5 YaST, SLE 12 Security, SLE 15 (Development) - Userspace, SLE Micro: Development, Security-QR-Staging, Test Development: SLE 12, Test QAM HA-SAP, Test development: SLE15 lemon, Virtualization must be updated through the YAML template
Updated by szarate about 1 year ago
- Tags changed from qe-core-october-sprint to qe-core-october-sprint, qe-core-november-sprint
Updated by okurz about 1 year ago
- Copied to action #151331: [qe-core][s390x][zvm] Make use of generic "s390x-zVM" class instead of s390x-zVM-vswitch-l2+l3+etc. added
Updated by szarate about 1 year ago
- Sprint changed from QE-Core: October Sprint 23 (Oct 11 - Nov 08) to QE-Core: November Sprint 23 (Nov 15 - Dec 13)
Updated by mgrifalconi about 1 year ago
None of the 3 machine types is used anymore, and all of them were renamed.
Should I delete them now?
Updated by mgrifalconi about 1 year ago
- Status changed from Feedback to Resolved
Machine types deleted. I would say we can close this topic!