action #33127
closed
[sle][s390x][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system
Description
User Story
To avoid a repeat of what happened two weeks ago, we want a permanent solution for running our s390x z/VM tests in a stable environment, e.g. on the new storage and (later) on our new mainframe system (the setup is easy to migrate).
Acceptance Criteria
AC1: set up z/VM 6.3 (or even 6.4) on an LPAR running on our zEC12 Mainframe with disks on the DS8870
AC2: configure guests according to the requirements of QA SLE openQA automation tests on z/VM
AC3: set up guests according to additional requirements for manual testing on z/VM
AC4: integrate new system in our openQA environment
Updated by mgriessmeier over 6 years ago
most likely [epic] - to be discussed in sprint planning
Updated by okurz over 6 years ago
- Project changed from 46 to openQA Tests (public)
- Category set to Infrastructure
- Target version set to Milestone 15
Updated by okurz over 6 years ago
- Priority changed from Normal to High
It sounds like the old storage will die soon, at least according to Wolfgang and Ihno, so let's go ahead.
Updated by mgriessmeier over 6 years ago
- Subject changed from [sles][functional][s390x][infrastructure] set up dedicated z/VM for (open)QA on our new storage system to [sles][functional][s390x][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system
- Status changed from New to Workable
Updated by mgriessmeier over 6 years ago
- Status changed from Workable to Feedback
waiting for an LPAR from the s390-Admin team to perform the installation there
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-03-27 to 2018-04-10
mgriessmeier wrote:
waiting for an LPAR from the s390-Admin team to perform the installation there
apparently not happening this week after all... shifting to the next sprint
Updated by mgriessmeier over 6 years ago
- Subject changed from [sles][functional][s390x][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system to [sles][functional][s390x][u][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system
Updated by mgriessmeier over 6 years ago
no feedback from Ihno/Gerhard/Wolfgang so far
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-04-10 to 2018-04-24
Updated by okurz over 6 years ago
A month ago you were explaining to me that this is critical, so do the others not share your concerns, or what is going on? How high is the risk that SLE15 GMC validation will be impacted?
Updated by mgriessmeier over 6 years ago
okurz wrote:
A month ago you were explaining to me that this is critical, so do the others not share your concerns, or what is going on? How high is the risk that SLE15 GMC validation will be impacted?
That's how it sounded to me...
I asked three times in the last month, but no one had any new information.
I'd be fine with just dropping this and, if it pops up again, adding it as a fast track.
Updated by okurz over 6 years ago
- Subject changed from [sles][functional][s390x][u][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system to [sle][functional][s390x][u][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system
- Due date deleted (2018-04-24)
- Target version changed from Milestone 15 to Milestone 17
OK, off we go into the danger zone!
Updated by mgriessmeier over 6 years ago
- Status changed from Feedback to Blocked
- Priority changed from High to Normal
- Target version changed from Milestone 17 to Milestone 21+
Blocked by limited resources... this will become important when the new mainframe is set up,
so moving to M21+ but keeping myself assigned for tracking.
Lowering the priority, though.
Updated by okurz over 6 years ago
- Target version changed from Milestone 21+ to Milestone 21+
Updated by coolo about 6 years ago
- Project changed from openQA Tests (public) to openQA Infrastructure (public)
- Category deleted (Infrastructure)
Updated by okurz almost 6 years ago
- Status changed from Blocked to Workable
- Target version changed from Milestone 21+ to Milestone 24
The new mainframe should be there by then.
Updated by okurz almost 6 years ago
@mgriessmeier how do you see the current state, still "Workable"? Should it stay assigned to you, or should we unassign it?
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 24 to Milestone 26
The current state is that I did not have time for it.
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 26 to Milestone 27
actually in progress (blocked by infra atm) - though not urgent
Updated by mgriessmeier about 5 years ago
- Target version changed from Milestone 27 to Milestone 29
postponing
Updated by mgriessmeier almost 5 years ago
- Target version changed from Milestone 29 to Milestone 31
Finishing this in January.
Updated by maritawerner over 4 years ago
just out of curiosity: is that done now? I know that we will get new HW in April for Live Patching as well.
Updated by mgriessmeier over 4 years ago
Nope, sorry, we are still struggling to integrate it into the QA network.
But this is not related to the Live Patching HW.
Updated by SLindoMansilla over 4 years ago
- Subject changed from [sle][functional][s390x][u][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system to [sle][s390x][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system
Updated by mgriessmeier over 4 years ago
- Status changed from Workable to In Progress
- Priority changed from Normal to Urgent
Right now we are facing an issue with disks on the openQA z/VM workers.
Luckily I am almost finished with the new setup.
Anyway, raising to Urgent because all z/VM tests are failing at the moment.
Updated by mgriessmeier over 4 years ago
disabled hsi and ctc tests for Y-team and U-team:
https://gitlab.suse.de/qsf-y/qa-sle-functional-y/-/merge_requests/220
https://gitlab.suse.de/qsf-u/qa-sle-functional-userspace/-/merge_requests/95
replaced the old workers with new ones (only 6 for now, more to come after testing)
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/246/diffs
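For reference, a minimal sketch of how to verify which s390x z/VM worker instances are actually registered on OSD after such a replacement; it assumes the /api/v1/workers route is readable without authentication and that each worker entry carries a "properties" hash with a comma-separated WORKER_CLASS value (field names are assumptions and may differ from the actual API output):

#!/usr/bin/env python3
# Minimal sketch: list s390x z/VM worker instances currently registered on OSD.
# Assumes unauthenticated read access to /api/v1/workers; field names such as
# "workers", "properties" and "status" are assumptions about the API output.
import requests

OSD = "https://openqa.suse.de"

response = requests.get(f"{OSD}/api/v1/workers", timeout=30)
response.raise_for_status()
for worker in response.json().get("workers", []):
    classes = worker.get("properties", {}).get("WORKER_CLASS", "")
    if "s390x-zVM" in classes:
        print(f"{worker.get('host')}:{worker.get('instance')}  "
              f"status={worker.get('status')}  classes={classes}")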
Updated by mgriessmeier over 4 years ago
- Status changed from In Progress to Feedback
- Priority changed from Urgent to Normal
the 6 original workers were replaced and verified.
next steps: adding more resources
please let me know if any errors still occur
normalizing priority
Updated by okurz over 4 years ago
@mgriessmeier I think you removed all workers matching the still existing machine "s390x-zVM-hsi-l3", and a lot of tests scheduled against this machine are now piling up, e.g. see https://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?orgId=1&fullscreen&panelId=4&from=now-7d&to=now and the schedule on https://openqa.suse.de/tests/ . Can you suggest how to handle these scenarios? Should s390x-zVM-hsi-l3 still exist, or is it obsolete? If the former, can you re-add the according worker classes; if the latter, please crosscheck with the test maintainers what the equivalent worker classes would be. I also suggest removing machine definitions that have no matching worker class behind them.
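One possible way to do that crosscheck, as a rough sketch: query the machine definitions and the registered workers via the openQA API and report every machine whose WORKER_CLASS nobody announces. This assumes unauthenticated read access to /api/v1/machines and /api/v1/workers on OSD; the field names ("Machines", settings as key/value pairs, "properties") are assumptions about the current API output.

#!/usr/bin/env python3
# Rough sketch: report machine definitions whose WORKER_CLASS is not announced
# by any registered worker on OSD. Field names are assumptions and may need
# adjusting to the actual API output.
import requests

API = "https://openqa.suse.de/api/v1"


def as_dict(settings):
    # the table API is assumed to return settings as a list of {"key": ..., "value": ...} pairs
    if isinstance(settings, dict):
        return settings
    return {entry["key"]: entry["value"] for entry in settings or []}


machines = requests.get(f"{API}/machines", timeout=30).json().get("Machines", [])
workers = requests.get(f"{API}/workers", timeout=30).json().get("workers", [])

announced = set()
for worker in workers:
    classes = worker.get("properties", {}).get("WORKER_CLASS", "")
    announced.update(c.strip() for c in classes.split(",") if c.strip())

for machine in machines:
    wanted = as_dict(machine.get("settings")).get("WORKER_CLASS", "")
    missing = [c.strip() for c in wanted.split(",") if c.strip() and c.strip() not in announced]
    if missing:
        print(f"machine {machine.get('name')}: no registered worker announces {', '.join(missing)}")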
Updated by mgriessmeier over 4 years ago
okurz wrote:
@mgriessmeier I think you removed all workers matching the still existing machine "s390x-zVM-hsi-l3", and a lot of tests scheduled against this machine are now piling up, e.g. see https://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?orgId=1&fullscreen&panelId=4&from=now-7d&to=now and the schedule on https://openqa.suse.de/tests/ . Can you suggest how to handle these scenarios? Should s390x-zVM-hsi-l3 still exist, or is it obsolete? If the former, can you re-add the according worker classes; if the latter, please crosscheck with the test maintainers what the equivalent worker classes would be. I also suggest removing machine definitions that have no matching worker class behind them.
Hi Olli,
So the hsi and ctc worker classes are deprecated now (as I also mentioned in my mail to the openqa mailing list).
I disabled jobs with those worker classes for the YaST and the Functional group and mentioned that the same should be done for the others as well.
But I will crosscheck the still-running scenarios and make sure they get disabled.
Updated by mgriessmeier over 4 years ago
- Priority changed from Urgent to High
Ondrej Sukup from QAM will check tomorrow and adapt their schedule; they have a public holiday today.
Updated by okurz over 4 years ago
Thanks, sounds good. IMHO it is better this way, so we can crosscheck that no test coverage drops unintentionally :) We should re-enable the alert for "job schedule age" in https://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?orgId=1&panelId=5&fullscreen&edit&tab=alert after we have ensured that there are no more jobs scheduled against machines with worker classes that do not exist anymore, and that the according machine definitions do not exist anymore either.
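To get a quick overview of whether anything is still scheduled against such machines, a small sketch like the following could help; it assumes /api/v1/jobs can be queried without authentication using state=scheduled and that each returned job carries its settings, including MACHINE (both are assumptions about the API output):

#!/usr/bin/env python3
# Small sketch: count currently scheduled jobs per MACHINE on OSD, e.g. to check
# that nothing is still scheduled against removed machines before re-enabling
# the "job schedule age" alert. Field names are assumptions and may differ.
import collections

import requests

API = "https://openqa.suse.de/api/v1"

jobs = requests.get(f"{API}/jobs", params={"state": "scheduled"}, timeout=60).json().get("jobs", [])
per_machine = collections.Counter(job.get("settings", {}).get("MACHINE", "<unset>") for job in jobs)
for machine, count in per_machine.most_common():
    print(f"{count:5d} scheduled job(s) for machine {machine}")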
Updated by okurz over 4 years ago
- Blocks action #68872: job age max exceeds alarm threshold added
Updated by okurz over 4 years ago
@mgriessmeier @osukup any progress? I currently do not see any stuck tests, but machines like s390x-zVM-hsi-l3 still exist.
Updated by okurz over 4 years ago
@mgriessmeier you are still the assignee for this ticket, but I assume you do not actually plan any further steps for it. Could you assign it to osukup and ask for the remaining steps to be done, or will you do them yourself?
Updated by mgriessmeier over 4 years ago
- Assignee changed from mgriessmeier to osukup
- Priority changed from High to Normal
Hi Ondrej,
could you please take care of removing the remaining s390x-hsi and s390x-ctc leftovers?
I can't commit to that in the next few days.
Thanks a lot.
Updated by okurz about 4 years ago
Trying to remove the machine definitions from OSD:
s390x-zVM-ctc s390x INSTALLONLY=1
S390_NETWORK_PARAMS=CTCProtocol=0 InstNetDev=ctc HostIP=10.161.189.@S390_HOST@ Hostname=s390ctc@S390_HOST@.suse.de Gateway=10.161.189.254 Nameserver=10.160.0.1 Domain=suse.de ReadChannel=0.0.0600 WriteChannel=0.0.0601 Pointtopoint=10.161.189.254
WORKER_CLASS=s390x-zVM-ctc
yields
Bad Request: Group SLE 12 SP5 YaST must be updated through the YAML template
and trying to remove
s390x-zVM-hsi-l2 s390x INSTALLONLY=1
S390_NETWORK_PARAMS=OSAMedium=eth OSAInterface=qdio OSAHWAddr= InstNetDev=osa HostIP=10.161.183.@S390_HOST@/24 Hostname=s390hsl@S390_HOST@.suse.de Gateway=10.161.183.254 Nameserver=10.160.0.1 Domain=suse.de PortNo=0 Layer2=1 ReadChannel=0.0.7100 WriteChannel=0.0.7101 DataChannel=0.0.7102 Portname=trash
WORKAROUND_BUGS=bsc1156047
WORKER_CLASS=s390x-zVM-hsi-l2
s390x-zVM-hsi-l3 s390x INSTALLONLY=1
S390_NETWORK_PARAMS=Portname=trash InstNetDev=osa OSAInterface=qdio OSAMedium=eth HostIP=10.161.185.@S390_HOST@/24 Hostname=s390hsi@S390_HOST@.suse.de Gateway=10.161.185.254 Nameserver=10.160.0.1 Domain=suse.de PortNo=0 Layer2=0 ReadChannel=0.0.7000 WriteChannel=0.0.7001 DataChannel=0.0.7002
WORKER_CLASS=s390x-zVM-hsi-l3
yields:
Bad Request: Group SLE 12 SP5 YaST must be updated through the YAML template
Bad Request: Groups Maintenance: SLE 12 SP1 Incidents, SLE 12 SP5 YaST must be updated through the YAML template
Unfortunately there was no update from osukup (also no answer to chat messages).
EDIT: Managed to remove the ctc reference after deleting the last job template from https://openqa.suse.de/admin/job_templates/142. The file says it is managed in git, but actually the SLE12 files have been removed from https://gitlab.suse.de/qsf-y/qa-sle-functional-y in a commit whose message just states the obvious without explaining why. I assume someone wanted to clean up.
Also removed
- qam-minimal:
    machine: s390x-zVM-hsi-l3
    settings:
      PATTERNS: 'minimal'
from https://openqa.suse.de/admin/job_templates/41 "Maintenance: SLE 12 SP1 Incidents" so I could delete both machines as well.
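As a note on the procedure: the "must be updated through the YAML template" errors above mean the referencing job groups have to be cleaned up before the machine definition can be deleted. A rough sketch for finding which job templates still reference a given machine (so one knows which groups to edit) could look like the following; it assumes unauthenticated read access to /api/v1/job_templates and that the response provides a "JobTemplates" list with nested "machine", "test_suite" and "product" entries (assumptions about the API output):

#!/usr/bin/env python3
# Rough sketch: list job templates on OSD that still reference a given machine,
# so the owning job groups can be edited (YAML editor or job template pages)
# before deleting the machine definition itself. Field names are assumptions.
import sys

import requests

API = "https://openqa.suse.de/api/v1"
machine_name = sys.argv[1] if len(sys.argv) > 1 else "s390x-zVM-hsi-l3"

templates = requests.get(f"{API}/job_templates", timeout=60).json().get("JobTemplates", [])
for tpl in templates:
    if tpl.get("machine", {}).get("name") != machine_name:
        continue
    product = tpl.get("product", {})
    print(f"group '{tpl.get('group_name')}': test suite "
          f"{tpl.get('test_suite', {}).get('name')} on "
          f"{product.get('distri')}-{product.get('version')}-{product.get('flavor')}")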
Updated by okurz about 4 years ago
- Status changed from Feedback to Resolved
- Assignee changed from osukup to okurz
As mgriessmeier assigned this to osukup only for the last cleanup task, which I have now done myself, I guess we can call this ticket completely resolved.