action #33127

[sle][s390x][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system

Added by mgriessmeier almost 4 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
SUSE QA - Milestone 31
Start date:
2018-03-13
Due date:
% Done:

0%

Estimated time:

Description

User Story

To avoid incidents like the one two weeks ago, we want a permanent solution for running our s390x z/VM tests in a stable environment, e.g. on the new storage and (later) on our new mainframe system (it is easily migratable).

Acceptance Criteria

AC1: set up z/VM 6.3 (or even 6.4) on an LPAR running on our zEC12 Mainframe with disks on the DS8870
AC2: configure guests according to the requirements of QA SLE openQA automation tests on z/VM
AC3: set up guests according to additional requirements for manual testing on z/VM
AC4: integrate new system in our openQA environment


Related issues

Blocks openQA Infrastructure - action #68872: job age max exceeds alarm threshold (Resolved, 2020-07-12 – 2020-09-02)

History

#1 Updated by mgriessmeier almost 4 years ago

most likely [epic] - to be discussed in sprint planning

#2 Updated by okurz almost 4 years ago

  • Project changed from SUSE QA to openQA Tests
  • Category set to Infrastructure
  • Target version set to Milestone 15

#3 Updated by okurz almost 4 years ago

  • Priority changed from Normal to High

It sounds like the old storage will die soon, at least according to Wolfgang and Ihno, so let's go ahead.

#4 Updated by mgriessmeier almost 4 years ago

  • Subject changed from [sles][functional][s390x][infrastructure] set up dedicated z/VM for (open)QA on our new storage system to [sles][functional][s390x][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system
  • Status changed from New to Workable

#5 Updated by mgriessmeier almost 4 years ago

  • Status changed from Workable to Feedback

waiting for an LPAR from the s390-Admin team to perform the installation there

#6 Updated by mgriessmeier almost 4 years ago

  • Due date changed from 2018-03-27 to 2018-04-10

mgriessmeier wrote:

waiting for an LPAR from the s390-Admin team to perform the installation there

apparently not happening this week anymore... shifting to next sprint

#7 Updated by mgriessmeier almost 4 years ago

  • Subject changed from [sles][functional][s390x][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system to [sles][functional][s390x][u][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system

#8 Updated by mgriessmeier almost 4 years ago

no feedback from Ihno/Gerhard/Wolfgang so far

#9 Updated by mgriessmeier almost 4 years ago

  • Due date changed from 2018-04-10 to 2018-04-24

#10 Updated by okurz almost 4 years ago

A month ago you were explaining to me that this is critical, so do the others not share your concerns, or what's going on? How high is the risk that SLE15 GMC validation will be impacted?

#11 Updated by mgriessmeier almost 4 years ago

okurz wrote:

A month ago you were explaining to me that this is critical, so do the others not share your concerns, or what's going on? How high is the risk that SLE15 GMC validation will be impacted?

That's how it sounded to me...
I asked three times in the last month, but no one had any new information.
I'd be fine with just dropping this and, if it pops up again, adding it as a fast track.

#12 Updated by okurz almost 4 years ago

  • Subject changed from [sles][functional][s390x][u][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system to [sle][functional][s390x][u][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system
  • Due date deleted (2018-04-24)
  • Target version changed from Milestone 15 to Milestone 17

ok, off we go into the danger zone!

#13 Updated by mgriessmeier over 3 years ago

  • Status changed from Feedback to Blocked
  • Priority changed from High to Normal
  • Target version changed from Milestone 17 to Milestone 21+

Blocked by limited resources... this will become important once the new mainframe is set up,
so moving to M21+ but keeping myself assigned for tracking.
Lowering priority, though.

#14 Updated by okurz over 3 years ago

  • Target version changed from Milestone 21+ to Milestone 21+

#15 Updated by coolo over 3 years ago

  • Project changed from openQA Tests to openQA Infrastructure
  • Category deleted (Infrastructure)

#16 Updated by okurz about 3 years ago

  • Status changed from Blocked to Workable
  • Target version changed from Milestone 21+ to Milestone 24

new mainframe should be there.

#17 Updated by okurz almost 3 years ago

mgriessmeier, how do you see the current state: still "Workable"? Should it stay assigned to you, or be unassigned?

#18 Updated by mgriessmeier over 2 years ago

  • Target version changed from Milestone 24 to Milestone 26

current state is that I didn't have time

#19 Updated by mgriessmeier over 2 years ago

  • Target version changed from Milestone 26 to Milestone 27

actually in progress (blocked by infra atm) - though not urgent

#20 Updated by mgriessmeier over 2 years ago

  • Target version changed from Milestone 27 to Milestone 29

postponing

#21 Updated by mgriessmeier about 2 years ago

  • Target version changed from Milestone 29 to Milestone 31

finishing this in January

#22 Updated by maritawerner almost 2 years ago

just out of curiosity: is that done now? I know that we will get new HW in April for Live Patching as well.

#23 Updated by mgriessmeier almost 2 years ago

nope, sorry, we are still struggling to integrate it into the QA network.
But this is not related to the Live Patching HW.

#24 Updated by SLindoMansilla over 1 year ago

  • Subject changed from [sle][functional][s390x][u][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system to [sle][s390x][infrastructure][hard] set up dedicated z/VM for (open)QA on our new storage system

#25 Updated by mgriessmeier over 1 year ago

  • Status changed from Workable to In Progress
  • Priority changed from Normal to Urgent

Right now we are facing an issue with disks on the openQA z/VM workers.
Luckily I am almost finished with the new setup.

Anyway, raising to Urgent, because all z/VM tests are failing at the moment.

#27 Updated by mgriessmeier over 1 year ago

  • Status changed from In Progress to Feedback
  • Priority changed from Urgent to Normal

the 6 original workers were replaced and verified.

next steps: adding more resources

please let me know if any errors still occur

normalizing priority

#28 Updated by okurz over 1 year ago

mgriessmeier I think you removed all workers matching the still existing machine "s390x-zVM-hsi-l3", and a lot of tests scheduled against this machine are now piling up, e.g. see https://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?orgId=1&fullscreen&panelId=4&from=now-7d&to=now and the schedule on https://openqa.suse.de/tests/ . Can you suggest how to handle these scenarios? Should s390x-zVM-hsi-l3 still exist, or is it obsolete? If the former, can you re-add the according worker classes; if the latter, please crosscheck with the test maintainers what the equivalent worker classes would be. I also suggest removing machine classes that have no matching worker class behind them.
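The crosscheck suggested here — comparing machine definitions against the worker classes that live workers actually advertise — could be sketched as follows. The function name and sample data are illustrative; in practice the input would come from openQA's /api/v1/machines and /api/v1/workers routes:

```python
def orphaned_machines(machines, worker_classes):
    """Return names of machines whose WORKER_CLASS is not offered by any worker.

    machines: list of dicts shaped like openQA machine definitions,
              each with a 'name' and a 'settings' dict.
    worker_classes: set of classes advertised by the live workers.
    """
    orphans = []
    for machine in machines:
        wanted = machine.get("settings", {}).get("WORKER_CLASS", "")
        # WORKER_CLASS may hold a comma-separated list of classes
        classes = {c.strip() for c in wanted.split(",") if c.strip()}
        if classes and not classes & worker_classes:
            orphans.append(machine["name"])
    return orphans

# Illustrative data modelled on the machines discussed in this ticket
machines = [
    {"name": "s390x-zVM-hsi-l3",
     "settings": {"WORKER_CLASS": "s390x-zVM-hsi-l3"}},
    {"name": "s390x-kvm-sle12",
     "settings": {"WORKER_CLASS": "s390x-kvm-sle12"}},
]
live_classes = {"s390x-kvm-sle12", "qemu_x86_64"}

print(orphaned_machines(machines, live_classes))
# -> ['s390x-zVM-hsi-l3']
```

Any machine reported this way either needs its worker classes re-added or should be deleted together with its job-template references.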

#29 Updated by okurz over 1 year ago

  • Priority changed from Normal to Urgent

#30 Updated by mgriessmeier over 1 year ago

okurz wrote:

mgriessmeier I think you removed all workers matching the still existing machine "s390x-zVM-hsi-l3", and a lot of tests scheduled against this machine are now piling up, e.g. see https://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?orgId=1&fullscreen&panelId=4&from=now-7d&to=now and the schedule on https://openqa.suse.de/tests/ . Can you suggest how to handle these scenarios? Should s390x-zVM-hsi-l3 still exist, or is it obsolete? If the former, can you re-add the according worker classes; if the latter, please crosscheck with the test maintainers what the equivalent worker classes would be. I also suggest removing machine classes that have no matching worker class behind them.

Hi Olli,

so hsi and ctc worker classes are deprecated now (as I also mentioned in my mail to the openQA mailing list).

I disabled jobs with those worker classes for the YaST and Functional groups and mentioned that the same should be done for the others as well.
But I will crosscheck the still-running scenarios and make sure they are disabled.

#31 Updated by mgriessmeier over 1 year ago

  • Priority changed from Urgent to High

Ondrej Sukup from QAM will check tomorrow and adapt their schedule; they have a public holiday today.

#32 Updated by okurz over 1 year ago

Thanks, sounds good. IMHO it's better this way, so we can crosscheck that no test coverage drops unintentionally :) We should re-enable the alert for "job schedule age" in https://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?orgId=1&panelId=5&fullscreen&edit&tab=alert after we have ensured that no more jobs are scheduled against machines with worker classes that no longer exist, and that the according machine definitions are gone as well.

#33 Updated by okurz over 1 year ago

  • Blocks action #68872: job age max exceeds alarm threshold added

#34 Updated by okurz over 1 year ago

mgriessmeier osukup any progress? I currently do not see any stuck tests but machines like s390x-zVM-hsi-l3 still exist.

#35 Updated by okurz over 1 year ago

mgriessmeier you are still the assignee for this ticket, but I assume you do not actually plan any further steps for it. Could you assign it to osukup and ask for the remaining steps to be done, or will you do them yourself?

#36 Updated by mgriessmeier over 1 year ago

  • Assignee changed from mgriessmeier to osukup
  • Priority changed from High to Normal

Hi Ondrej,

could you please take care of removing the remaining s390x-hsi and s390x-ctc leftovers?
I can't commit to that in the next few days.

thanks a lot

#37 Updated by okurz over 1 year ago

Trying to remove the machine definitions from OSD:

s390x-zVM-ctc   s390x   INSTALLONLY=1
S390_NETWORK_PARAMS=CTCProtocol=0 InstNetDev=ctc HostIP=10.161.189.@S390_HOST@ Hostname=s390ctc@S390_HOST@.suse.de Gateway=10.161.189.254 Nameserver=10.160.0.1 Domain=suse.de ReadChannel=0.0.0600 WriteChannel=0.0.0601 Pointtopoint=10.161.189.254
WORKER_CLASS=s390x-zVM-ctc

yields

Bad Request: Group SLE 12 SP5 YaST must be updated through the YAML template

and trying to remove

s390x-zVM-hsi-l2    s390x   INSTALLONLY=1
S390_NETWORK_PARAMS=OSAMedium=eth OSAInterface=qdio OSAHWAddr= InstNetDev=osa HostIP=10.161.183.@S390_HOST@/24 Hostname=s390hsl@S390_HOST@.suse.de Gateway=10.161.183.254 Nameserver=10.160.0.1 Domain=suse.de PortNo=0 Layer2=1 ReadChannel=0.0.7100 WriteChannel=0.0.7101 DataChannel=0.0.7102 Portname=trash
WORKAROUND_BUGS=bsc1156047
WORKER_CLASS=s390x-zVM-hsi-l2

s390x-zVM-hsi-l3    s390x   INSTALLONLY=1
S390_NETWORK_PARAMS=Portname=trash InstNetDev=osa OSAInterface=qdio OSAMedium=eth HostIP=10.161.185.@S390_HOST@/24 Hostname=s390hsi@S390_HOST@.suse.de Gateway=10.161.185.254 Nameserver=10.160.0.1 Domain=suse.de PortNo=0 Layer2=0 ReadChannel=0.0.7000 WriteChannel=0.0.7001 DataChannel=0.0.7002
WORKER_CLASS=s390x-zVM-hsi-l3

yields:

Bad Request: Group SLE 12 SP5 YaST must be updated through the YAML template
Bad Request: Groups Maintenance: SLE 12 SP1 Incidents, SLE 12 SP5 YaST must be updated through the YAML template

Unfortunately there was no update from osukup (also no answer to chat messages).

EDIT: Managed to remove the ctc reference after deleting the last job template from https://openqa.suse.de/admin/job_templates/142 . The file says it is managed in git, but actually the SLE12 files have been removed from https://gitlab.suse.de/qsf-y/qa-sle-functional-y in a commit which just states the obvious without explaining why. I assume someone wanted to clean up.

Also removed

    - qam-minimal:
        machine: s390x-zVM-hsi-l3
        settings:
          PATTERNS: 'minimal'

from https://openqa.suse.de/admin/job_templates/41 "Maintenance: SLE 12 SP1 Incidents" so I could delete both machines as well.
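Removing such an entry programmatically could look like the following sketch. The data structure mirrors the parsed YAML fragment above, but the helper name and filtering logic are illustrative; in practice the templates are edited through openQA's YAML editor (or the git repository backing it):

```python
def drop_machine(scenarios, machine):
    """Remove test-suite entries pinned to `machine` from a scenario list.

    scenarios: list of one-key dicts mapping a test suite name to its
    schedule entry, mimicking the parsed YAML job-template fragment.
    """
    kept = []
    for entry in scenarios:
        (_suite, spec), = entry.items()
        if isinstance(spec, dict) and spec.get("machine") == machine:
            continue  # drop entries scheduled on the obsolete machine
        kept.append(entry)
    return kept

# Illustrative data modelled on the fragment removed above
scenarios = [
    {"qam-minimal": {"machine": "s390x-zVM-hsi-l3",
                     "settings": {"PATTERNS": "minimal"}}},
    {"qam-minimal": {"machine": "s390x-kvm-sle12",
                     "settings": {"PATTERNS": "minimal"}}},
]

remaining = drop_machine(scenarios, "s390x-zVM-hsi-l3")
print(len(remaining))
# -> 1
```

Once no job template references the machine anymore, deleting the machine definition itself no longer triggers the "must be updated through the YAML template" error.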

#38 Updated by okurz over 1 year ago

  • Status changed from Feedback to Resolved
  • Assignee changed from osukup to okurz

As mgriessmeier assigned this to osukup for the last cleanup task, which I have now done myself, I guess we can call this ticket completely Resolved.
