action #51836

closed

Manage (parts) of s390 kvm instances (formerly s390p7 and s390p8) with salt

Added by nicksinger almost 5 years ago. Updated 6 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
Start date:
2019-05-22
Due date:
% Done:

0%

Estimated time:

Description

Recent example: s390p8 ran out of space because the openQA worker doesn't depend on or wait for the mount of a separate FCP storage disk. This could be avoided by deploying a machine-specific openqa-worker unit override (similar to the auto-format on aarch64 workers). To manage these kinds of special cases and to unify our setup, it would make sense to include these two hosts in our openQA salt.
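
A minimal sketch of what such a unit override could look like, not the deployed solution. It assumes the worker runs as the openqa-worker@.service template unit and that the FCP disk is mounted at /var/lib/libvirt/images (the path mentioned later in this ticket); the function name write_worker_override and the drop-in file name 10-wait-for-mount.conf are hypothetical. RequiresMountsFor= is the systemd directive that adds a requirement and ordering dependency on the mount unit for a path.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that makes the worker unit wait for a mount.
# Assumptions (not from the ticket): unit name, mount point, drop-in file name.
write_worker_override() {
    unit_dir="${1:-/etc/systemd/system}"          # base directory, overridable for testing
    mount_point="${2:-/var/lib/libvirt/images}"   # assumed mount point of the FCP disk
    dropin_dir="$unit_dir/openqa-worker@.service.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/10-wait-for-mount.conf" <<EOF
[Unit]
# Adds Requires= and After= on the mount unit(s) needed for this path
RequiresMountsFor=$mount_point
EOF
}

# Usage on a real host (as root), followed by a reload:
#   write_worker_override && systemctl daemon-reload
```

Keeping the base directory as a parameter makes it possible to try the function against a scratch directory before touching /etc/systemd/system.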


Related issues 4 (0 open, 4 closed)

Related to openQA Infrastructure - action #51833: [tools][functional][u] s390p8 is out-of-space in / (Resolved, mgriessmeier, 2019-05-22)

Related to openQA Infrastructure - action #109969: s390zp19 - out of disk space (Resolved, nicksinger, 2022-04-14)

Related to openQA Infrastructure - action #127754: osd nfs-server needed to be restarted but we got no alerts size:M (Resolved, nicksinger)

Related to openQA Infrastructure - action #127337: Some s390x workers have been failing for all jobs since 11 months ago (Resolved, okurz, 2023-04-06)

Actions #1

Updated by nicksinger almost 5 years ago

  • Copied from action #51833: [tools][functional][u] s390p8 is out-of-space in / added
Actions #2

Updated by nicksinger almost 5 years ago

  • Copied from deleted (action #51833: [tools][functional][u] s390p8 is out-of-space in /)
Actions #3

Updated by nicksinger almost 5 years ago

  • Related to action #51833: [tools][functional][u] s390p8 is out-of-space in / added
Actions #4

Updated by szarate almost 5 years ago

  • Subject changed from [tools][functional][u] Manage (parts) of s390p7 and s390p8 with salt to [tools] Manage (parts) of s390p7 and s390p8 with salt

This does not belong on the [u] team :) At least from a brief chat with Matthias

Actions #5

Updated by okurz over 3 years ago

  • Subject changed from [tools] Manage (parts) of s390p7 and s390p8 with salt to Manage (parts) of s390p7 and s390p8 with salt
  • Status changed from New to Feedback
  • Assignee set to mgriessmeier
  • Priority changed from High to Normal
  • Target version set to Ready

szarate wrote:

This does not belong on the [u] team :) At least from a brief chat with Matthias

Doesn't belong to "[tools]" either according to https://progress.opensuse.org/projects/qa/wiki/Wiki#Out-of-scope

@mgriessmeier I think you can help clarify responsibilities here as:

  • I understand that QSF-u wants to have less responsibilities with infrastructure
  • You are still an expert in this topic but also in a different position where you can better clarify responsibilities

But please also keep in mind the recently more limited capacity of https://confluence.suse.com/display/openqa/openQA#openQA-Team

Actions #6

Updated by mgriessmeier over 3 years ago

hey,

so I can understand both of your sides - and yeah, one can probably argue if this is "Administration of workers" or "maintenance of workers".
but since this is lying around for such a long time, it seems that apparently the annoyance of this not being done is not high enough (yet).

I don't see a crystal-clear responsibility for that task here, so what about the following proposal:
do a joint collaboration between QSF and Tools team - sit down for a few hours and do this thing together? I'm happy to help here of course =)

Actions #7

Updated by okurz over 3 years ago

mgriessmeier wrote:

since this is lying around for such a long time, it seems that apparently the annoyance of this not being done is not high enough (yet).

Yes, I think so. This is why I lowered from "High" to "Normal" priority.

I don't see a crystal-clear responsibility for that task here, so what about the following proposal:
do a joint collaboration between QSF and Tools team - sit down for a few hours and do this thing together? I'm happy to help here of course =)

Well, of course everybody is happy to offer help, but this task lying around for more than a year already shows that that is not the problem. Just an outlook into the future if nothing is done here: Eventually the machines will break down or need to be replaced (again) and everybody will just point to mgriessmeier who should know best ;)

Actions #8

Updated by okurz over 3 years ago

  • Target version changed from Ready to future

I am trying to make it more obvious that with the current team's capacity and capabilities this is unlikely to be worked on by SUSE QA Tools hence setting the target version accordingly to "future".

@mgriessmeier considering that we just had problems with s390pb these days (I think even twice), I suggest treating this issue as more important to avoid confusion, and therefore cost, in the future.

Actions #9

Updated by nicksinger almost 2 years ago

  • Status changed from Feedback to Blocked
Actions #10

Updated by okurz almost 2 years ago

Well, if SUSE QE Tools needs to maintain these machines, then we should aim to migrate to openSUSE Leap 15.3 anyway.

Actions #11

Updated by okurz almost 2 years ago

Actions #12

Updated by okurz 11 months ago

  • Related to action #127754: osd nfs-server needed to be restarted but we got no alerts size:M added
Actions #13

Updated by okurz 6 months ago

  • Subject changed from Manage (parts) of s390p7 and s390p8 with salt to Manage (parts) of s390 kvm instances (formerly s390p7 and s390p8) with salt
  • Status changed from Blocked to New

#127337 brings me back to this. https://bugzilla.suse.com/show_bug.cgi?id=1198485#c1 is RESOLVED INVALID, so we should no longer be blocked on this. Instead we should only support the current Leap, the same as we do for other s390-related machines.

Actions #14

Updated by okurz 6 months ago

  • Related to action #127337: Some s390x workers have been failing for all jobs since 11 months ago added
Actions #15

Updated by mgriessmeier 6 months ago

  • s390zl12.oqa.prg2.suse.org and s390zl13.oqa.prg2.suse.org are both running openSUSE Leap 15.5

Some setup steps are necessary before these machines can be used to execute s390x KVM jobs.

  • Packages that need to be present:
    • multipath-tools
    • libvirt
  • directories
    • /var/lib/openqa/share/factory
    • /var/lib/libvirt/images
  • services
    • libvirtd
    • multipathd
  • ZFCP disk for storing images
    • cio_ignore -r [fc00,fa00] to whitelist the channels
    • zfcp_host_configure [fa00,fc00] 1 to permanently enable the fcp devices
    • multipath -ll to check what devices are there
    • /usr/bin/rescan-scsi-bus.sh to discover newly added zfcp disks
    • fdisk to create new partition
    • mkfs.ext4 to create file system
  • /etc/fstab entries
    • NFS openQA: openqa.suse.de:/var/lib/openqa/share/factory /var/lib/openqa/share/factory nfs ro 0 0
    • ZFCP disk: /dev/mapper/$ID /var/lib/libvirt/images ext4 nobarrier,data=writeback 1
  • crontab -e
    • 0 */1 * * * /usr/local/bin/cleanup-openqa-assets >/dev/null
~# cat /usr/local/bin/cleanup-openqa-assets
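#!/bin/bash -e
# Hourly cleanup: when the image disk is more than 70% full, delete every
# qcow2 image that no process currently has open.
echo "--- cronjob start ---"
if [[ $(df | grep "/var/lib/libvirt/images" | awk '{print $5}' | sed "s/\%//") -gt 70 ]] ; then
    echo "--- entering if ---"
    # fuser -s exits 0 when a file is in use, so "! -exec fuser" selects only unused files
    find /var/lib/libvirt/images/*.qcow2 ! -exec fuser -s "{}" 2>/dev/null \; -exec rm -f {} \; -print
fi
echo "--- cronjob end ---"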
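
The threshold check in the script above could also be written without the grep/sed steps by asking df about the path directly. The following is only a sketch, assuming the POSIX output format of df -P (usage percentage in the fifth column); the function names usage_percent and over_threshold are hypothetical and not part of the deployed script.

```shell
#!/bin/sh
# Print the usage of the filesystem holding "$1" as a bare integer (e.g. 71).
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when the usage value "$1" is strictly above the limit "$2".
over_threshold() {
    [ "$1" -gt "$2" ]
}

# Example use, mirroring the 70% limit of the cron job:
#   if over_threshold "$(usage_percent /var/lib/libvirt/images)" 70; then
#       echo "cleanup needed"
#   fi
```

Passing the path to df avoids matching unrelated mount points whose names merely contain /var/lib/libvirt/images.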
Actions #17

Updated by okurz 6 months ago

  • Status changed from New to In Progress
  • Assignee changed from mgriessmeier to okurz
  • Target version changed from future to Ready

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1010 merged. I ran echo 'roles: libvirt' >> /etc/salt/grains on s390zl13, and on osd

sudo salt --no-color --state-output=changes 's390zl*' saltutil.refresh_grains,state.test , | grep -v 'Result.*Clean'

looks clean. Now trying to add more into the libvirt salt states.
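
Note that appending with >> adds a duplicate line if the command is run a second time. A minimal sketch of an idempotent variant, assuming the grains file holds simple one-line "key: value" entries as in the command above; the function name add_grain_line is hypothetical.

```shell
#!/bin/sh
# Append a line to a grains file only if that exact line is not present yet.
add_grain_line() {
    file="$1"
    line="$2"
    touch "$file"
    # -x: whole-line match, -F: fixed string, -q: quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage on the worker host (as root):
#   add_grain_line /etc/salt/grains 'roles: libvirt'
```

This only guards against exact duplicates; a file that already has a roles entry with a different value would still need manual editing.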

Actions #18

Updated by okurz 6 months ago

I included installation instructions from #51836-15 in https://progress.opensuse.org/projects/openqav3/wiki/Wiki#s390-LPAR-setup for things that are not feasible or not possible to do via salt but need to be done during installation.

Actions #19

Updated by okurz 6 months ago

  • Due date set to 2023-10-25
  • Status changed from In Progress to Feedback
Actions #20

Updated by okurz 6 months ago

  • Status changed from Feedback to In Progress
Actions #21

Updated by okurz 6 months ago

  • Due date deleted (2023-10-25)
  • Status changed from In Progress to Resolved

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1013 merged and deployed, s390zl12+13 are now properly part of our salt-controlled infrastructure.
