action #51836
Manage (parts) of s390 kvm instances (formerly s390p7 and s390p8) with salt (closed)
Added by nicksinger over 5 years ago. Updated about 1 year ago.
Description
Recent example: s390p8 ran out of space because the openQA workers don't depend on or wait for the mount of a separate FCP storage disk. This could be avoided by deploying a machine-specific openqa-worker unit override (similar to the auto-format on aarch64 workers). To manage these kinds of specialties and to unify our setup it would make sense to include these two hosts in our openQA salt.
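As a minimal sketch of such an override (assuming the FCP disk is mounted at /var/lib/libvirt/images; the path and the drop-in file name are hypothetical), a systemd drop-in could make the worker wait for the mount:

mkdir -p /etc/systemd/system/openqa-worker@.service.d
cat > /etc/systemd/system/openqa-worker@.service.d/wait-for-fcp.conf <<'EOF'
[Unit]
# do not start the worker before the FCP-backed mount is active
RequiresMountsFor=/var/lib/libvirt/images
EOF
systemctl daemon-reload

RequiresMountsFor= pulls in and orders the worker after the mount unit for that path, so the worker fails visibly instead of silently filling the root file system.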
Updated by nicksinger over 5 years ago
- Copied from action #51833: [tools][functional][u] s390p8 is out-of-space in / added
Updated by nicksinger over 5 years ago
- Copied from deleted (action #51833: [tools][functional][u] s390p8 is out-of-space in /)
Updated by nicksinger over 5 years ago
- Related to action #51833: [tools][functional][u] s390p8 is out-of-space in / added
Updated by szarate over 5 years ago
- Subject changed from [tools][functional][u] Manage (parts) of s390p7 and s390p8 with salt to [tools] Manage (parts) of s390p7 and s390p8 with salt
This does not belong on the [u] team :) At least from a brief chat with Matthias
Updated by okurz about 4 years ago
- Subject changed from [tools] Manage (parts) of s390p7 and s390p8 with salt to Manage (parts) of s390p7 and s390p8 with salt
- Status changed from New to Feedback
- Assignee set to mgriessmeier
- Priority changed from High to Normal
- Target version set to Ready
szarate wrote:
This does not belong on the [u] team :) At least from a brief chat with Matthias
Doesn't belong to "[tools]" either according to https://progress.opensuse.org/projects/qa/wiki/Wiki#Out-of-scope
@mgriessmeier I think you can help clarify responsibilities here as:
- I understand that QSF-u wants to have fewer responsibilities regarding infrastructure
- You are still an expert in this topic but also in a different position where you can better clarify responsibilities
But please also keep in mind the recently more limited capacity of https://confluence.suse.com/display/openqa/openQA#openQA-Team
Updated by mgriessmeier about 4 years ago
hey,
so I can understand both of your sides - and yeah, one can probably argue whether this is "Administration of workers" or "maintenance of workers".
but since this has been lying around for such a long time, apparently the annoyance of this not being done is not high enough (yet).
I don't see a crystal-clear responsibility for that task here, so what about the following proposal:
do a joint collaboration between QSF and the Tools team - sit down for a few hours and do this thing together? I'm happy to help here of course =)
Updated by okurz about 4 years ago
mgriessmeier wrote:
since this has been lying around for such a long time, apparently the annoyance of this not being done is not high enough (yet).
Yes, I think so. This is why I lowered the priority from "High" to "Normal".
I don't see a crystal-clear responsibility for that task here, so what about the following proposal:
do a joint collaboration between QSF and the Tools team - sit down for a few hours and do this thing together? I'm happy to help here of course =)
Well, of course everybody is happy to offer help, but this task lying around for more than a year already shows that this is not enough. Just an outlook into the future if nothing is done here: eventually the machines will break down or need to be replaced (again) and everybody will just point to mgriessmeier, who should know best ;)
Updated by okurz about 4 years ago
- Target version changed from Ready to future
I am trying to make it more obvious that with the current team's capacity and capabilities this is unlikely to be worked on by SUSE QA Tools, hence I am setting the target version to "future".
@mgriessmeier considering that we just had problems with s390pb these days (I think even twice) I suggest treating this issue as more important, to save confusion and therefore cost in the future.
Updated by nicksinger over 2 years ago
- Status changed from Feedback to Blocked
Currently blocked by https://bugzilla.suse.com/show_bug.cgi?id=1198485 anyway
Updated by okurz over 2 years ago
Well, if SUSE QE Tools needs to maintain these machines then we should aim to migrate to openSUSE Leap 15.3 anyway
Updated by okurz over 2 years ago
- Related to action #109969: s390zp19 - out of disk space added
Updated by okurz over 1 year ago
- Related to action #127754: osd nfs-server needed to be restarted but we got no alerts size:M added
Updated by okurz about 1 year ago
- Subject changed from Manage (parts) of s390p7 and s390p8 with salt to Manage (parts) of s390 kvm instances (formerly s390p7 and s390p8) with salt
- Status changed from Blocked to New
#127337 brings me back to this. https://bugzilla.suse.com/show_bug.cgi?id=1198485#c1 is RESOLVED INVALID and we should not be blocked on it anyway. Instead we should only support the current Leap, the same as we do for other s390-related machines.
Updated by okurz about 1 year ago
- Related to action #127337: Some s390x workers have been failing for all jobs since 11 months ago added
Updated by mgriessmeier about 1 year ago
- s390zl12.oqa.prg2.suse.org and s390zl13.oqa.prg2.suse.org are both running openSUSE Leap 15.5
There are some steps necessary to set up before they can be used to execute s390x KVM jobs:
- Packages that need to be present:
- multipath-tools
- libvirt
- directories
- /var/lib/openqa/share/factory
- /var/lib/libvirt/images
- services
- libvirtd
- multipathd
- ZFCP disk for storing images (see the combined provisioning sketch after this list)
- cio_ignore -r [fc00,fa00] to whitelist the channels
- zfcp_host_configure [fa00,fc00] 1 to permanently enable the fcp devices
- multipath -ll to check what devices are there
- /usr/bin/rescan-scsi-bus.sh to discover newly added zfcp disks
- fdisk to create new partition
- mkfs.ext4 to create file system
- /etc/fstab entries
- NFS openQA:
openqa.suse.de:/var/lib/openqa/share/factory /var/lib/openqa/share/factory nfs ro 0 0
- ZFCP disk:
/dev/mapper/$ID /var/lib/libvirt/images ext4 nobarrier,data=writeback 1
- crontab -e
0 */1 * * * /usr/local/bin/cleanup-openqa-assets >/dev/null
~# cat /usr/local/bin/cleanup-openqa-assets
#!/bin/bash -e
# delete unused qcow2 images once /var/lib/libvirt/images is more than 70% full;
# bash instead of sh because the [[ ]] test below is a bashism
echo "--- cronjob start ---"
if [[ $(df | grep "/var/lib/libvirt/images" | awk '{print $5}' | sed "s/%//") -gt 70 ]]; then
    echo "--- entering if ---"
    # remove only images that no running process currently holds open
    find /var/lib/libvirt/images/*.qcow2 ! -exec fuser -s "{}" 2>/dev/null \; -exec rm -f {} \; -print
fi
echo "--- cronjob end ---"
Updated by okurz about 1 year ago
Ok, let's start somewhere: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1010
Updated by okurz about 1 year ago
- Status changed from New to In Progress
- Assignee changed from mgriessmeier to okurz
- Target version changed from future to Ready
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1010 merged. On s390zl13:
echo 'roles: libvirt' >> /etc/salt/grains
and on osd:
sudo salt --no-color --state-output=changes 's390zl*' saltutil.refresh_grains,state.test , | grep -v 'Result.*Clean'
looks clean. Now trying to add more to the libvirt salt states.
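To verify the grain-based targeting, a quick check (a sketch, assuming the 'roles: libvirt' grain set above) could be run on osd:

# confirm the grain made it onto the new hosts
sudo salt 's390zl*' grains.get roles
# address only minions carrying the libvirt role via a compound matcher
sudo salt -C 'G@roles:libvirt' test.ping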
Updated by okurz about 1 year ago
I included installation instructions from #51836-15 in https://progress.opensuse.org/projects/openqav3/wiki/Wiki#s390-LPAR-setup for things that are not feasible or not possible to do via salt but need to be done during installation.
Updated by okurz about 1 year ago
- Due date set to 2023-10-25
- Status changed from In Progress to Feedback
Updated by okurz about 1 year ago
- Due date deleted (2023-10-25)
- Status changed from Feedback to Resolved
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1013 merged and deployed, s390zl12+13 are now properly part of our salt-controlled infrastructure.