action #109969

s390zp19 - out of disk space

Added by mgriessmeier 3 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
Start date:
2022-04-14
Due date:
% Done:

0%

Estimated time:

Description

Observation

https://app.slack.com/client/T02863RC2AC/C02CANHLANP/thread/C02CANHLANP-1649923742.408719

The s390x LPAR s390zp19 ran out of disk space.

Actions taken:

  • checked that the cleanup script /usr/local/bin/cleanup-openqa-assets >/dev/null works as intended -> [DONE]
  • checked why the cronjob was not running -> [DONE]
    • observed multiple warnings and reports about dangling references to old glibc versions
    • tried to update the system via zypper, which crashed the machine; it then booted into a kernel panic
  • started re-installation of the machine via zhmc
  • Steps to configure the LPAR:
change hostname: /etc/hostname -> s390zp19
install libvirt: zypper in libvirt
systemctl start multipathd
systemctl enable multipathd
cio_ignore -r fa00
cio_ignore -r fc00
/usr/bin/rescan-scsi-bus.sh
zfcp_host_configure fa00 1
zfcp_host_configure fc00 1

run multipath -ll to check whether multipath was configured
mkdir -p /var/lib/openqa/share/factory
mkdir -p /var/lib/libvirt/images
fdisk /dev/mapper/...
n -> p -> ... -> w
mkfs.ext4 /dev/mapper/...-part1

modify /etc/fstab:
# libvirt images
/dev/mapper/36005076307ffd3b30000000000000149-part1 /var/lib/libvirt/images ext4 nobarrier,data=writeback 1 0

# openqa nfs
openqa.suse.de:/var/lib/openqa/share/factory /var/lib/openqa/share/factory nfs ro 0 0

copy the cleanup script /usr/local/bin/cleanup-openqa-assets from another worker, e.g. s390zp18
crontab -e:
0 */1 * * * /usr/local/bin/cleanup-openqa-assets >/dev/null
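
The configuration steps above can be consolidated into a rough shell sketch. This is an illustration only, assuming a freshly installed SLE-style system and root privileges; the FCP adapter IDs (fa00/fc00), the multipath WWID, and the cron line are taken from this ticket, while the fdisk/mkfs step is left as a comment because the target device is elided in the original and must be determined on the machine itself.

```shell
#!/bin/sh
# Sketch of the s390zp19 re-setup steps from this ticket (not a general
# provisioning script). Run as root on the freshly installed LPAR.
set -e

# hostname
echo s390zp19 > /etc/hostname

# libvirt and multipath
zypper --non-interactive in libvirt
systemctl enable --now multipathd

# make the FCP adapters visible and bring them online persistently
cio_ignore -r fa00
cio_ignore -r fc00
/usr/bin/rescan-scsi-bus.sh
zfcp_host_configure fa00 1
zfcp_host_configure fc00 1

# verify the multipath device shows up
multipath -ll

mkdir -p /var/lib/openqa/share/factory /var/lib/libvirt/images

# Partition and format the multipath device (interactive in the ticket:
# fdisk -> n -> p -> ... -> w), then:
#   mkfs.ext4 /dev/mapper/<WWID>-part1

# mounts as noted in the ticket's /etc/fstab snippet
cat >> /etc/fstab <<'EOF'
# libvirt images
/dev/mapper/36005076307ffd3b30000000000000149-part1 /var/lib/libvirt/images ext4 nobarrier,data=writeback 1 0
# openqa nfs
openqa.suse.de:/var/lib/openqa/share/factory /var/lib/openqa/share/factory nfs ro 0 0
EOF
mount -a

# hourly asset cleanup (script copied from s390zp18 beforehand)
( crontab -l 2>/dev/null; \
  echo '0 */1 * * * /usr/local/bin/cleanup-openqa-assets >/dev/null' ) | crontab -
```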

Related issues

Related to openQA Infrastructure - action #51836: Manage (parts) of s390p7 and s390p8 with salt (status: Blocked, 2019-05-22)

History

#1 Updated by mgriessmeier 3 months ago

  • Description updated (diff)

#2 Updated by okurz 3 months ago

  • Target version set to Ready

mgriessmeier, why did you assign this to nicksinger? That s390x instance is still outside the scope of SUSE QE Tools, right? For now I am adding the ticket to our backlog with "Ready" since you assigned nicksinger, who is part of the team SUSE QE Tools, but I would prefer if this is handled outside, e.g. by QE Core or by you.

#3 Updated by mgriessmeier 3 months ago

  • Description updated (diff)
  • Target version deleted (Ready)

#4 Updated by mgriessmeier 3 months ago

okurz wrote:

mgriessmeier, why did you assign this to nicksinger? That s390x instance is still outside the scope of SUSE QE Tools, right? For now I am adding the ticket to our backlog with "Ready" since you assigned nicksinger, who is part of the team SUSE QE Tools, but I would prefer if this is handled outside, e.g. by QE Core or by you.

because nicksinger is working with me on this right now - and that should be reflected properly :)
he helped me debug it while we observed major issues

#5 Updated by mgriessmeier 3 months ago

  • Description updated (diff)

#6 Updated by okurz 3 months ago

  • Status changed from New to In Progress
  • Target version set to Ready

ok, so then it's part of the backlog, fine.

#7 Updated by openqa_review 3 months ago

  • Due date set to 2022-05-02

Setting due date based on the mean cycle time of SUSE QE Tools

#8 Updated by okurz 2 months ago

  • Related to action #51836: Manage (parts) of s390p7 and s390p8 with salt added

#9 Updated by okurz 2 months ago

  • Due date deleted (2022-05-02)
  • Status changed from In Progress to Resolved

The original problem was fixed. #51836 is the related ticket about managing the machines properly, which we consider out of scope for SUSE QE Tools. I strongly suggest that QE Core look into #51836 to prevent further problems.
