action #178972

closed

[s390x][s390zl13][tools] nfs mount to openqa.suse.de is missing size:S

Added by rfan1 about 1 month ago. Updated 29 days ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Start date:
2025-03-17
Due date:
% Done:

0%

Estimated time:

Description

Observation

s390zl13:~ # cat /etc/fstab 
/dev/disk/by-path/ccw-0.0.6000-part2  /                        btrfs  defaults                        0  0
/dev/disk/by-path/ccw-0.0.6000-part2  /var                     btrfs  subvol=/@/var                   0  0
/dev/disk/by-path/ccw-0.0.6000-part2  /usr/local               btrfs  subvol=/@/usr/local             0  0
/dev/disk/by-path/ccw-0.0.6000-part2  /tmp                     btrfs  subvol=/@/tmp                   0  0
/dev/disk/by-path/ccw-0.0.6000-part2  /srv                     btrfs  subvol=/@/srv                   0  0
/dev/disk/by-path/ccw-0.0.6000-part2  /root                    btrfs  subvol=/@/root                  0  0
/dev/disk/by-path/ccw-0.0.6000-part2  /opt                     btrfs  subvol=/@/opt                   0  0
/dev/disk/by-path/ccw-0.0.6000-part2  /home                    btrfs  subvol=/@/home                  0  0
/dev/disk/by-path/ccw-0.0.6000-part2  /boot/grub2/s390x-emu    btrfs  subvol=/@/boot/grub2/s390x-emu  0  0
/dev/disk/by-path/ccw-0.0.6000-part2  /.snapshots              btrfs  subvol=/@/.snapshots            0  0
/dev/disk/by-path/ccw-0.0.6000-part1  /boot/zipl               ext2   defaults                        0  2
/dev/disk/by-path/ccw-0.0.6000-part3  swap                     swap   sw,pri=2                        0  0
/dev/disk/by-path/ccw-0.0.6002-part1  swap                     swap   sw,pri=2                        0  0
/dev/disk/by-path/ccw-0.0.6003-part1  swap                     swap   sw,pri=2                        0  0
/dev/disk/by-path/ccw-0.0.6004-part1  swap                     swap   sw,pri=2                        0  0
/dev/disk/by-path/ccw-0.0.6005-part1  swap                     swap   sw,pri=2                        0  0
/dev/disk/by-path/ccw-0.0.6006-part1  swap                     swap   sw,pri=2                        0  0
/dev/disk/by-path/ccw-0.0.6007-part1  swap                     swap   sw,pri=2                        0  0
/dev/disk/by-path/ccw-0.0.6008-part1  swap                     swap   sw,pri=2                        0  0
/dev/disk/by-path/ccw-0.0.6009-part1  swap                     swap   sw,pri=2                        0  0
/dev/disk/by-path/ccw-0.0.600a-part1  swap                     swap   sw,pri=2                        0  0
/dev/disk/by-path/ccw-0.0.600b-part1  swap                     swap   sw,pri=2                        0  0

#ZFCP Disk for libvirt images
/dev/mapper/3600507638081855cd80000000000004c-part1                /var/lib/libvirt/images        ext4        rw,nobarrier,data=writeback        0 0

#openQA NFS share
openqa.oqa.prg2.suse.org:/var/lib/openqa/share/factory  /var/lib/openqa/share/factory  nfs    ro,nolock                              0  0
s390zl13:~ # df |grep /var/lib/openqa/share/factory
s390zl13:~ # 
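
The empty df output above means the fstab entry exists but the share is not mounted. A minimal sketch of the same check for every NFS entry at once, assuming whitespace-separated fields and mount points without spaces. The function name missing_nfs_mounts is hypothetical and not part of any openQA tooling:

```shell
# Print the mount point of every nfs/nfs4 entry in an fstab-style file
# that does not appear in a mounts table (e.g. /proc/self/mounts).
# Hypothetical helper; usage: missing_nfs_mounts FSTAB MOUNTS
missing_nfs_mounts() {
    fstab=$1
    mounts=$2
    awk '
        # Mounts table (first file): remember every active mount point.
        FILENAME == ARGV[1] { mounted[$2] = 1; next }
        # fstab (second file): skip comments and blank lines.
        /^[ \t]*(#|$)/ { next }
        # Report NFS entries whose mount point is not active.
        ($3 == "nfs" || $3 == "nfs4") && !($2 in mounted) { print $2 }
    ' "$mounts" "$fstab"
}
```

On a live host this could be called as missing_nfs_mounts /etc/fstab /proc/self/mounts; an empty result would mean that all NFS entries are mounted.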

openQA test in scenario sle-15-SP6-Server-DVD-Updates-s390x-mau-bind@s390x-kvm fails in
bootloader_zkvm

Test suite description

Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml.

Reproducible

Fails since (at least) Build 20250316-1 (current job)

Expected result

Last good: 20250314-1 (or more recent)

Further details

Always latest result in this scenario: latest

Suggestions


Related issues 1 (1 open, 0 closed)

Related to openQA Infrastructure (public) - action #179032: machine netboot.qe.prg2.suse.org can randomly fail "srv-tftpboot-mnt-openqa.mount"-unit (New, 2025-03-17)

Actions #1

Updated by rfan1 about 1 month ago

  • Subject changed from [s390x][s390zl13] nfs mount to openqa.suse.de is missing to [s390x][s390zl13][tools] nfs mount to openqa.suse.de is missing
Actions #2

Updated by okurz about 1 month ago

  • Tags set to infra, reactive work, s390x, osd
  • Project changed from openQA Tests (public) to openQA Infrastructure (public)
  • Category changed from Bugs in existing tests to Regressions/Crashes
  • Priority changed from Normal to High
  • Target version set to Ready
Actions #3

Updated by mkittler about 1 month ago

  • Status changed from New to Feedback
  • Assignee set to mkittler

This should work again after running sudo systemctl restart var-lib-openqa-share-factory.mount on s390zl13.oqa.prg2.suse.org. I also checked s390zl12.oqa.prg2.suse.org but it wasn't affected.
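
The unit name in that command is derived from the mount point: systemd drops the leading slash and turns the remaining slashes into dashes. A minimal sketch of that mapping for simple paths only, with a hypothetical function name. It deliberately rejects paths containing dashes, dots, or other characters that systemd would escape; systemd-escape -p --suffix=mount is the authoritative tool for the general case:

```shell
# Derive the systemd mount unit name for a simple absolute path.
# Hypothetical helper; handles only letters, digits, "_" and single "/".
path_to_mount_unit() {
    case $1 in
        /*) ;;
        *) echo "path_to_mount_unit: absolute path required" >&2; return 1 ;;
    esac
    case $1 in
        # Repeated slashes and characters needing \xNN escaping are out of scope.
        *//*|*[!A-Za-z0-9_/]*)
            echo "path_to_mount_unit: unsupported path: $1" >&2; return 1 ;;
    esac
    p=${1#/}    # drop the leading slash
    p=${p%/}    # drop a trailing slash, if any
    # The root file system is special-cased as "-.mount".
    [ -n "$p" ] || { echo "-.mount"; return 0; }
    printf '%s.mount\n' "$(printf '%s' "$p" | tr '/' '-')"
}
```

Calling path_to_mount_unit /var/lib/openqa/share/factory prints var-lib-openqa-share-factory.mount, the unit restarted above.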

Actions #4

Updated by mkittler about 1 month ago

Those were the logs (including the restart I did on Mar 17 11:17:31):

martchus@s390zl13:~> sudo journalctl -fu var-lib-openqa-share-factory.mount
Mar 16 03:34:25 s390zl13 systemd[1]: Mounting /var/lib/openqa/share/factory...
Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Mounting timed out. Terminating.
Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Mount process exited, code=killed, status=15/TERM
Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Failed with result 'timeout'.
Mar 16 03:35:39 s390zl13 systemd[1]: Failed to mount /var/lib/openqa/share/factory.
Mar 17 11:17:31 s390zl13 systemd[1]: Mounting /var/lib/openqa/share/factory...
Mar 17 11:17:31 s390zl13 systemd[1]: Mounted /var/lib/openqa/share/factory.

I guess we can easily improve this by using our automatic recovery on those hosts as well: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1406

Actions #5

Updated by nicksinger about 1 month ago

  • Related to action #179032: machine netboot.qe.prg2.suse.org can randomly fail "srv-tftpboot-mnt-openqa.mount"-unit added
Actions #6

Updated by nicksinger about 1 month ago

mkittler wrote in #note-4:

> Those were the logs (including the restart I did on Mar 17 11:17:31):
>
> martchus@s390zl13:~> sudo journalctl -fu var-lib-openqa-share-factory.mount
> Mar 16 03:34:25 s390zl13 systemd[1]: Mounting /var/lib/openqa/share/factory...
> Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Mounting timed out. Terminating.
> Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Mount process exited, code=killed, status=15/TERM
> Mar 16 03:35:39 s390zl13 systemd[1]: var-lib-openqa-share-factory.mount: Failed with result 'timeout'.
> Mar 16 03:35:39 s390zl13 systemd[1]: Failed to mount /var/lib/openqa/share/factory.
> Mar 17 11:17:31 s390zl13 systemd[1]: Mounting /var/lib/openqa/share/factory...
> Mar 17 11:17:31 s390zl13 systemd[1]: Mounted /var/lib/openqa/share/factory.
>
> I guess we can easily improve this by using our automatic recovery on those hosts as well: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1406

I think the machine does not have the worker grain, right? I ask because the symptom looks pretty much the same as what I found yesterday on netboot: https://progress.opensuse.org/issues/179032

Actions #7

Updated by okurz about 1 month ago

  • Subject changed from [s390x][s390zl13][tools] nfs mount to openqa.suse.de is missing to [s390x][s390zl13][tools] nfs mount to openqa.suse.de is missing size:S
  • Description updated (diff)
Actions #8

Updated by okurz about 1 month ago

  • Status changed from Feedback to Workable
  • Priority changed from High to Urgent

Multiple alerts from yesterday and today, please consider mitigations like silencing alerts.

Actions #9

Updated by mkittler about 1 month ago

  • Status changed from Workable to Resolved
  • Priority changed from Urgent to High

Can you be more specific? There were many alerts, but I don't think any of them has anything to do with this ticket. The only alert currently firing is "backup-vm: partitions usage (%) alert", which is definitely unrelated. Since the problem was fixed and https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1406 was merged as an additional improvement, I consider this ticket resolved.

Actions #10

Updated by okurz 30 days ago

  • Status changed from Resolved to Workable

The salt CI pipelines never succeeded in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1406, and now in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/4001161 I see:

unreal6.qe.nue2.suse.org:
----------
          ID: wakeup_automount
    Function: cmd.run
        Name: ls -d /var/lib/openqa/share/.
      Result: True
     Comment: Command "ls -d /var/lib/openqa/share/." run
     Started: 20:34:42.032213
    Duration: 319.81 ms
     Changes:   
              ----------
              pid:
                  27389
              retcode:
                  0
              stderr:
              stdout:
                  /var/lib/openqa/share/.
----------
          ID: /var/lib/openqa/share
    Function: mount.mounted
      Result: False
     Comment: mount.nfs: Connection refused for openqa.suse.de:/var/lib/openqa/share on /var/lib/openqa/share
     Started: 20:34:42.357876
    Duration: 1806440.625 ms
     Changes:   

Other hosts are probably affected as well.

Actions #11

Updated by nicksinger 30 days ago

  • Assignee changed from mkittler to nicksinger

Taking over to see what needs to be changed here.

Actions #12

Updated by nicksinger 30 days ago

  • Status changed from Workable to In Progress
Actions #13

Updated by nicksinger 30 days ago

  • Status changed from In Progress to Feedback

The selected libvirt-role covered too many hosts:

openqa:~ # salt -C 'G@roles:libvirt' test.ping
s390zl12.oqa.prg2.suse.org:
    True
s390zl13.oqa.prg2.suse.org:
    True
ada.qe.prg2.suse.org:
    True
unreal6.qe.nue2.suse.org:
    True
worker39.oqa.prg2.suse.org:
    True

external_openqa_hypervisor is what we actually want:

openqa:~ # salt -C 'G@roles:external_openqa_hypervisor' test.ping
s390zl12.oqa.prg2.suse.org:
    True
s390zl13.oqa.prg2.suse.org:
    True

MR to fix that: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1408

Actions #14

Updated by nicksinger 30 days ago

  • Status changed from Feedback to Workable
  • Assignee changed from nicksinger to mkittler

The MR is merged and effective even though the pipeline is still failing (for something else). I manually adjusted /etc/fstab on zl13 and removed the old mount from there.

Actions #15

Updated by mkittler 29 days ago

  • Status changed from Workable to Resolved

Thanks. I checked the mount and fstab file on both s390x hosts again and I think we're good.
