action #166748

open

[MinimalVM] VMware images not handling hdd subfolders

Added by mdati 4 months ago. Updated 2 months ago.

Status:
Workable
Priority:
Low
Assignee:
-
Target version:
-
Start date:
2024-09-12
Due date:
% Done:

0%

Estimated time:

Description

VMware jobs fail to include images from the hdd/fixed folder, although it works in some cases when the assets are already present on the worker.

The main problem arises in https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/31dd5c1676685a016198cb1adaac265ef5b48be5/tests/installation/bootloader_svirt.pm#L143 and the lines that follow, where subfolders are not handled properly. This should be done in the backend.

Original ticket

This poo is a follow-up to the issues noted in poo#162941: note-14, note-17, note-24.

In particular:

A) During the SL Micro 6.0 Product Increments VMware test runs, while the osado bootloader_svirt.pm is executed, the full path of the VMware image is not passed to the routines that manage the file. Only the basename is passed, and the path is recomposed statically, assuming that hdd has no subfolders. Those images are in hdd/fixed instead.

Therefore the bash snippet fails to find the source file when the basename is not already present in the expected destination folder of the SUT.

B) Moreover, in later runs of those tests, bootloader_svirt.pm passed, as in 15400838, because the snippet found the VMware image already in the right place (transferred there by some unknown or manual operation) and skipped the copy command.

But since that image is left in the expected place and never cleaned up, updated images from new builds would never be transferred, and therefore never tested.
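The stale-image problem in (B) can be illustrated with a small check that decides whether the datastore copy has to be refreshed. This is only a sketch: the function name needs_refresh is hypothetical and not part of os-autoinst, and a byte-wise cmp is assumed here for simplicity, although for multi-gigabyte images a checksum or build identifier would probably be more practical.

```shell
#!/bin/sh
# Hypothetical check (not part of os-autoinst): succeed (exit 0) when the
# datastore copy is missing or differs from the source image, i.e. when a
# fresh copy is needed.
needs_refresh() {
    src=$1
    dest=$2
    [ -e "$dest" ] || return 0   # nothing cached yet
    ! cmp -s "$src" "$dest"      # cached file differs from the source
}
```

With such a check, an existing file in the datastore would no longer be enough to skip the copy; the copy would be skipped only when the cached image matches the current one.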

A possible fix is to apply the following steps in sequence:

  1. Update the code in (A) so that the correct full path of the image is provided as the source in _copy_image_vmware.
  2. Define a lock-file policy for the VMware image (using pre/post_run) that allows the file to be transferred but prevents it from being cleaned up or overwritten by other similar tests running at the same time.
  3. Update the cleanup, as in item 2 of note-14.
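A minimal sketch of what the lock-file policy in step 2 could look like, assuming that mkdir is atomic on the target filesystem. The helper names (acquire_lock, release_lock, with_image_lock) and the .lock suffix are hypothetical. The lock here is exclusive for simplicity; a real policy would probably need to let several tests use the same image at once.

```shell
#!/bin/sh
# Hypothetical lock helpers (not part of os-autoinst). mkdir either creates
# the directory or fails, so only one caller can obtain the lock at a time.
acquire_lock() {
    mkdir "$1.lock" 2>/dev/null
}

release_lock() {
    rmdir "$1.lock"
}

# Run a command while holding the lock for the given image path.
# Returns 1 without running the command if the lock is already held.
with_image_lock() {
    image=$1; shift
    acquire_lock "$image" || return 1
    "$@"
    status=$?
    release_lock "$image"
    return "$status"
}
```

A cleanup routine following the same convention would skip any image whose lock directory still exists.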

Related issues 1 (0 open, 1 closed)

Related to Containers and images - action #162941: Add job group definitions for SLEM 6.0 to QAC-yaml (Resolved, mdati, 2024-06-27)

Actions #1

Updated by mdati 4 months ago

  • Assignee set to mdati

With reference to case (A), the subroutine calls are:
bootloader_svirt.pm
-> add_disk($self, $args): the arguments still contain the original full path of the image, but it is not used;
-> _copy_image_to_vm_host($args,...): the arguments still contain the original full path of the image file, but only the basename is passed on;
-> _copy_image_vmware(...,$file_basename,...): only the basename is received; the image path is hard-coded and partially recomputed, without any subfolder handling.
Therefore images in subfolders of hdd/ (or iso/) are not handled correctly by the copy command.
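To illustrate what gets lost along this call chain, here is a small sketch comparing the basename with a path kept relative to the assets root. The helper asset_relpath is hypothetical, and the assets root /var/lib/openqa/share/factory is an assumed value used only for the example.

```shell
#!/bin/sh
# Hypothetical helper (not part of os-autoinst): print the path of an asset
# relative to the assets root, so that subfolders such as hdd/fixed/ survive.
# Returns 1 if the path is not below the root.
asset_relpath() {
    _root=${1%/}   # assets root without a trailing slash
    _path=$2
    case $_path in
        "$_root"/*)
            _rel=${_path#"$_root"/}
            printf '%s\n' "$_rel"
            ;;
        *)
            return 1
            ;;
    esac
}

# Example values, assumed for illustration only.
assets_root=/var/lib/openqa/share/factory
image=$assets_root/hdd/fixed/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk

basename "$image"                      # SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk
asset_relpath "$assets_root" "$image"  # hdd/fixed/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk
```

Passing the relative path instead of the basename down to the copy routine would keep the hdd/fixed/ part available where the source path is built.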

A fix for item (1) has been proposed in PR https://github.com/os-autoinst/os-autoinst/pull/2542.

Actions #2

Updated by mdati 4 months ago

  • Description updated (diff)
Actions #3

Updated by ph03nix 4 months ago

  • Project changed from 216 to Containers and images

I don't yet understand this ticket but I will try again to make sense out of this.

Actions #4

Updated by ph03nix 4 months ago

Notes for myself to help me understand what's going on here:

Assets are being copied to VMWARE_DATASTORE. The failure looks to me like a race condition either on the host itself or when multiple workers are involved.

Actions #5

Updated by ph03nix 4 months ago

  • Checklist item deleted (Step #1)
  • Checklist item deleted (Step #2)
  • Checklist item deleted (Step #3)
  • Status changed from New to Blocked

Need to talk to @mloviska when he's back, he knows more about the VMware backend.

Actions #6

Updated by ph03nix 4 months ago

  • Related to action #162941: Add job group definitions for SLEM 6.0 to QAC-yaml added
Actions #7

Updated by mdati 4 months ago · Edited

Some more details about this poo's issue, in addition to the description in the poo header:

In os-autoinst, in consoles/sshVirtsh.pm, the _copy_image_vmware subroutine runs the following shell script. It is built from the input parameters, which contain only the file basename, and triggers the
(1) if-block when the VMware image file already exists in the datastore:

...
    my $cmd =
      "$ds_debug if test -e $vmware_openqa_datastore$file_basename; then " .
      "while lsof | grep 'cp.*$file_basename'; do " .
      "echo File $file_basename is being copied by other process, sleeping for 60 seconds; sleep 60;" .
      'done;' .

which copies nothing (it only waits while another process is still copying the file); otherwise the (2) else-block is triggered:

      'else ' .
      "cp /vmfs/volumes/$vmware_nfs_datastore/$nfs_dir/$file_basename $vmware_openqa_datastore;" .
      'fi;';
...

which should transfer the image file from the source to the datastore.

In only one old test was the file not found, so the (2) block was triggered. But since the source file is in the hdd/fixed/ subfolder, the cp command failed because it expected the file in hdd/.

For some unclear reason, in all later test runs the image file was, and still is, always present in the datastore (maybe copied manually). This was confirmed by setting VMWARE_NFS_DATASTORE_DEBUG=1, which enables set -x. Therefore only (1) is executed, no matter where the source image is.
Also, judging by the structure of the cleanup phase, the file is never deleted.

This is because (A) the image file parameter is passed to the final routine without the original full path, only the basename, and (B) the file is never cleaned up.

Hence the 3 steps proposed in the poo header.
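As a rough sketch of step 1, the copy could address the source by its path relative to the NFS share instead of by basename only. The function copy_image and its parameters are hypothetical, the example paths in the comments are taken from the debug log, and the lsof wait loop of the original snippet is left out for brevity.

```shell
#!/bin/sh
# Hypothetical variant of the copy step (not the os-autoinst implementation):
# the source is addressed by its path relative to the NFS share,
# e.g. hdd/fixed/image.vmdk, not by basename only.
copy_image() {
    nfs_root=$1    # e.g. /vmfs/volumes/openqa
    rel_path=$2    # e.g. hdd/fixed/image.vmdk
    datastore=$3   # e.g. /vmfs/volumes/datastore1/openQA
    name=${rel_path##*/}
    if [ -e "$datastore/$name" ]; then
        echo "$name already in datastore, skipping copy"
    else
        cp "$nfs_root/$rel_path" "$datastore/"
    fi
}
```

Called with hdd/fixed/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk as the relative path, the cp source would point into hdd/fixed/ rather than hdd/.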

Actions #8

Updated by mdati 3 months ago · Edited

In a test cloned with only VMWARE_NFS_DATASTORE_DEBUG=1 added to enable set -x, we can see more clearly a failure like the one in the (2) else-block of note #7, caused by the image not being found:
https://openqa.suse.de/tests/15563772/logfile?filename=autoinst-log.txt#line-266

[2024-09-30T14:44:11.715655Z] [debug] [pid:124900] [run_ssh_cmd(set -x; if test -e /vmfs/volumes/datastore1/openQA/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk; then while lsof | grep 'cp.*SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk'; do echo File SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk is being copied by other process, sleeping for 60 seconds; sleep 60;done;else cp /vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk /vmfs/volumes/datastore1/openQA/;fi;)] stderr:
  + test -e /vmfs/volumes/datastore1/openQA/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk
  + cp /vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk /vmfs/volumes/datastore1/openQA/
  cp: can't stat '/vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk': No such file or directory

It is the same fault seen in https://openqa.suse.de/tests/15040545/logfile?filename=autoinst-log.txt#line-507, where debug was not enabled and only the last line, the cp error, is visible.

This shows why the PR https://github.com/os-autoinst/os-autoinst/pull/2542 is needed.

Actions #9

Updated by ph03nix 3 months ago

  • Project changed from Containers and images to 208
  • Subject changed from VMware images not handling hdd iso subfoldes to VMware images not handling hdd subfoldes
  • Description updated (diff)
  • Status changed from Blocked to Workable
  • Assignee deleted (mdati)
  • Priority changed from Normal to Low

Refining the ticket, lowering the priority and moving it to the MinimalVM project.

Actions #10

Updated by ph03nix 3 months ago

  • Tags changed from vmware, svirt to vmware, svirt, to-be-refined

This ticket still needs a bit of refinement to be fully workable.

Actions #11

Updated by mdati 3 months ago

PR#2542 has been closed.

Actions #12

Updated by ph03nix 3 months ago

  • Tags changed from vmware, svirt, to-be-refined to MinimalVM
Actions #13

Updated by ph03nix 2 months ago

  • Project changed from 208 to Containers and images
Actions #14

Updated by ph03nix 2 months ago

  • Subject changed from VMware images not handling hdd subfoldes to [MinimalVM] VMware images not handling hdd subfoldes