action #162941

closed

Add job group definitions for SLEM 6.0 to QAC-yaml

Added by ph03nix 6 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
Start date:
2024-06-27
Due date:
% Done:

100%

Estimated time:

Description

https://openqa.suse.de/group_overview/566 is a prototype of the upcoming maintenance setup for SLEM 6.0. We need to create a job group definition for this job group in the https://gitlab.suse.de/qac/qac-openqa-yaml/ repository.

I think a new file staging-slem6_0.yaml in https://gitlab.suse.de/qac/qac-openqa-yaml/-/tree/master/sle-micro would fit nicely
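For reference, job group definitions in that repository follow the openQA job-group YAML schema with defaults/products/scenarios sections. A minimal illustrative sketch of what staging-slem6_0.yaml could look like (the flavor, machine, and test-suite names below are assumptions for illustration, not the final content; the real entries should be taken from the prototype group 566):

```yaml
# Hypothetical sketch only -- real flavors and test suites to be copied from
# https://openqa.suse.de/group_overview/566
defaults:
  x86_64:
    machine: 64bit
    priority: 50
products:
  slem-6.0-Default-Updates-x86_64:
    distri: sle-micro
    flavor: Default-Updates
    version: '6.0'
scenarios:
  x86_64:
    slem-6.0-Default-Updates-x86_64:
      - slem_basic
```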

Acceptance criteria


Checklist

  • Default-qcow-Updates
  • Default-encrypted-Updates (x86_64 only)
  • Default-VMware-Updates (x86_64 only)
  • Base-qcow-Updates
  • Base-encrypted-Updates (x86_64 only)
  • Base-VMware-Updates (x86_64 only)
  • Base-RT-Updates (x86_64 only)

Related issues: 2 (2 open, 0 closed)

Related to openQA Project (public) - action #165923: [qa-tools][vmware][spikesolution][timeboxed:20h] VNC reconnect after reboot size:S (Workable, 2024-08-28)

Related to Containers and images - action #166748: [MinimalVM] VMware images not handling hdd subfolders (Workable, 2024-09-12)

Actions #1

Updated by ph03nix 6 months ago

  • Parent task set to #159828
Actions #2

Updated by mdati 6 months ago

  • Assignee set to mdati
Actions #3

Updated by mdati 6 months ago

  • Status changed from Workable to In Progress
Actions #5

Updated by mdati 6 months ago

  • Tags set to slem, yaml
  • Status changed from In Progress to Feedback

Acceptance criteria OK; no issues to fix at the moment.

Actions #6

Updated by mdati 6 months ago

  • Status changed from Feedback to Resolved
Actions #7

Updated by ph03nix 5 months ago

  • Status changed from Resolved to In Progress
  • Assignee changed from mdati to ph03nix

Reopening, as the product increments (https://openqa.suse.de/group_overview/572) are still to be done.

Actions #9

Updated by ph03nix 5 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

https://openqa.suse.de/admin/job_templates/572 is now populated and under our control.

Actions #10

Updated by mdati 5 months ago · Edited

  • Status changed from Resolved to In Progress
  • Assignee changed from ph03nix to mdati

Ticket reopened after discussion in Slack, to also address the SL Micro 6.0 product increments, including the VMware and encrypted flavors.

See https://progress.opensuse.org/issues/159828#note-27

Actions #11

Updated by mdati 5 months ago

  • Checklist item Default-qcow-Updates added
  • Checklist item Default-encrypted-Updates (x86_64 only) added
  • Checklist item Default-VMware-Updates (x86_64 only) added
  • Checklist item Base-qcow-Updates added
  • Checklist item Base-encrypted-Updates (x86_64 only) added
  • Checklist item Base-VMware-Updates (x86_64 only) added
  • Checklist item Base-RT-Updates (x86_64 only) added
Actions #12

Updated by mdati 5 months ago

Created MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1763, covering all products/flavors in the checklist.

Actions #13

Updated by mdati 5 months ago

MR 1763 merged.

Some logic errors were fixed in a new MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1766, also merged.

See the tests in the latest build of https://openqa.suse.de/group_overview/572

Actions #14

Updated by mdati 5 months ago · Edited

Today a new error affected the Base-VMware-Updates tests in SL Micro 6.0 Product Increments - Containers: they failed in the boot phase because the SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk image is located in hdd/fixed.

Analysis below.

Findings on the VMware boot error, from comparing the autoinst logs of cases A and B:

(A) boot PASS:

https://openqa.suse.de/tests/15036824/logfile?filename=autoinst-log.txt

run_ssh_cmd(if test -e /vmfs/volumes/datastore1/openQA/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk; then while lsof | grep 'cp.*SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk'; do echo File SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk is being copied by other process, sleeping for 60 seconds; sleep 60;done;else cp /vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk /vmfs/volumes/datastore1/openQA/;fi;)] exit-code: 0

and

(B) boot FAIL:

https://openqa.suse.de/tests/15040545/logfile?filename=autoinst-log.txt

run_ssh_cmd(if test -e /vmfs/volumes/Datastore2/openQA/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk; then while lsof | grep 'cp.*SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk'; do echo File SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk is being copied by other process, sleeping for 60 seconds; sleep 60;done;else cp /vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk /vmfs/volumes/Datastore2/openQA/;fi;)] stderr:
  cp: can't stat '/vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk': No such file or directory
... exit-code: 1

The .vmdk file exists in ./hdd/fixed in both cases A and B, but:

  • in case B, the script does not find SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk in VMWARE_DATASTORE, so it triggers a cp from ./hdd/; the file is not there either, and the copy fails;

  • in case A, the script finds the .vmdk already present in VMWARE_DATASTORE, probably left over from previous runs, so the (broken) cp is never executed and the error is skipped.

The preceding cleanup should have removed the .vmdk image in case A, but its match expression does not catch it: it expects an additional string such as "openQA-SUT" appended, so the bare SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk file is never removed from VMWARE_DATASTORE. This cleanup runs before the if above.

Cleanup also ran in case B, but some files were locked and could not be deleted. This may be the reason the .vmdk was already present in case A.

Proposed solutions, by priority:

1) In _copy_image_vmware, either cp should fall back to hdd/fixed when no file is found in hdd, or $file_basename should carry the path including hdd/fixed.

2) Cleanup should also remove file_basename if nothing is using it:

   ...
   rm -f ${vmware_openqa_datastore}*${name}* \
      ${vmware_openqa_datastore}*${file_basename}

3) Clarify/normalize image management in the test code, handling both hdd and hdd/fixed consistently.
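As an illustration of item 1, a minimal shell sketch of a copy helper that falls back to hdd/fixed/ when the image is not directly under hdd/. The function and path layout are hypothetical, not the actual os-autoinst code:

```shell
#!/bin/sh
# Hypothetical sketch of the hdd/fixed fallback from item 1.
# copy_image SRC_ROOT DST_DIR IMAGE_BASENAME
copy_image() {
    src_root="$1"   # e.g. /vmfs/volumes/openqa
    dst_dir="$2"    # e.g. /vmfs/volumes/datastore1/openQA
    img="$3"        # e.g. SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk
    if [ -e "$dst_dir/$img" ]; then
        echo "image already present in datastore, nothing to do"
    elif [ -e "$src_root/hdd/$img" ]; then
        cp "$src_root/hdd/$img" "$dst_dir/"
    elif [ -e "$src_root/hdd/fixed/$img" ]; then
        # fallback: fixed assets live in hdd/fixed/, not hdd/
        cp "$src_root/hdd/fixed/$img" "$dst_dir/"
    else
        echo "image $img not found in hdd/ or hdd/fixed/" >&2
        return 1
    fi
}
```

With this fallback, case B above would have found the image in hdd/fixed/ instead of failing on the missing hdd/ path.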

Actions #15

Updated by mdati 5 months ago

For item 1 above, created PR 2524 on Aug 4.

Actions #16

Updated by mdati 5 months ago

Activities temporarily paused; they will be resumed soon.

Actions #17

Updated by mdati 5 months ago · Edited

Activity restarted:

In the os-autoinst code flow, the add_disk($self, $args) routine, which calls _copy_image_to_vm_host($args, ...), also provides in $args the original full path of the image (possibly in a subfolder such as fixed/), coming from the bootloader_svirt.pm call (see my $hddpath); but that full path is never used and is effectively lost.

In fact, when _copy_image_vmware(..., $file_basename, ...) is called internally, only the basename is extracted and passed as a parameter; inside, the original image path is hard-coded and partially recalculated, without any subfolder handling. As a result, images in hdd/ (or iso/) subfolders are not handled correctly by the copy commands.

In the latest PR 2524 update, the file_basename input parameter of _copy_image_to_vm_host() and the inner _copy_image_vmware() was replaced with the full path of the source file, passed via the add_disk $args and coming (only) from the bootloader_svirt.pm settings at runtime (or the similar bootloader_zkvm.pm). This way images in subfolders such as fixed/ can also be handled, avoiding the needless recalculation of the original folder.
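To illustrate the problem in shell terms (the paths below are illustrative assumptions, not the exact ones in the Perl code): extracting only the basename discards the fixed/ component, so the recalculated source path points at the wrong location.

```shell
#!/bin/sh
# Illustrative only: shows how recalculating the source path from the basename
# drops the fixed/ subfolder that was present in the original $hddpath.
hddpath="/var/lib/openqa/share/factory/hdd/fixed/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk"
file_basename=$(basename "$hddpath")

# hard-coded recalculation, as in the pre-fix code path (assumed layout):
recalculated="/vmfs/volumes/openqa/hdd/$file_basename"

echo "$recalculated"
# -> /vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk
#    (the fixed/ component is gone, so the cp cannot find the file)
```

Passing the full source path through instead of the basename, as the PR does, makes the recalculation unnecessary.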

Actions #18

Updated by mdati 4 months ago · Edited

All tests in SL Micro 6.0 Product Increments - Containers, and in other groups too, are currently failing, affected by an IBS repo renaming issue that causes install_updates to fail.

Ticket opened: https://progress.opensuse.org/issues/165536, but the issue is being handled in the named Jira ticket.

Moreover, in the latest builds a not-yet-understood behavior lets the bootloader_svirt.pm step pass even though the original image sits in the unhandled hdd/fixed/ subdirectory, which caused the error discussed in https://progress.opensuse.org/issues/162941#note-14. It could be that the image is already present in the destination folder as well.

Actions #19

Updated by mdati 4 months ago · Edited

With the IBS repo renaming issue recently resolved, almost all tests in group 572 pass, but 2 VMware tests still fail in podman netavark/skopeo/remote; for those issues I created https://progress.opensuse.org/issues/165884.

About the hdd/fixed/ issue in os-autoinst: PR 2524 has been updated, all code fixes reverted; instead, on-demand debugging was introduced in the NFS datastore script (via VMWARE_NFS_DATASTORE_DEBUG=1) to verify the image file status.

Actions #20

Updated by mdati 4 months ago · Edited

  • Checklist item Default-qcow-Updates set to Done
  • Checklist item Default-encrypted-Updates (x86_64 only) set to Done
  • Checklist item Base-qcow-Updates set to Done
  • Checklist item Base-encrypted-Updates (x86_64 only) set to Done
  • Checklist item Base-RT-Updates (x86_64 only) set to Done
  • Tags changed from slem, yaml to slem, yaml, vmware

Status today for SL Micro 6.0 Product Increments - Containers: all tests pass, except that the VMware-flavor tests fail on rerun.

The main issue turned out to be a form of slowness or lost key presses, blocking the screen until a needle timeout occurred: see poo#165923.
Those VMware tests are always assigned qesapworker# instances in Prague, even though sapworker# hosts in Nuremberg are also available.
So I executed a run forcing a Nuremberg worker, WORKER_CLASS="sapworker1,svirt-vmware70": https://openqa.suse.de/tests/15299487.
The test proceeded to the end, failing only on needle format differences. But after the needle was updated, the next rerun failed due to worker problems, and now all reruns on that worker fail this way:
https://openqa.suse.de/tests/15305429/logfile?filename=autoinst-log.txt#line-616

...
!!!! X64 Exception Type - 06(#UD - Invalid Opcode)  CPU Apic ID - 00000000 !!!!
RIP  - 0000000000000040, CS  - 0000000000000018, RFLAGS - 0000000000010247
RAX  - 000000005FC0E020, RCX - 000000005FC0E020, RDX - 000000005FC10EC8
RBX  - 000000005FC10EC8, RSP - 000000005FFBD7D8, RBP - 000000005FFBD830
RSI  - 000000005EB71120, RDI - 0000000000000031
R8   - 0000000000000004, R9  - 0000000000000001, R10 - 0000000000000000
R11  - 000000005EBF4140, R12 - 000000005FC10EC8, R13 - 000000005FD1DD98
R14  - 000000005FD8B818, R15 - 000000005EB71130
DS   - 0000000000000008, ES  - 0000000000000008, FS  - 0000000000000008
GS   - 0000000000000008, SS  - 0000000000000008
CR0  - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000005FF98000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 00000000FFFFFCC0 000000000000002F, LDTR - 0000000000000000
IDTR - 000000005FEE6440 0000000000000FFF,   TR - 0000000000000000
FXSAVE_STATE - 000000005FFBD430
!!!! Can't find image information. !!!!
...

See the discussion in https://suse.slack.com/archives/C02CANHLANP/p1725012429713349; the problem seems to be inside the common host unreal7.qe.nue2.suse.org.

In summary, the VMware tests, depending on the workers they run on:

qesapworker-prg#: seem affected by random slowdowns or missed key actions: poo#165923;
sapworker#: since today, affected by a CPU issue on unreal7.qe.nue2.suse.org.

Actions #21

Updated by mdati 4 months ago

  • Related to action #165923: [qa-tools][vmware][spikesolution][timeboxed:20h] VNC reconnect after reboot size:S added
Actions #22

Updated by mdati 4 months ago

  • % Done changed from 100 to 80
Actions #23

Updated by mdati 4 months ago · Edited

Today all VMware tests in group 572 PASS.

In particular, using a WORKER_CLASS targeting unreal7 they all pass; see Slack.

But the tests assigned a qesapworker worker still fail because of issues on the esxi7 server they use; the engineering team opened a ticket for this: https://progress.opensuse.org/issues/166529.

Suggested workaround until that is fixed: run the VMware tests on workers using unreal7, i.e.
WORKER_CLASS=sapworker1,svirt-vmware70, WORKER_CLASS=unreal7,svirt-vmware70 or WORKER_CLASS=unreal7.
E.g. https://openqa.suse.de/tests/15390100 passes.

Actions #24

Updated by mdati 4 months ago · Edited

At the moment all VMware tests in https://openqa.suse.de/group_overview/572 pass.

Please note that in the current VMware test runs the issue from note-14/B is no longer present, because the image file is already present in the expected folder (transferred there by some unknown or manual operation), as also revealed by cloning the test with VMWARE_NFS_DATASTORE_DEBUG=1 from PR 2524. See the bash snippet in e.g. job 15400838.

But this could mean that the existing local image is always used, because it is never cleaned up, so new images from incoming builds are never tested.

A possible correction, as sequential changes:

  1. Ensure that the correct full-path image is provided as the source in _copy_image_vmware, as proposed in PR https://github.com/os-autoinst/os-autoinst/pull/2542.
  2. Define a lock-file policy for these VMware images (in place of the lsof check), to prevent cleanup while a running test is using them.
  3. Implement the cleanup from item 2 of note-14.
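The lock-file policy from item 2 could be sketched with flock(1); this is a hypothetical sketch, not the current os-autoinst implementation, and the file names are illustrative:

```shell
#!/bin/sh
# Hypothetical sketch of the lock-file policy from item 2: the copier holds an
# exclusive flock on <image>.lock while copying; cleanup removes the image only
# when it can take the lock, i.e. no running test holds it.
img="$(mktemp -u /tmp/SL-Micro-example-XXXXXX.vmdk)"
lock="$img.lock"

# copier side: hold the lock for the duration of the copy
exec 9>"$lock"
flock -x 9
touch "$img"        # stand-in for the real cp from the NFS datastore
flock -u 9

# cleanup side: delete the image only if the lock is currently free;
# -n makes flock fail immediately instead of blocking
exec 8>"$lock"
if flock -n -x 8; then
    rm -f "$img" "$lock"
fi
```

Compared to the current lsof polling loop, the lock is explicit and race-free: a test that is still copying or using the image holds the flock, so cleanup skips it instead of grepping process tables.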
Actions #25

Updated by mdati 4 months ago · Edited

  • Checklist item Default-VMware-Updates (x86_64 only) set to Done
  • Checklist item Base-VMware-Updates (x86_64 only) set to Done
  • Status changed from In Progress to Feedback

Confirming that at the moment all VMware tests in https://openqa.suse.de/group_overview/572 pass; with the problems in https://progress.opensuse.org/issues/165884 also resolved, the requests in note-10 are addressed and completed.

The only topic still open is the one in note-24, addressed by the three proposed points; since it is a pre-existing situation, it can be handled in a dedicated new ticket, to be created soon.

Actions #26

Updated by ph03nix 4 months ago

  • Related to action #166748: [MinimalVM] VMware images not handling hdd subfolders added
Actions #27

Updated by mdati 3 months ago

  • Status changed from Feedback to Resolved
Actions #28

Updated by mdati 3 months ago

  • % Done changed from 80 to 100
Actions #29

Updated by mdati 3 months ago

Unscheduled VMware for the SLEM product increments; MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1842 merged.

Actions #30

Updated by ph03nix 2 months ago

  • Tags changed from slem, yaml, vmware to containers