action #162941
closed
Add job group definitions for SLEM 6.0 to QAC-yaml
Added by ph03nix 7 months ago. Updated 3 months ago.
100%
Description
https://openqa.suse.de/group_overview/566 is a prototype of the upcoming maintenance setup for SLEM 6.0. We need to create a job group definition for this job group in our https://gitlab.suse.de/qac/qac-openqa-yaml/ repository.
I think a new file staging-slem6_0.yaml in https://gitlab.suse.de/qac/qac-openqa-yaml/-/tree/master/sle-micro would fit nicely.
Acceptance criteria
- Move existing job group from OSD to https://gitlab.suse.de/qac/qac-openqa-yaml/-/tree/master/sle-micro
- Fix ongoing issues therein or file progress tickets for them
Checklist
- Default-qcow-Updates
- Default-encrypted-Updates (x86_64 only)
- Default-VMware-Updates (x86_64 only)
- Base-qcow-Updates
- Base-encrypted-Updates (x86_64 only)
- Base-VMware-Updates (x86_64 only)
- Base-RT-Updates (x86_64 only)
Updated by mdati 7 months ago · Edited
Created MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1716
MERGED, see new template https://openqa.suse.de/admin/job_templates/566
Last build https://openqa.suse.de/tests/overview?distri=sle-micro&version=6.0&build=X.21.8&groupid=566
all scheduled tests pass.
Updated by ph03nix 6 months ago
- Status changed from Resolved to In Progress
- Assignee changed from mdati to ph03nix
Reopening, as the product increments (https://openqa.suse.de/group_overview/572) are still to be done.
Updated by ph03nix 6 months ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
https://openqa.suse.de/admin/job_templates/572 is now populated and under our control.
Updated by mdati 6 months ago · Edited
- Status changed from Resolved to In Progress
- Assignee changed from ph03nix to mdati
Poo reopened per discussion in Slack, to address the SL Micro 6.0 Product Increments, including the vmware and encrypted flavors.
Updated by mdati 6 months ago
- Checklist item Default-qcow-Updates added
- Checklist item Default-encrypted-Updates (x86_64 only) added
- Checklist item Default-VMware-Updates (x86_64 only) added
- Checklist item Base-qcow-Updates added
- Checklist item Base-encrypted-Updates (x86_64 only) added
- Checklist item Base-VMware-Updates (x86_64 only) added
- Checklist item Base-RT-Updates (x86_64 only) added
Updated by mdati 6 months ago
Created MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1763,
for all products/flavors in checklist.
Updated by mdati 6 months ago
MR 1763 Merged.
Some logic-errors fixed in new MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1766,
also MERGED.
See tests in last build of https://openqa.suse.de/group_overview/572
Updated by mdati 6 months ago · Edited
Today a new error affected the Base-VMware-Updates tests in SL Micro 6.0 Product Increments - Containers: they failed in the boot phase because the SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk image is located in hdd/fixed.
See the analysis below.
Findings on the VMware boot error, from comparing the autoinst logs of cases A and B:
(A) boot PASS:
https://openqa.suse.de/tests/15036824/logfile?filename=autoinst-log.txt
run_ssh_cmd(if test -e /vmfs/volumes/datastore1/openQA/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk; then while lsof | grep 'cp.*SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk'; do echo File SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk is being copied by other process, sleeping for 60 seconds; sleep 60;done;else cp /vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk /vmfs/volumes/datastore1/openQA/;fi;)] exit-code: 0
and
(B) boot FAIL:
https://openqa.suse.de/tests/15040545/logfile?filename=autoinst-log.txt
run_ssh_cmd(if test -e /vmfs/volumes/Datastore2/openQA/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk; then while lsof | grep 'cp.*SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk'; do echo File SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk is being copied by other process, sleeping for 60 seconds; sleep 60;done;else cp /vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk /vmfs/volumes/Datastore2/openQA/;fi;)] stderr:
cp: can't stat '/vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk': No such file or directory
... exit-code: 1
The .vmdk file exists in ./hdd/fixed for both cases A and B, but:
- for B, the script did not find SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk in VMWARE_DATASTORE, so the else branch triggered a cp from ./hdd/, where the file does not exist, and the copy failed;
- for A, the script apparently found the .vmdk already in VMWARE_DATASTORE, probably left over from earlier runs, so the (wrong) cp was not executed and the error was avoided.
The cleanup, which runs before the above if, should have removed the .vmdk image for A, but its expression does not match it: it expects an additional string such as "openQA-SUT" appended, so the plain basename file SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk is not removed from VMWARE_DATASTORE.
The cleanup for B also ran, but some files were locked and so were not deleted. This could explain why the .vmdk in case A was already present.
Proposed solutions, by priority:
1) In _copy_image_vmware, either cp should fall back to hdd/fixed when the file is not found in hdd, OR $file_basename should contain the path including hdd/fixed.
2) The cleanup should also remove file_basename, if nothing is using it:
...
rm -f ${vmware_openqa_datastore}*${name}* \
${vmware_openqa_datastore}*${file_basename}
3) Clarify and normalize image management in the test code, always covering both hdd and hdd/fixed.
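The fallback in the first proposal could look roughly like the sketch below. It is only an illustration: resolve_image is a hypothetical helper, not a function from os-autoinst, and the hdd / hdd/fixed layout under a single share root is assumed from the log paths quoted above.

```shell
# Hypothetical helper (not part of os-autoinst): print the first existing
# location of an image, checking hdd/ first and hdd/fixed/ second.
resolve_image() {
    root=$1   # share root, e.g. /vmfs/volumes/openqa (assumed layout)
    name=$2   # image basename
    for dir in hdd hdd/fixed; do
        if [ -e "$root/$dir/$name" ]; then
            printf '%s\n' "$root/$dir/$name"
            return 0
        fi
    done
    return 1  # not found in either location
}

# Demo on a throwaway tree where the image only exists under hdd/fixed/
demo_root=$(mktemp -d)
mkdir -p "$demo_root/hdd/fixed"
: > "$demo_root/hdd/fixed/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk"
resolve_image "$demo_root" SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk
```

The cp in the else branch would then copy from the printed path instead of the hard-coded hdd/ location, and fail early with a clear message when the helper returns non-zero.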
Updated by mdati 5 months ago · Edited
Activity restarted.
In the os-autoinst code flow, the add_disk($self, $args) routine, which calls _copy_image_to_vm_host($args,...), also receives in its args the full path of the original image (possibly placed in a subfolder like fixed/), coming from the bootloader_svirt.pm call (see my $hddpath); but that full path is never used and ends up lost.
In fact, when _copy_image_vmware(...,$file_basename,...) is called internally, only the basename is extracted and passed as a parameter; there, the original image path is hard-coded and partially recalculated, without any subfolder handling.
Therefore images in subfolders of hdd/ (or iso/) are not handled correctly by the copy commands.
In the latest update of PR 2524, the file_basename input parameter of _copy_image_to_vm_host() and of the inner _copy_image_vmware() has been replaced with the full path of the source file, passed via the add_disk $args and coming (only) from the bootloader_svirt.pm settings at runtime (or similarly bootloader_zkvm.pm). This allows images in subfolders like fixed/ to be handled as well, and avoids the unneeded recalculation of the original folder.
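A small shell illustration of why passing only the basename loses the subfolder. The source path under /var/lib/openqa/share/factory is an assumption chosen for the example, and the variable names are hypothetical; only the NFS path /vmfs/volumes/openqa/hdd/ comes from the logs above.

```shell
# Assumed worker-side source path of the image (example value only)
src=/var/lib/openqa/share/factory/hdd/fixed/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk
nfs_root=/vmfs/volumes/openqa

# Behaviour described above: only the basename survives, and the source
# directory is rebuilt as <nfs_root>/hdd/, so "fixed/" is dropped.
rebuilt="$nfs_root/hdd/$(basename "$src")"

# Alternative: keep everything after the first "/hdd/" component, so any
# subfolder is preserved in the source path handed to cp.
relative=${src#*/hdd/}
kept="$nfs_root/hdd/$relative"

echo "rebuilt: $rebuilt"
echo "kept:    $kept"
```

The first path is the one that produced the "No such file or directory" error in case B; the second points at the file where it actually lives.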
Updated by mdati 5 months ago · Edited
All tests in SL Micro 6.0 Product Increments - Containers, and in other groups too, are currently failing: they are affected by an IBS repo renaming issue that causes install_updates to fail.
Poo opened: https://progress.opensuse.org/issues/165536, but the issue is being handled in the Jira ticket named there.
Moreover, in the latest builds a not-yet-understood behavior lets the bootloader_svirt.pm step pass even though the original image sits in the unmanaged hdd/fixed/ subdirectory, which caused the error discussed in https://progress.opensuse.org/issues/162941#note-14. It could be that the image is already present in the destination folder.
Updated by mdati 5 months ago · Edited
Now that the IBS repo renaming issue has been resolved, almost all tests in group 572 pass, but 2 vmware tests still fail in podman netavark/skopeo/remote: for those issues I created poo https://progress.opensuse.org/issues/165884.
Regarding the hdd/fixed/ issue in os-autoinst, PR 2524 has been updated: all code fixes were reverted and it now only introduces on-demand debugging in the NFS datastore script, enabled by adding VMWARE_NFS_DATASTORE_DEBUG=1, to verify the image file status.
Updated by mdati 5 months ago · Edited
- Checklist item Default-qcow-Updates set to Done
- Checklist item Default-encrypted-Updates (x86_64 only) set to Done
- Checklist item Base-qcow-Updates set to Done
- Checklist item Base-encrypted-Updates (x86_64 only) set to Done
- Checklist item Base-RT-Updates (x86_64 only) set to Done
- Tags changed from slem, yaml to slem, yaml, vmware
Status today for SL Micro 6.0 Product Increments - Containers: all tests pass, except the VMware flavor tests, which fail on rerun.
The main issue turned out to be a form of slowness or lost key presses, blocking the screen until the needle timeout occurred: see poo 165923.
However, those vmware tests are always assigned to qesapworker# instances in Prg, although other hosts, sapworker# in Nue, are also available.
So I executed a run forcing the worker in Nue, WORKER_CLASS="sapworker1,svirt-vmware70": https://openqa.suse.de/tests/15299487.
The test proceeded to the end, failing on needle format differences. But after the needles were updated, the next rerun failed due to worker problems, and now all reruns on that worker fail this way:
https://openqa.suse.de/tests/15305429/logfile?filename=autoinst-log.txt#line-616
...
!!!! X64 Exception Type - 06(#UD - Invalid Opcode) CPU Apic ID - 00000000 !!!!
RIP - 0000000000000040, CS - 0000000000000018, RFLAGS - 0000000000010247
RAX - 000000005FC0E020, RCX - 000000005FC0E020, RDX - 000000005FC10EC8
RBX - 000000005FC10EC8, RSP - 000000005FFBD7D8, RBP - 000000005FFBD830
RSI - 000000005EB71120, RDI - 0000000000000031
R8 - 0000000000000004, R9 - 0000000000000001, R10 - 0000000000000000
R11 - 000000005EBF4140, R12 - 000000005FC10EC8, R13 - 000000005FD1DD98
R14 - 000000005FD8B818, R15 - 000000005EB71130
DS - 0000000000000008, ES - 0000000000000008, FS - 0000000000000008
GS - 0000000000000008, SS - 0000000000000008
CR0 - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000005FF98000
CR4 - 0000000000000668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 00000000FFFFFCC0 000000000000002F, LDTR - 0000000000000000
IDTR - 000000005FEE6440 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 000000005FFBD430
!!!! Can't find image information. !!!!
...
See the discussion in https://suse.slack.com/archives/C02CANHLANP/p1725012429713349; the problem seems to lie in the common host unreal7.qe.nue2.suse.org.
In summary, VMware tests running on workers:
- qesapworker-prg# seem affected by random slowness or missed key actions: poo#165923;
- sapworker# have been affected since today by a CPU issue in unreal7.qe.nue2.suse.org.
Updated by mdati 5 months ago
- Related to action #165923: [qa-tools][vmware][spikesolution][timeboxed:20h] VNC reconnect after reboot size:S added
Updated by mdati 4 months ago · Edited
Today all VMware tests in grp/572 PASS.
In particular, all tests using WORKER_CLASS unreal7 pass; see Slack.
But the tests assigned to a qesapworker WORKER still fail because of issues on the esxi7 server in use: a poo ticket for this issue has been opened by the eng. team: https://progress.opensuse.org/issues/166529.
Suggested workaround until this is fixed: run VMware tests on workers using unreal7:
WORKER_CLASS=sapworker1,svirt-vmware70
or WORKER_CLASS=unreal7,svirt-vmware70
or WORKER_CLASS=unreal7
E.g. https://openqa.suse.de/tests/15390100 passes.
Updated by mdati 4 months ago · Edited
At the moment all VMware tests in https://openqa.suse.de/group_overview/572 pass.
Please note that in the current VMware test runs the issue in note-14/B is no longer present, because the image file is already present in the expected folder (transferred there by some unknown or manual operation), as also revealed by cloning the test with VMWARE_NFS_DATASTORE_DEBUG=1, from PR 2524. See the bash snippet in e.g. job 15400838.
But this could mean that the existing local image is always used because it is never cleaned up, so that any new image from later builds is never tested.
A possible correction, as sequential changes:
- ensure that the right full-path image is provided as the source in _copy_image_vmware, as proposed in PR https://github.com/os-autoinst/os-autoinst/pull/2542
- define a lock-file policy for these VMware images (in place of the lsof check), to prevent cleanup while running tests are using them
- implement the cleanup from item 2 of note-14.
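One possible shape for the lock-file policy mentioned above, as a minimal sketch only. The function names, the per-image "<image>.lock.d" directory and the per-job marker files are all assumptions made for illustration; nothing like this exists in os-autoinst today.

```shell
# Hypothetical lock-marker helpers; names and layout are assumptions.
lock_image() {      # $1 = image path, $2 = job id
    mkdir -p "$1.lock.d" && : > "$1.lock.d/$2"
}

unlock_image() {    # $1 = image path, $2 = job id
    rm -f "$1.lock.d/$2"
    # rmdir only succeeds once the last marker is gone
    rmdir "$1.lock.d" 2>/dev/null || true
}

cleanup_image() {   # $1 = image path; refuses while any marker exists
    if [ -d "$1.lock.d" ]; then
        echo "skip $1: still in use"
        return 1
    fi
    rm -f "$1"
}

# Demo on a throwaway file
demo_dir=$(mktemp -d)
img="$demo_dir/image.vmdk"
: > "$img"
lock_image "$img" job1
cleanup_image "$img" || echo "image kept"
unlock_image "$img" job1
cleanup_image "$img" && echo "image removed"
```

Each running job drops its own marker, so the image is only removed once every job has released it. The sketch is not race-free (a job could take the lock between the directory check and the rm), so a real implementation would need an atomic step at that point.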
Updated by mdati 4 months ago · Edited
- Checklist item Default-VMware-Updates (x86_64 only) set to Done
- Checklist item Base-VMware-Updates (x86_64 only) set to Done
- Status changed from In Progress to Feedback
Confirming that at the moment all VMware tests in https://openqa.suse.de/group_overview/572 pass and that the problems in poo https://progress.opensuse.org/issues/165884 are also resolved, so the requests in note-10 are addressed and completed.
Only the topic in note-24 remains open, covered by the 3 proposed points; since it is a pre-existing situation, it can be the subject of a dedicated new ticket, which will be created soon.
Updated by ph03nix 4 months ago
- Related to action #166748: [MinimalVM] VMware images not handling hdd subfoldes added
Updated by mdati 4 months ago
Unschedule VMware for SLEM product increments, MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1842 MERGED