action #162941
Add job group definitions for SLEM 6.0 to QAC-yaml
Status: open · 80% done
Description
https://openqa.suse.de/group_overview/566 is a prototype of the upcoming maintenance setup for SLEM 6.0. We need to create a job group definition for this job group in our repository https://gitlab.suse.de/qac/qac-openqa-yaml/
I think a new file staging-slem6_0.yaml in https://gitlab.suse.de/qac/qac-openqa-yaml/-/tree/master/sle-micro would fit nicely.
Acceptance criteria
- Move existing job group from OSD to https://gitlab.suse.de/qac/qac-openqa-yaml/-/tree/master/sle-micro
- Fix ongoing issues therein or file progress tickets for them
Checklist
- Default-qcow-Updates
- Default-encrypted-Updates (x86_64 only)
- Default-VMware-Updates (x86_64 only)
- Base-qcow-Updates
- Base-encrypted-Updates (x86_64 only)
- Base-VMware-Updates (x86_64 only)
- Base-RT-Updates (x86_64 only)
Updated by mdati 2 months ago · Edited
Created MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1716
MERGED, see new template https://openqa.suse.de/admin/job_templates/566
Last build https://openqa.suse.de/tests/overview?distri=sle-micro&version=6.0&build=X.21.8&groupid=566: all scheduled tests pass.
Updated by ph03nix about 2 months ago
- Status changed from Resolved to In Progress
- Assignee changed from mdati to ph03nix
Reopening, as the product increments (https://openqa.suse.de/group_overview/572) are still to be done.
Updated by ph03nix about 2 months ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
https://openqa.suse.de/admin/job_templates/572 is now populated and under our control.
Updated by mdati about 1 month ago · Edited
- Status changed from Resolved to In Progress
- Assignee changed from ph03nix to mdati
Poo reopened after discussion in Slack, to also address the Product Increments of SL Micro 6.0, including the vmware and encrypted flavors.
Updated by mdati about 1 month ago
- Checklist item Default-qcow-Updates added
- Checklist item Default-encrypted-Updates (x86_64 only) added
- Checklist item Default-VMware-Updates (x86_64 only) added
- Checklist item Base-qcow-Updates added
- Checklist item Base-encrypted-Updates (x86_64 only) added
- Checklist item Base-VMware-Updates (x86_64 only) added
- Checklist item Base-RT-Updates (x86_64 only) added
Updated by mdati about 1 month ago
Created MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1763, covering all products/flavors in the checklist.
Updated by mdati about 1 month ago
MR 1763 merged.
Some logic errors were fixed in a new MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1766, also MERGED.
See the tests in the last build of https://openqa.suse.de/group_overview/572
Updated by mdati about 1 month ago · Edited
Today a new error affected the Base-VMware-Updates tests of SL Micro 6.0 Product Increments - Containers: they failed in the boot phase, due to the SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk image being located in hdd/fixed.
Analysis below: findings on the VMware boot error, from comparing the autoinst logs of cases A vs. B:
(A) boot PASS:
https://openqa.suse.de/tests/15036824/logfile?filename=autoinst-log.txt
run_ssh_cmd(if test -e /vmfs/volumes/datastore1/openQA/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk; then while lsof | grep 'cp.*SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk'; do echo File SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk is being copied by other process, sleeping for 60 seconds; sleep 60;done;else cp /vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk /vmfs/volumes/datastore1/openQA/;fi;)] exit-code: 0
and
(B) boot FAIL:
https://openqa.suse.de/tests/15040545/logfile?filename=autoinst-log.txt
run_ssh_cmd(if test -e /vmfs/volumes/Datastore2/openQA/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk; then while lsof | grep 'cp.*SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk'; do echo File SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk is being copied by other process, sleeping for 60 seconds; sleep 60;done;else cp /vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk /vmfs/volumes/Datastore2/openQA/;fi;)] stderr:
cp: can't stat '/vmfs/volumes/openqa/hdd/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk': No such file or directory
... exit-code: 1
The .vmdk file exists in ./hdd/fixed in both cases A and B.
In case B, the script does not find SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk in VMWARE_DATASTORE, so it triggers the cp from ./hdd/; the file is not there either, so the copy fails. In case A instead, the script apparently finds the .vmdk already present in VMWARE_DATASTORE, probably left over from previous runs, so the (wrong) cp is never executed and the error is skipped.
Now, the preceding cleanup (which runs before the if above) should have removed the .vmdk image in case A, but its expression does not match it: it expects some extra string like "openQA-SUT" appended, so the file with basename SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk is not removed from VMWARE_DATASTORE.
The cleanup also ran for B, but some files turned out to be locked, so those were not deleted. This could be the logic behind the .vmdk already being present in case A.
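A minimal shell sketch of the suspected mismatch (the temp directory, file names, and the SUT name are hypothetical stand-ins; the real cleanup pattern lives in the os-autoinst svirt backend):

```shell
# Simulate the datastore with a leftover GM image and a per-SUT disk.
ds=$(mktemp -d)
touch "$ds/SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk" \
      "$ds/openQA-SUT-1.vmdk"

# A cleanup glob keyed on the SUT name only matches the per-SUT file ...
name="openQA-SUT"
rm -f "$ds"/*"$name"*

# ... so the plain GM basename survives in the datastore.
ls "$ds"
```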
Proposed solutions, by priority:
1) In _copy_image_vmware, the cp should fall back to hdd/fixed when no file is found in hdd.
2) The cleanup should also remove ${file_basename}:
...
rm -f ${vmware_openqa_datastore}*${name}* \\
   ${vmware_openqa_datastore}*${file_basename}
3) Clarify/normalize image management in the test code, consistently for both hdd and hdd/fixed.
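A self-contained sketch of the fallback proposed in 1), using temp dirs in place of the real NFS and datastore paths (the actual logic would live in _copy_image_vmware; all paths here are stand-ins):

```shell
nfs=$(mktemp -d)   # stands in for /vmfs/volumes/openqa
dest=$(mktemp -d)  # stands in for the VMWARE_DATASTORE openQA folder
img=SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk

# Publish the image only under hdd/fixed, as in the failing case B.
mkdir -p "$nfs/hdd/fixed"
touch "$nfs/hdd/fixed/$img"

# Proposed copy logic: try hdd first, then fall back to hdd/fixed.
if [ -e "$nfs/hdd/$img" ]; then
    src="$nfs/hdd/$img"
elif [ -e "$nfs/hdd/fixed/$img" ]; then
    src="$nfs/hdd/fixed/$img"
else
    echo "image $img found neither in hdd nor in hdd/fixed" >&2
    exit 1
fi
cp "$src" "$dest/"
```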
Updated by mdati 19 days ago · Edited
Activity restarted:
noted in the os-autoinst code flow that the add_disk($self, $args) routine, which calls _copy_image_to_vm_host($args, ...), also provides in $args the full path of the original image file (possibly placed in a subfolder such as fixed/), coming from the bootloader_svirt.pm call (see my $hddpath); but that full path is then never used and effectively lost.
In fact, when _copy_image_vmware(..., $file_basename, ...) is called internally, only the basename is extracted and passed as a parameter, while the original image path is hard-coded and partially recomputed there, without any subfolder handling.
Therefore images in hdd/ (or iso/) subfolders are not correctly handled by the copy commands.
In the last update of PR 2524, the file_basename input parameter of _copy_image_to_vm_host() and of the inner _copy_image_vmware() was replaced with the full path of the source file, passed by add_disk via $args and coming (only) from the bootloader_svirt.pm settings at runtime (or the similar bootloader_zkvm.pm): this way images in subfolders such as fixed/ can be handled as well, avoiding the unneeded recalculation of the original folder.
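In shell terms, the difference between recomputing the source from the basename and keeping the full path passed by the bootloader can be sketched as follows (the temp directory stands in for the real NFS root; variable names are illustrative only):

```shell
nfs=$(mktemp -d)                 # stands in for the NFS datastore root
img=SL-Micro.x86_64-6.0-Base-VMware-GM.vmdk
mkdir -p "$nfs/hdd/fixed"
touch "$nfs/hdd/fixed/$img"      # GM image published under hdd/fixed

hddpath="$nfs/hdd/fixed/$img"    # full path as known to bootloader_svirt.pm
base=$(basename "$hddpath")

# Old behavior: only the basename survives and hdd/ is hard-coded,
# so the fixed/ subfolder is lost and the source is never found.
recomputed="$nfs/hdd/$base"

[ -e "$recomputed" ] && echo "old: found" || echo "old: missing"
[ -e "$hddpath" ] && echo "new: found" || echo "new: missing"
```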
Updated by mdati 17 days ago · Edited
All tests in SL Micro 6.0 Product Increments - Containers, and in other groups too, are currently failing, affected by an IBS repo renaming issue that causes install_updates to fail.
Poo opened: https://progress.opensuse.org/issues/165536; the issue itself is handled in the Jira ticket named there.
Moreover, in the last builds a not-yet-understood behavior lets the bootloader_svirt.pm step pass even with the original image placed in the unmanaged hdd/fixed/ subdirectory, which caused the error discussed in https://progress.opensuse.org/issues/162941#note-14. I.e. the image may already be present in the destination folder.
Updated by mdati 12 days ago · Edited
With the IBS repo renaming issue recently resolved, the tests in group 572 almost all pass, but 2 vmware tests still fail in podman netavark/skopeo/remote; for those issues I created poo https://progress.opensuse.org/issues/165884.
About the hdd/fixed/ issue in os-autoinst: PR 2524 has been updated, all code fixes were reverted, and on-demand debugging was simply introduced in the NFS datastore script to verify the image file status.
Updated by mdati 9 days ago · Edited
- Checklist item Default-qcow-Updates set to Done
- Checklist item Default-encrypted-Updates (x86_64 only) set to Done
- Checklist item Base-qcow-Updates set to Done
- Checklist item Base-encrypted-Updates (x86_64 only) set to Done
- Checklist item Base-RT-Updates (x86_64 only) set to Done
- Tags changed from slem, yaml to slem, yaml, vmware
Status today of SL Micro 6.0 Product Increments - Containers: all tests pass; only the VMware flavor tests fail on rerun.
The main issue turned out to be a form of slowness or lost key-presses, blocking the screen until the needle timeout occurred: see poo 165923.
But those vmware tests always get assigned qesapworker# instances in Prg, although other sapworker# hosts in Nue are available as well.
So I executed a run forcing a worker in Nue, WORKER_CLASS="sapworker1,svirt-vmware70": https://openqa.suse.de/tests/15299487.
The test proceeded to the end, failing only on needle format differences. But after the needle was updated, the next rerun failed for worker problems, and now all reruns on that worker fail this way:
https://openqa.suse.de/tests/15305429/logfile?filename=autoinst-log.txt#line-616
...
!!!! X64 Exception Type - 06(#UD - Invalid Opcode) CPU Apic ID - 00000000 !!!!
RIP - 0000000000000040, CS - 0000000000000018, RFLAGS - 0000000000010247
RAX - 000000005FC0E020, RCX - 000000005FC0E020, RDX - 000000005FC10EC8
RBX - 000000005FC10EC8, RSP - 000000005FFBD7D8, RBP - 000000005FFBD830
RSI - 000000005EB71120, RDI - 0000000000000031
R8 - 0000000000000004, R9 - 0000000000000001, R10 - 0000000000000000
R11 - 000000005EBF4140, R12 - 000000005FC10EC8, R13 - 000000005FD1DD98
R14 - 000000005FD8B818, R15 - 000000005EB71130
DS - 0000000000000008, ES - 0000000000000008, FS - 0000000000000008
GS - 0000000000000008, SS - 0000000000000008
CR0 - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000005FF98000
CR4 - 0000000000000668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 00000000FFFFFCC0 000000000000002F, LDTR - 0000000000000000
IDTR - 000000005FEE6440 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 000000005FFBD430
!!!! Can't find image information. !!!!
...
See the discussion in https://suse.slack.com/archives/C02CANHLANP/p1725012429713349; the problem seems to be inside the common host unreal7.qe.nue2.suse.org.
In summary, VMware tests:
- on qesapworker-prg# workers seem affected by random slowness or missed key actions: poo#165923;
- on sapworker# workers are, since today, affected by a CPU issue on unreal7.qe.nue2.suse.org.
Updated by mdati 9 days ago
- Related to action #165923: [qa-tools][vmware][spikesolution][timeboxed:20h] VNC reconnect after reboot size:S added