action #153625

closed

coordination #151822: [epic] Soft-fails mitigation

Revisit soft-failure bsc#1178033

Added by JERiveraMoya 10 months ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 2023-12-01
Due date:
% Done: 0%
Estimated time:

Description

Motivation

See parent epic.
https://openqa.suse.de/tests/13262382#step/journal_check/21 -> bsc#1178033

Acceptance criteria

AC1: Revisit soft-failure bsc#1178033


Related issues 1 (0 open, 1 closed)

Blocked by openQA Infrastructure - action #156913: Remove its=off setting in global QEMUMACHINE for worker-arm{1,2} size:M (Resolved, dheidler, 2024-03-08)

Actions #1

Updated by lmanfredi 9 months ago

  • Status changed from Workable to In Progress
  • Assignee set to lmanfredi
Actions #2

Updated by lmanfredi 9 months ago

The issue "kernel: ITS@0x8080000: Unable to locate ITS domain handle" on SUSE MicroOS aarch64 is still present for Micro versions 5.1, 5.2, 5.3, 5.4 and 5.5.
See VRs

Actions #3

Updated by lmanfredi 9 months ago

Following the comment in bsc#1178033, I tried to override the QEMUMACHINE setting with its=on, but the job still seems to hold the old value its=off.
In settings there is: "QEMUMACHINE"="virt,usb=off,gic-version=3,its=on"
In vars.json there is: "QEMUMACHINE" : "virt,usb=off,gic-version=3,its=off"
See VRs.
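
The mismatch between the two values can be spotted mechanically. Below is a minimal sketch that splits each QEMUMACHINE string into its options and reports the ones that differ. The helper names parse_qemumachine and diff_qemumachine are hypothetical and not part of openQA or os-autoinst; the two input strings are the ones quoted above.

```python
def parse_qemumachine(value):
    """Split 'virt,usb=off,its=on' into ('virt', {'usb': 'off', 'its': 'on'})."""
    machine_type, *options = value.split(",")
    parsed = {}
    for option in options:
        key, _, val = option.partition("=")
        parsed[key] = val
    return machine_type, parsed


def diff_qemumachine(requested, effective):
    """Return {option: (requested_value, effective_value)} for every mismatch."""
    req_type, req_opts = parse_qemumachine(requested)
    eff_type, eff_opts = parse_qemumachine(effective)
    diff = {}
    if req_type != eff_type:
        diff["machine"] = (req_type, eff_type)
    for key in sorted(set(req_opts) | set(eff_opts)):
        if req_opts.get(key) != eff_opts.get(key):
            diff[key] = (req_opts.get(key), eff_opts.get(key))
    return diff


requested = "virt,usb=off,gic-version=3,its=on"   # value from the job settings
effective = "virt,usb=off,gic-version=3,its=off"  # value found in vars.json
print(diff_qemumachine(requested, effective))     # {'its': ('on', 'off')}
```

In a real check the effective value would be read from the job's vars.json instead of being hardcoded.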

Actions #4

Updated by JERiveraMoya 9 months ago · Edited

I faced the same issue some time ago (as you might have seen in the open bug that links to my duplicate bug).

Actions #5

Updated by JERiveraMoya 9 months ago

Perhaps you could ask the Tool squad how to work around this...

Actions #6

Updated by JERiveraMoya 9 months ago

JERiveraMoya wrote in #note-5:

Perhaps you could ask the Tool squad how to work around this...
https://suse.slack.com/archives/C02CANHLANP/p1698321040120439

Actions #8

Updated by JERiveraMoya 9 months ago

lmanfredi wrote in #note-7:

Asked in slack:

QEMUMACHINE: is it read only or read/write?

It's read/write. See https://github.com/os-autoinst/os-autoinst/blob/master/doc/backend_vars.asciidoc#supported-variables-per-backend . However settings can be added to the variable. See https://github.com/os-autoinst/os-autoinst/blob/02cbbda759742468bd69625ce51eaac80cc59cd5/backend/qemu.pm#L550

Did you get an answer on how to work around this? What comes to mind is that you could temporarily "hack" YOUR openQA instance (not OSD!) by hardcoding the value in the script you pointed to in your reply to the developer.
Or were you able to trigger the job by creating a different worker, as @lemon suggested?

Actions #9

Updated by lmanfredi 9 months ago

After creating and using a new MACHINE aarch64-its-on-1gbram with:

QEMUMACHINE=virt,usb=off,gic-version=host,its=on

there is still the old value in vars.json:

"QEMUMACHINE" : "virt,usb=off,gic-version=3,its=off"

See VRs

Actions #10

Updated by JERiveraMoya 9 months ago

Then the only thing that comes to mind is to hack the variable in the os-autoinst code in your own instance.

Actions #11

Updated by JERiveraMoya 9 months ago

  • Tags changed from qe-yam-feb-sprint to qe-yam-mar-sprint
Actions #12

Updated by lmanfredi 9 months ago

The issue is still present in the latest build, 20240304-1.

Actions #13

Updated by lmanfredi 9 months ago · Edited

See comment: The worker overwrites those settings:

See GitLab repo salt-pillars-openqa

  worker-arm1:
    ## FQDN: worker-arm1.oqa.prg2.suse.org
    ##  serial: `ssh -t jumpy@qe-jumpy.prg2.suse.org "ipmitool -I lanplus -H openqaworker-arm1.qe-ipmi-ur -U qadmin -P '@HX81VE_mLG[Y8Y8' sol activate"`
    numofworkers: 40
    bridge_iface: eth0
    webuis:
      openqa.suse.de:
        key: 160AA95F68C410D5
        secret: E38C9451DB07468D
    global:
      WORKER_CLASS: qemu_aarch64,tap,region-prg,location-prg2
      QEMUMACHINE: virt,usb=off,gic-version=3,its=off

  worker-arm2:
    ## FQDN: worker-arm2.oqa.prg2.suse.org
    ##  serial: `ssh -t jumpy@qe-jumpy.prg2.suse.org "ipmitool -I lanplus -H openqaworker-arm2.qe-ipmi-ur -U qadmin -P 'fmsXRLJIVVSiHGzP' sol activate"`
    numofworkers: 40
    bridge_iface: eth0
    webuis:
      openqa.suse.de:
        key: 160AA95F68C410D5
        secret: E38C9451DB07468D
    global:
      WORKER_CLASS: qemu_aarch64,tap,region-prg,location-prg2
      QEMUMACHINE: virt,usb=off,gic-version=3,its=off
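
The behaviour seen so far (the job asks for its=on, vars.json ends up with its=off) is consistent with the worker's global settings being applied after the job settings. The sketch below is only a simplified model of that observed precedence, not the actual openQA worker code; the function effective_settings is hypothetical, and the values are taken from this ticket.

```python
def effective_settings(job_settings, worker_global):
    """Model of the observed precedence: worker-level values win on conflicts."""
    merged = dict(job_settings)
    merged.update(worker_global)  # applied last, so it overrides the job value
    return merged


job_settings = {
    "MACHINE": "aarch64-its-on-1gbram",
    "QEMUMACHINE": "virt,usb=off,gic-version=3,its=on",
}
worker_global = {
    "WORKER_CLASS": "qemu_aarch64,tap,region-prg,location-prg2",
    "QEMUMACHINE": "virt,usb=off,gic-version=3,its=off",
}

# With QEMUMACHINE in the worker's global section, the job value is lost.
print(effective_settings(job_settings, worker_global)["QEMUMACHINE"])

# Without it, the job value reaches the final settings unchanged.
worker_without_override = {k: v for k, v in worker_global.items() if k != "QEMUMACHINE"}
print(effective_settings(job_settings, worker_without_override)["QEMUMACHINE"])
```

Under this model, the job-level its=on can only take effect once QEMUMACHINE is dropped from the worker's global section.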
Actions #14

Updated by lmanfredi 9 months ago

I tried an isos post instead of cloned jobs; the result seems to be the same as before.
vars.json still contains its=off.
See VRs created by isos post with the job_template that contains:

scenarios:
  aarch64:
    sle-micro-5.1-DVD-Updates-aarch64:
      - test_slem_installation_autoyast_its_on:
          settings:
            AUTOYAST: autoyast_sle15/autoyast_sle-micro_updates.xml.ep
            AUTOYAST_PREPARE_PROFILE: '1'
            DESKTOP: textmode
            ISO: SLE-Micro-%VERSION%-DVD-%ARCH%-GM.iso
            +MACHINE: 'aarch64-its-on-1gbram'
            +QEMUMACHINE: 'virt,usb=off,gic-version=3,its=on'
            SCC_REGISTER: installation
          testsuite: null
Actions #15

Updated by JERiveraMoya 9 months ago

lmanfredi wrote in #note-14:

I tried an isos post instead of cloned jobs; the result seems to be the same as before.
vars.json still contains its=off.
See VRs created by isos post with the job_template that contains:

scenarios:
  aarch64:
    sle-micro-5.1-DVD-Updates-aarch64:
      - test_slem_installation_autoyast_its_on:
          settings:
            AUTOYAST: autoyast_sle15/autoyast_sle-micro_updates.xml.ep
            AUTOYAST_PREPARE_PROFILE: '1'
            DESKTOP: textmode
            ISO: SLE-Micro-%VERSION%-DVD-%ARCH%-GM.iso
            +MACHINE: 'aarch64-its-on-1gbram'
            +QEMUMACHINE: 'virt,usb=off,gic-version=3,its=on'
            SCC_REGISTER: installation
          testsuite: null

Please report back in the previous Slack thread.

Actions #16

Updated by lmanfredi 9 months ago

Opened new ticket poo#156892 to change settings in configuration

Actions #17

Updated by lmanfredi 9 months ago

At the moment we cannot remove this soft-failure.

Actions #18

Updated by rainerkoenig 9 months ago

  • Status changed from In Progress to New
Actions #19

Updated by lmanfredi 9 months ago

Ticket poo#156892 copied to action poo#156913 by @okurz

Actions #20

Updated by JERiveraMoya 8 months ago

  • Tags deleted (qe-yam-mar-sprint)
Actions #21

Updated by JERiveraMoya 8 months ago

  • Blocked by action #156913: Remove its=off setting in global QEMUMACHINE for worker-arm{1,2} size:M added
Actions #22

Updated by JERiveraMoya 8 months ago

  • Tags set to qe-yam-apr-sprint
Actions #23

Updated by JERiveraMoya 8 months ago

  • Status changed from New to Workable
Actions #24

Updated by lmanfredi 8 months ago

Waiting until the related salt-pillars-openqa MR#746 is merged.
See related poo#156913.

Actions #25

Updated by lmanfredi 8 months ago

Merged related salt-pillars-openqa MR#746

Actions #26

Updated by lmanfredi 8 months ago

  • Status changed from Workable to In Progress
Actions #27

Updated by lmanfredi 8 months ago

Verified: build 20240325-1 passed.

Actions #28

Updated by JERiveraMoya 8 months ago

  • Tags changed from qe-yam-apr-sprint to qe-yam-mar-sprint
  • Status changed from In Progress to Resolved