action #44468

open

[qe-core][functional][tools] Proper handling of assets for svirt workers

Added by SLindoMansilla over 5 years ago. Updated 5 months ago.

Status: Workable
Priority: Normal
Assignee: -
Category: Infrastructure
Target version: SUSE QA - Milestone 30
Start date: 2018-11-28
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Motivation / Observation

While trying to verify an SR that fixes the systemd-testsuite, I was blocked because both the OSD workers and the QSF shared workers were malfunctioning.

The problem with QSF shared workers is that the needed qcow2 image was missing:

Because

  • openqa.suse.de:/var/lib/openqa/share/factory

was NFS-mounted into

  • s390p8.suse.de:/var/lib/openqa/share/factory

and the image had already been cleaned up from openqa.suse.de:/var/lib/openqa/share/factory.

QSF shared-workers (shared-workers.qa.suse.de) have CACHEDIRECTORY configured, but this directory is ignored by the svirt backend.
Only some changes in the test code are necessary to take CACHEDIRECTORY into account, but some configuration is needed at infrastructure level.

  • shared-workers.qa.suse.de:/var/lib/openqa/cache

needs to be made available on

  • s390p8.suse.de:/var/lib/openqa/cache
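
A minimal sketch of the lookup order this implies, assuming the cache directory is visible on the SUT's host: look in the worker cache first and fall back to the shared factory directory. The function name `find_asset` and the flat layout of both directories are assumptions for illustration; this is not the actual test code.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_asset(name: str, search_dirs: Iterable[str]) -> Optional[Path]:
    """Return the first existing file called `name` in `search_dirs`, else None."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


# Hypothetical search order: worker cache first, NFS share second.
SEARCH_DIRS = ["/var/lib/openqa/cache", "/var/lib/openqa/share/factory"]
```

With this order, an image that was already cleaned up from the factory share would still be found as long as the worker cache holds a copy.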

Acceptance criteria

  • AC: The SUT's host machine on svirt backends (e.g. s390p8.suse.de) whose jump host (e.g. shared-workers.qa.suse.de) has CACHEDIRECTORY configured has the assets from that CACHEDIRECTORY available.

Suggestions

  • Look at what has already been done in that area, e.g. #44468#note-30.

Files

photo_2018-11-28_18-36-03.jpg (74 KB): Draft about infrastructure. SLindoMansilla, 2018-11-28 17:37

Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #32281: Can't locate images in Xen jobs (Resolved, mkittler, 2018-02-26)

Related to openQA Tests - action #36754: [qe-core][functional][systemd][medium] test fails in systemd_testsuite - needs further investigation (Resolved, SLindoMansilla, 2018-06-04)

Actions #1

Updated by okurz over 5 years ago

  • Related to action #32281: Can't locate images in Xen jobs added
Actions #2

Updated by SLindoMansilla over 5 years ago

  • Related to action #36754: [qe-core][functional][systemd][medium] test fails in systemd_testsuite - needs further investigation added
Actions #3

Updated by SLindoMansilla over 5 years ago

  • Related to deleted (action #36754: [qe-core][functional][systemd][medium] test fails in systemd_testsuite - needs further investigation)
Actions #4

Updated by SLindoMansilla over 5 years ago

  • Blocks action #36754: [qe-core][functional][systemd][medium] test fails in systemd_testsuite - needs further investigation added
Actions #5

Updated by SLindoMansilla over 5 years ago

The attached draft shows the current state (black, blue, green) and the expected state (red).

Actions #6

Updated by SLindoMansilla over 5 years ago

  • Status changed from New to In Progress
  • Assignee set to SLindoMansilla

As others are also affected (http://openqa-apac1.suse.de/tests/2417#step/bootloader_zkvm/6), this ticket is gaining priority.

Actions #7

Updated by SLindoMansilla over 5 years ago

Made a separate PR for the mandatory fix: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6320
It is merged.

Waiting for mgriessmeier's and nsinger's feedback.

Actions #8

Updated by SLindoMansilla over 5 years ago

  • Status changed from In Progress to New

What was done

The already merged fix makes the test look for assets on the right machine. Before, the test was looking for assets on the worker machine. On OSD this worked because the assets were NFS-mounted from the web UI host on both the worker and the SUT's host.

On shared-workers.qa.suse.de we don't have the NFS share mounted, which caused the test to not find the assets.

This is fixed now in this PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6191

What is still missing?

The worker cache is not propagated to the svirt remote machine, so the assets are missing there.

We need to discuss how to resolve this problem.

My proposal is to NFS-mount the cache directory from the worker into the SUT's host.
After that, some changes in the test code are needed: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6253
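
To make the proposal concrete, here is a small sketch that renders the two configuration lines such a mount would need: an `/etc/exports` entry on the worker and an `/etc/fstab` entry on the SUT's host. The read-only export and the specific options are assumptions, not something decided in this ticket; the helper names are hypothetical.

```python
def exports_line(path: str, client: str, options: str = "ro,sync,no_subtree_check") -> str:
    """Line for /etc/exports on the worker (NFS server side)."""
    return f"{path} {client}({options})"


def fstab_line(server: str, path: str, mountpoint: str, options: str = "ro") -> str:
    """Line for /etc/fstab on the SUT's host (NFS client side)."""
    return f"{server}:{path} {mountpoint} nfs {options} 0 0"


CACHE = "/var/lib/openqa/cache"
# Assumed hosts taken from the description: worker exports, s390p8 mounts.
print(exports_line(CACHE, "s390p8.suse.de"))
print(fstab_line("shared-workers.qa.suse.de", CACHE, CACHE))
```

Mounting at the same path on both machines keeps asset paths identical on the worker and on the SUT's host, which is what keeps the required test code changes small.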

Actions #9

Updated by SLindoMansilla over 5 years ago

  • Target version set to Milestone 22
Actions #10

Updated by okurz about 5 years ago

  • Target version changed from Milestone 22 to Milestone 24
Actions #11

Updated by okurz about 5 years ago

  • Assignee deleted (SLindoMansilla)
  • Target version changed from Milestone 24 to Milestone 25

Let's delay the work on this one a bit further.

Actions #12

Updated by SLindoMansilla almost 5 years ago

  • Priority changed from Normal to High

For next grooming

Actions #13

Updated by SLindoMansilla almost 5 years ago

  • Status changed from New to Workable
  • Assignee set to mgriessmeier

As spoken in the grooming meeting, we want to NFS mount the directory
from shared-workers.qa.suse.de:/var/lib/openqa/cache
into [zVM|zKVM|sKVM|XEN]:/var/lib/openqa/cache

Actions #14

Updated by mgriessmeier almost 5 years ago

  • Status changed from Workable to In Progress
Actions #15

Updated by mgriessmeier almost 5 years ago

  • Target version changed from Milestone 25 to Milestone 26
Actions #16

Updated by mgriessmeier over 4 years ago

  • Status changed from In Progress to Feedback
  • Target version changed from Milestone 26 to Milestone 27

SLindoMansilla wrote:

As spoken in the grooming meeting, we want to NFS mount the directory
from shared-workers.qa.suse.de:/var/lib/openqa/cache
into [zVM|zKVM|sKVM|XEN]:/var/lib/openqa/cache

mounted on s390p8 - please check if it's working as expected, then I will do the rest
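
One way to check the mount on the SUT's host is to look it up in `/proc/mounts`. The following is only a sketch: `nfs_source` is a hypothetical helper, and it assumes the usual whitespace-separated `/proc/mounts` format.

```python
from typing import Optional


def nfs_source(mounts_text: str, mountpoint: str) -> Optional[str]:
    """Return the NFS source mounted at `mountpoint`, or None if there is none.

    `mounts_text` is expected in /proc/mounts format:
    "<source> <mountpoint> <fstype> <options> <dump> <pass>".
    """
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mountpoint:
            # Later entries shadow earlier ones for the same mount point.
            source = fields[0] if fields[2].startswith("nfs") else None
    return source
```

On s390p8 this could be called as `nfs_source(open("/proc/mounts").read(), "/var/lib/openqa/cache")`; a result of `None` would mean the cache directory is not NFS-mounted there.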

Actions #17

Updated by SLindoMansilla over 4 years ago

Who should check?
I think we forgot about this ticket.

PR closed and opened draft PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8378

Actions #18

Updated by mgriessmeier over 4 years ago

  • Target version changed from Milestone 27 to Milestone 28

let's rediscuss in grooming

Actions #19

Updated by mgriessmeier over 4 years ago

  • Target version changed from Milestone 28 to Milestone 30

needs to be discussed offline

Actions #20

Updated by SLindoMansilla almost 4 years ago

  • Status changed from Feedback to Workable
  • Assignee changed from mgriessmeier to SLindoMansilla

Tasks

  • Look at what was done in the last 8 months regarding NFS on shared-workers
  • Refine the ticket in a grooming meeting if more work has to be done
Actions #21

Updated by szarate almost 4 years ago

Sergio, do you really plan to work on this ticket?

Also: xen and hyperv are also part of the svirt thing. I guess all this could be unified, as the directories where things are copied are the same.

Actions #22

Updated by SLindoMansilla almost 4 years ago

szarate wrote:

Sergio, do you really plan to work on this ticket?

Also: xen and hyperv are also part of the svirt thing. I guess all this could be unified, as the directories where things are copied are the same.

Yes, I "plan" (have the intention) to look into all tickets I have assigned, but I cannot promise any date...
So, if you feel like doing it, please take over. If not, be sure that "some day" I will continue working on this.

Actions #23

Updated by szarate almost 4 years ago

  • Assignee deleted (SLindoMansilla)
  • Priority changed from High to Normal
Actions #24

Updated by szarate almost 4 years ago

  • Subject changed from [functional][u][labs] Proper handling of assets for svirt workers to [functional][u][tools] Proper handling of assets for svirt workers
Actions #25

Updated by tjyrinki_suse over 3 years ago

  • Subject changed from [functional][u][tools] Proper handling of assets for svirt workers to [qe-core][functional][tools] Proper handling of assets for svirt workers
Actions #26

Updated by SLindoMansilla about 3 years ago

  • Blocks deleted (action #36754: [qe-core][functional][systemd][medium] test fails in systemd_testsuite - needs further investigation)
Actions #27

Updated by SLindoMansilla about 3 years ago

  • Related to action #36754: [qe-core][functional][systemd][medium] test fails in systemd_testsuite - needs further investigation added
Actions #28

Updated by okurz over 2 years ago

I came to this ticket due to periodically reviewing tickets as described on https://progress.opensuse.org/projects/openqatests/wiki#How-we-work-on-tickets

This ticket was set to "Normal" priority but was not updated within the SLO period for "Normal" tickets (365 days) as described on https://progress.opensuse.org/projects/openqatests/wiki/Wiki#SLOs-service-level-objectives

First reminder: Please consider picking up this ticket within the next 365 days or just set the ticket to the next lower priority of "Low" (no SLO related time period).

Actions #29

Updated by slo-gin over 1 year ago

This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #30

Updated by mkittler 5 months ago

  • Description updated (diff)

I tried to implement this via https://github.com/os-autoinst/os-autoinst/pull/2381. So you could enable use of the CACHEDIRECTORY by setting SVIRT_WORKER_CACHE=1. However, be aware of the caveat documented by https://github.com/os-autoinst/os-autoinst/pull/2391/files and that this can lead to other problems (e.g. #138746). That's also why this setting has been disabled by default again via https://github.com/os-autoinst/os-autoinst/pull/2401. Also note that this change doesn't cover VMWare yet (but it seems like this ticket is focusing on s390x where it generally works besides the mentioned caveats).
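
As an illustration of the opt-in behaviour described above, the sketch below picks the asset directory from a settings mapping, with the cache disabled unless `SVIRT_WORKER_CACHE=1` is set. This is not the os-autoinst implementation (which is written in Perl); `asset_dir` and the exact comparison against `"1"` are assumptions.

```python
from typing import Mapping, Optional


def asset_dir(settings: Mapping[str, object], cache_dir: Optional[str], share_dir: str) -> str:
    """Pick the directory to read assets from.

    The worker cache is used only if SVIRT_WORKER_CACHE is explicitly "1"
    and a cache directory is configured; otherwise the share is used.
    """
    enabled = str(settings.get("SVIRT_WORKER_CACHE", "0")) == "1"
    return cache_dir if enabled and cache_dir else share_dir
```

Defaulting to the share mirrors the fact that the setting was disabled by default again because of the mentioned caveats.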
