action #177462

open

[MinimalVM] VMware backend is unstable

Added by ph03nix 3 months ago. Updated 1 day ago.

Status: In Progress
Priority: High
Assignee:
Target version: -
Start date: 2025-02-18
Due date:
% Done: 0%
Estimated time:

Description

Current MinimalVM test runs in the Latest job groups (see https://openqa.suse.de/group_overview/131 and https://openqa.suse.de/group_overview/580) show a variety of ongoing failures, e.g. https://openqa.suse.de/tests/17342686

# Test died: Image decompress in datastore failed!

This failure is not an isolated case but one of a larger "series of unfortunate events". The VMware backend appears to be unstable and needs work.

The task is difficult, and it is not clear what, if anything, we can do. This ticket is about investigating common issues (see past MinimalVM jobs) to find a common root cause and provide a fix for it, or at least some clues about what could be improved.
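One low-effort way to start looking for a common root cause is to bucket failure messages from past jobs so that messages differing only in numbers or URLs land in the same group. The sketch below assumes the failure reasons have already been collected as plain strings (how to fetch them from openQA is left open); `normalize` and `common_failures` are hypothetical helpers, and the sample messages other than the decompress error are made up for illustration.

```python
import re
from collections import Counter


def normalize(reason: str) -> str:
    """Collapse job-specific details so similar failures share one key."""
    # Replace URLs first so digits inside them are not rewritten separately.
    reason = re.sub(r"https?://\S+", "<url>", reason)
    # Replace standalone numbers (timeouts, sizes, job IDs) with a placeholder.
    reason = re.sub(r"\b\d+\b", "<n>", reason)
    return reason.strip()


def common_failures(reasons):
    """Return (normalized reason, count) pairs, most frequent first."""
    return Counter(normalize(r) for r in reasons).most_common()


if __name__ == "__main__":
    # Hypothetical failure reasons, as they might be copied from past MinimalVM jobs.
    sample = [
        "Test died: Image decompress in datastore failed!",
        "Test died: Image decompress in datastore failed!",
        "Test died: timeout after 300 seconds",
        "Test died: timeout after 600 seconds",
    ]
    for reason, count in common_failures(sample):
        print(count, reason)
```

The normalization rules are deliberately crude; they would likely need tuning once real failure messages are collected.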

Acceptance criteria

I am leaving this intentionally open, as it is unclear how far we will get and what can be done. Ideally we fix the VMware backend, but that's a long stretch.

Actions #1

Updated by ph03nix 3 months ago

  • Priority changed from Normal to High
Actions #2

Updated by slo-gin about 2 months ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #3

Updated by ph03nix about 1 month ago

  • Project changed from openQA Tests (public) to Containers and images
  • Category deleted (Bugs in existing tests)
Actions #4

Updated by mdati 28 days ago

The link to the failed test above has expired, so there is no way to tell which errors and events are being reported here.

Actions #5

Updated by ph03nix 22 days ago

  • Description updated (diff)
Actions #6

Updated by ph03nix 22 days ago

mdati wrote in #note-4:

The link to the failed test above has expired, so there is no way to tell which errors and events are being reported here.

This ticket is not about a particular failure but about a series of failures that can still be found in the Latest job groups. I have updated the example link, but I'll say it again: this is not about ONE single failure; it is about improving the overall stability of the backend so that a whole range of failures no longer happens.

Actions #7

Updated by mdati 2 days ago

  • Status changed from Workable to In Progress
  • Assignee set to mdati
Actions #8

Updated by mloviska 1 day ago

We need to ensure that the VM Compatibility setting is not "ESXi 5.0 virtual machine" but something like ESXi 7.x. I noticed a VM that would not boot until its compatibility was upgraded manually in the ESXi web UI.
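In a VM's `.vmx` file, the Compatibility setting corresponds to the `virtualHW.version` key: hardware version 8 is "ESXi 5.0 and later" and version 17 is "ESXi 7.0 and later". The sketch below raises that key to a minimum version in a `.vmx` file body. It assumes the backend has access to the `.vmx` text before the VM is registered or powered on, which has not been confirmed for the openQA VMware setup; `ensure_min_hw_version` is a hypothetical helper, and whether bumping the key alone is equivalent to the upgrade done in the web UI is also an assumption.

```python
import re

# VMware virtual hardware versions for the ESXi releases mentioned above.
ESXI_5_0_HW = 8
ESXI_7_0_HW = 17

# Matches a line such as: virtualHW.version = "8"
# [ \t]* is used instead of \s* so the match never swallows a newline.
VHW_RE = re.compile(r'^virtualHW\.version[ \t]*=[ \t]*"(\d+)"[ \t]*$', re.MULTILINE)


def ensure_min_hw_version(vmx_text: str, minimum: int = ESXI_7_0_HW) -> str:
    """Return the .vmx body with virtualHW.version raised to at least `minimum`."""
    match = VHW_RE.search(vmx_text)
    if match is None:
        # No explicit version in the file: append one on its own line.
        if vmx_text and not vmx_text.endswith("\n"):
            vmx_text += "\n"
        return f'{vmx_text}virtualHW.version = "{minimum}"\n'
    if int(match.group(1)) >= minimum:
        # Already new enough; leave the file untouched.
        return vmx_text
    return VHW_RE.sub(f'virtualHW.version = "{minimum}"', vmx_text, count=1)


if __name__ == "__main__":
    # Hypothetical .vmx excerpt with ESXi 5.0 compatibility.
    vmx = 'displayName = "minimalvm"\nvirtualHW.version = "8"\n'
    print(ensure_min_hw_version(vmx))
```

Versions that are already newer than the minimum are kept as-is, so the check only ever upgrades. The chosen minimum must also be supported by the ESXi host the workers use.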

Actions #9

Updated by ph03nix 1 day ago

I suggest timeboxing the initial assessment to at most 2 weeks.

Actions #10

Updated by mloviska 1 day ago

Another problem is that we use two drives (datastores) to store openQA assets, but the two datastores are not populated in parallel. Since each worker uses only one particular datastore, a worker might not find the image it needs.
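A simple consistency check would compare the image lists of both datastores and report what each one is missing. The sketch below assumes the file listings have already been gathered, for example by listing the datastore directories under `/vmfs/volumes/` on the ESXi host; the datastore and image names and the helper `missing_per_datastore` are hypothetical.

```python
from typing import Dict, Set


def missing_per_datastore(contents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each datastore to the images that exist elsewhere but not on it."""
    # Union of every image present on any datastore.
    all_images: Set[str] = set().union(*contents.values())
    missing: Dict[str, Set[str]] = {}
    for datastore, images in contents.items():
        gap = all_images - images
        if gap:
            missing[datastore] = gap
    return missing


if __name__ == "__main__":
    # Hypothetical listings: the second datastore lags behind the first.
    contents = {
        "datastore1": {"image-a.vmdk.xz", "image-b.vmdk.xz"},
        "datastore2": {"image-a.vmdk.xz"},
    }
    for datastore, images in missing_per_datastore(contents).items():
        print(datastore, "is missing", sorted(images))
```

A non-empty result means a worker bound to that datastore could fail to find an image, so the check could gate scheduling or trigger a sync before jobs start.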

Actions #11

Updated by ph03nix 1 day ago
