action #105079: [Research][timebox: 24] Evaluate a way to snapshot and upload the entire system at the point of failure, that could be bootable as VM. - qe-yam - openSUSE Project Management Tool

action #105079

closed

coordination #105073: [Epic] Improve logging in openQA

[Research][timebox: 24] Evaluate a way to snapshot and upload the entire system at the point of failure, that could be bootable as VM.

Added by JRivrain about 3 years ago. Updated almost 3 years ago.

Status: Rejected
Priority: Normal
Assignee:
Target version: Current
Start date: 2022-01-19
Due date:
% Done:
Estimated time:

Description

Even when we collect as many logs as we can, we often find that something is missing.
In the world of support, some companies take a radical approach to this. For example, Solaris used to have a tool that would snapshot the entire OS so it could be uploaded to support, letting the support team navigate through a tarball containing literally the entire OS. This takes some storage space, but can save a huge amount of time otherwise spent on:

- going back and forth between customer and support to request specific logs,
- rebuilding a system with the same characteristics, etc.

In some cases, we already export a qcow image at the end of installation, which can be used to reproduce bugs. But this does not represent the system at the time of failure. QEMU has snapshot capabilities that could make it possible to boot the system right after its failure. This could complement or partially replace the current log collection mechanisms.

Currently we have MAKETESTSNAPSHOTS (saves a snapshot for each test module in the qcow image) and PUBLISH_HDD_N.

AC1: Test manually how to use QEMU qcow2 images that contain multiple snapshots.
AC2: Discuss with the tools team whether a qcow2 image could be published on failure simply by adding an openQA setting when rerunning the job.
AC3: Make an implementation proposal and create a follow-up ticket.
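
For AC1, a minimal sketch of the `qemu-img` invocations one might try by hand on an image with internal snapshots. It only builds and prints the command lines; it does not run QEMU. The image name `failed_job.qcow2`, the snapshot name `installation`, and the helper function names are hypothetical and chosen purely for illustration.

```python
import shlex


def list_snapshots_cmd(image):
    """Command that lists the internal snapshots stored in a qcow2 image."""
    return ["qemu-img", "snapshot", "-l", image]


def extract_snapshot_cmd(image, snapshot, output):
    """Command that copies the disk state of one snapshot into a new image."""
    return [
        "qemu-img", "convert",
        "-l", f"snapshot.name={snapshot}",  # select the snapshot by name
        "-O", "qcow2",
        image, output,
    ]


def revert_to_snapshot_cmd(image, snapshot):
    """Command that reverts the image in place to the given snapshot."""
    return ["qemu-img", "snapshot", "-a", snapshot, image]


# Hypothetical file and snapshot names, for illustration only.
for cmd in (
    list_snapshots_cmd("failed_job.qcow2"),
    extract_snapshot_cmd("failed_job.qcow2", "installation", "installation.qcow2"),
    revert_to_snapshot_cmd("failed_job.qcow2", "installation"),
):
    print(shlex.join(cmd))
```

Note that an image extracted this way carries the disk state only, so booting it is a cold boot from that disk state rather than a resume of the running machine.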

Related issues 1 (0 open — 1 closed)

Updated by JRivrain about 3 years ago

Related to coordination #105082: Evaluate more simple and consistent ways to collect logs added

Updated by JRivrain about 3 years ago

Tags set to qe-yast-refinement

Updated by JERiveraMoya about 3 years ago

Target version set to Current

Updated by JERiveraMoya about 3 years ago

Tags deleted (~~qe-yast-refinement~~)
Tracker changed from coordination to action
Subject changed from Evaluate a way to snapshot and upload the entire system at the point of failure, that could be bootable as VM. to [Research][timebox: 24] Evaluate a way to snapshot and upload the entire system at the point of failure, that could be bootable as VM.
Description updated (diff)
Status changed from New to Workable

Updated by okurz about 3 years ago

There is a variable FORCE_PUBLISH_HDD_$i (see https://github.com/os-autoinst/os-autoinst/blob/d466a0ee2b2b12f0a3abb60013eefe756ce67fa1/bmwqemu.pm#L141) that can be set to force publishing an HDD image even if the job fails. It is meant exactly for this kind of investigation. In theory one could set this variable for a complete openQA instance. We don't do that for our production systems because the overhead would be massive: thousands of failing jobs would needlessly upload images that nobody ever looks at. It can of course be set on demand.
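
As a rough sketch of the on-demand case, the variable could be passed as a `KEY=VALUE` override when re-triggering a failed job with `openqa-clone-job`. The snippet below only assembles the command line. The job ID `1234567`, the image name `debug_105079.qcow2`, and the helper name are hypothetical, and it is an assumption here that the variable's value is the name under which the image gets published.

```python
import shlex


def clone_with_forced_publish(host, job_id, image_name, hdd_index=1):
    """Build an openqa-clone-job call that sets FORCE_PUBLISH_HDD_<n>.

    Assumption: the value of the variable is the published image's file name.
    """
    return [
        "openqa-clone-job",
        "--within-instance", host,
        str(job_id),
        f"FORCE_PUBLISH_HDD_{hdd_index}={image_name}",
    ]


# Hypothetical job ID and image name, for illustration only.
cmd = clone_with_forced_publish(
    "https://openqa.opensuse.org", 1234567, "debug_105079.qcow2"
)
print(shlex.join(cmd))
```

Setting the variable per cloned job like this keeps the upload cost limited to the failures someone actually intends to investigate.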

Updated by JERiveraMoya almost 3 years ago

Status changed from Workable to Rejected

https://progress.opensuse.org/issues/90347

Updated by okurz almost 3 years ago

Status changed from Rejected to Workable

You might be confusing things here. #90347 is about intermediate snapshots which are recorded while a test is running. The ticket here is about uploading the entire system instead. So "snapshot" is maybe ambiguous in that context. Maybe better call it "complete system image" :)

Updated by JERiveraMoya almost 3 years ago

Status changed from Workable to Rejected

We initially thought it was all the same thing; I pasted that ticket to clarify the difference between overlays and the snapshots supported by qcow2.
The setting you pointed to covers the main need, so this ticket is no longer needed. We can reject it.
