
action #105079

closed

coordination #105073: [Epic] Improve logging in openQA

[Research][timebox: 24] Evaluate a way to snapshot and upload the entire system at the point of failure, that could be bootable as VM.

Added by JRivrain about 2 years ago. Updated about 2 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version:
Start date: 2022-01-19
Due date:
% Done: 0%
Estimated time:

Description

Even when we collect as many logs as we can, something we need is often missing.
In the world of support, some companies take a radical approach to this. For example, Solaris used to have a tool that would snapshot the entire OS so that it could be uploaded to support, letting the support team browse a tarball containing literally the entire OS. It takes some storage space, but it can save a huge amount of time otherwise spent:

  • going back and forth between customer and support to request specific logs,
  • rebuilding a system with the same characteristics, etc.

In some cases, we already export a qcow2 image at the end of the installation, which can be used to reproduce bugs. However, this image does not represent the system at the time of failure. QEMU has snapshot capabilities that could make it possible to boot the system right after its failure. This could complement or partially replace the current log collection mechanisms.

Currently we have MAKETESTSNAPSHOTS (save a snapshot for each test module in the qcow image) and PUBLISH_HDD_N.

AC1: Test manually how to use QEMU qcow2 images that contain multiple snapshots.
AC2: Discuss with the tools team the possibility of publishing a qcow2 image on failure, by rerunning the job with an additional openQA setting.
AC3: Propose an implementation and create a follow-up ticket.
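For AC1, a minimal sketch of how a qcow2 image with several snapshots could be inspected and booted, written in Python around the `qemu-img info --output=json` and `qemu-system-x86_64 -loadvm` interfaces. The image name, memory size and snapshot names are hypothetical. `-loadvm` only restores correctly if the machine and device configuration matches the one in use when the snapshot was saved, so in practice the options from the original job's QEMU command line would have to be reused.

```python
import json
import shlex
from typing import List


def info_command(image: str) -> List[str]:
    """Command that prints the image metadata, including internal snapshots, as JSON."""
    return ["qemu-img", "info", "--output=json", image]


def list_snapshots(info_json: str) -> List[str]:
    """Return the internal snapshot names from `qemu-img info --output=json` output."""
    info = json.loads(info_json)
    # The "snapshots" key is absent when the image has no internal snapshots.
    return [snap["name"] for snap in info.get("snapshots", [])]


def boot_from_snapshot_command(image: str, snapshot: str, memory_mb: int = 2048) -> List[str]:
    """Command that starts a VM from a saved snapshot via -loadvm.

    Placeholder options: the machine type, memory and devices must match the
    configuration that was used when the snapshot was saved.
    """
    return [
        "qemu-system-x86_64",
        "-m", str(memory_mb),
        "-drive", f"file={image},format=qcow2",
        "-loadvm", snapshot,
    ]


if __name__ == "__main__":
    # Abbreviated, made-up example output; real output contains more fields.
    sample = json.dumps({
        "filename": "failed-job.qcow2",
        "format": "qcow2",
        "snapshots": [{"id": "1", "name": "module_a"}, {"id": "2", "name": "module_b"}],
    })
    print(shlex.join(info_command("failed-job.qcow2")))
    for name in list_snapshots(sample):
        print(shlex.join(boot_from_snapshot_command("failed-job.qcow2", name)))
```

The helpers only build argument lists, so they can be printed for manual use or passed to `subprocess.run` once the real job options are known.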


Related issues 1 (0 open, 1 closed)

Related to qe-yam - coordination #105082: Evaluate more simple and consistent ways to collect logs (Rejected, 2022-01-19)

#1

Updated by JRivrain about 2 years ago

#2

Updated by JRivrain about 2 years ago

  • Tags set to qe-yast-refinement
#3

Updated by JERiveraMoya about 2 years ago

  • Target version set to Current
#4

Updated by JERiveraMoya about 2 years ago

  • Tags deleted (qe-yast-refinement)
  • Tracker changed from coordination to action
  • Subject changed from Evaluate a way to snapshot and upload the entire system at the point of failure, that could be bootable as VM. to [Research][timebox: 24] Evaluate a way to snapshot and upload the entire system at the point of failure, that could be bootable as VM.
  • Description updated (diff)
  • Status changed from New to Workable
#5

Updated by okurz about 2 years ago

There is a variable FORCE_PUBLISH_HDD_$i (see https://github.com/os-autoinst/os-autoinst/blob/d466a0ee2b2b12f0a3abb60013eefe756ce67fa1/bmwqemu.pm#L141) that can be set to force publishing an HDD image even if the job fails. This is meant exactly for investigation purposes. In theory, one could set this variable for a complete openQA instance. We don't do that on our production systems because the overhead would be massive: thousands of failing jobs would needlessly upload images that are never looked at. Of course, it can be set on demand.
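A simplified model of the behavior described above, as a Python sketch. It assumes that PUBLISH_HDD_<n> is only honored when the job does not fail, that FORCE_PUBLISH_HDD_<n> is honored regardless of the result, and that both take the target image file name; it is not a port of the os-autoinst code linked above. The second helper shows the on-demand case, rerunning a single job with `openqa-clone-job` and passing the variable as a KEY=VALUE setting. The host, job ID and image name are placeholders.

```python
import shlex
from typing import Dict, List


def disks_to_publish(settings: Dict[str, str], job_failed: bool) -> Dict[int, str]:
    """Simplified model: disk number -> image name that would be uploaded.

    Assumptions (not taken from os-autoinst): NUMDISKS defaults to 1,
    PUBLISH_HDD_<n> applies only if the job did not fail, and
    FORCE_PUBLISH_HDD_<n> applies regardless of the result.
    """
    num_disks = int(settings.get("NUMDISKS", "1"))
    result: Dict[int, str] = {}
    for i in range(1, num_disks + 1):
        forced = settings.get(f"FORCE_PUBLISH_HDD_{i}")
        normal = settings.get(f"PUBLISH_HDD_{i}")
        if forced:
            result[i] = forced
        elif normal and not job_failed:
            result[i] = normal
    return result


def clone_with_forced_publish(host: str, job_id: int, disk: int, image_name: str) -> List[str]:
    """Build an openqa-clone-job call that reruns a job with FORCE_PUBLISH_HDD_<disk> set."""
    return [
        "openqa-clone-job",
        "--within-instance", host,
        str(job_id),
        f"FORCE_PUBLISH_HDD_{disk}={image_name}",
    ]


if __name__ == "__main__":
    # Hypothetical host, job ID and image name.
    print(shlex.join(clone_with_forced_publish("https://openqa.example.com", 12345, 1, "failed-job.qcow2")))
```

Setting the variable only on a clone of the job of interest keeps the upload overhead limited to the cases someone actually wants to investigate.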

#6

Updated by JERiveraMoya about 2 years ago

  • Status changed from Workable to Rejected
#7

Updated by okurz about 2 years ago

  • Status changed from Rejected to Workable

You might be confusing things here. #90347 is about intermediate snapshots that are recorded while a test is running. This ticket is about uploading the entire system instead. So "snapshot" may be ambiguous in this context; it might be better to call it a "complete system image" :)

#8

Updated by JERiveraMoya about 2 years ago

  • Status changed from Workable to Rejected

Initially we thought it was all the same thing; I only pasted that ticket to clarify the difference between overlays and the snapshots supported by qcow2.
The setting you pointed to covers the main need, so this ticket is no longer needed and we can reject it.
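To illustrate the distinction between overlays and qcow2 snapshots discussed here, a small Python sketch that only builds the corresponding `qemu-img` command lines; the file names and the snapshot tag are hypothetical. An overlay is a separate file that depends on its backing image, whereas an internal snapshot lives inside the same qcow2 file. `qemu-img convert` reads through the backing chain and writes one standalone image, which is closer to the "complete system image" mentioned above.

```python
from typing import List


def create_overlay_command(base: str, overlay: str) -> List[str]:
    """External overlay: a new qcow2 file that stores only changes on top of `base`.

    The overlay is not usable without its backing file, so uploading it
    alone does not give a complete system image.
    """
    return ["qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "qcow2", overlay]


def create_internal_snapshot_command(image: str, tag: str) -> List[str]:
    """Internal snapshot: a named disk state stored inside the same qcow2 file.

    Run offline, this captures disk contents only; snapshots that include
    RAM and device state are taken with `savevm` from the QEMU monitor.
    """
    return ["qemu-img", "snapshot", "-c", tag, image]


def flatten_command(source: str, target: str) -> List[str]:
    """Write a standalone qcow2 with the whole backing chain of `source` merged in."""
    return ["qemu-img", "convert", "-O", "qcow2", source, target]
```

None of these `qemu-img` commands should be run against an image that a running VM still has open.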
