action #105079
coordination #105073 (closed): [Epic] Improve logging in openQA
[Research][timebox: 24] Evaluate a way to snapshot and upload the entire system at the point of failure, that could be bootable as VM.
0%
Description
Even when we collect as many logs as we can, we often find that something is missing.
In the world of support, some companies take a radical approach to this. For example, Solaris used to have a tool that would snapshot the entire OS so that it could be uploaded to support, letting the support team navigate through a tarball containing literally the entire OS. It takes some storage space, but can save a huge amount of time otherwise spent on:
- going back and forth between customer and support for requesting specific logs,
- rebuilding a system with the same characteristics, etc.
In some cases, we already export a qcow image at the end of installation, which can be used to reproduce bugs. But this does not represent the system at the time of failure. QEMU has snapshot capabilities that could make it possible to boot the system right after its failure. This could complement or partially replace the current log collection mechanisms.
Currently we have MAKETESTSNAPSHOTS (save a snapshot for each test module in the qcow image) and PUBLISH_HDD_N.
AC1: Test manually how to use QEMU qcow2 images that contain multiple snapshots.
AC2: Discuss with the tools team whether a qcow2 image could be published on failure simply by adding an openQA setting when rerunning the job.
AC3: Propose an implementation and create a follow-up ticket.
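For AC1, a minimal sketch of the manual steps might look like the following. It only builds the command lines and does not run them. It assumes the image holds internal qcow2 snapshots (not overlay files); the file name `failed-job.qcow2`, the snapshot name `lastgood`, and the helper `snapshot_commands` are hypothetical.

```python
import shlex


def snapshot_commands(image, snapshot, qemu_binary="qemu-system-x86_64"):
    """Build argv lists for inspecting and using internal qcow2 snapshots."""
    return {
        # List the internal snapshots stored in the image.
        "list": ["qemu-img", "snapshot", "-l", image],
        # Revert the disk contents to the snapshot. This modifies the image
        # in place, so it is safer to run it on a copy.
        "revert_disk": ["qemu-img", "snapshot", "-a", snapshot, image],
        # Resume a snapshot that also holds VM state. The machine options
        # must match those used when the snapshot was saved.
        "resume": [qemu_binary, "-hda", image, "-loadvm", snapshot],
    }


if __name__ == "__main__":
    # Hypothetical file and snapshot names, for illustration only.
    for step, argv in snapshot_commands("failed-job.qcow2", "lastgood").items():
        print(f"{step}: {shlex.join(argv)}")
```

Whether `-loadvm` works depends on the snapshot containing VM state and on starting QEMU with the same machine configuration; reverting the disk and booting cold is the more forgiving option.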
Updated by JRivrain almost 3 years ago
- Related to coordination #105082: Evaluate more simple and consistent ways to collect logs added
Updated by JERiveraMoya almost 3 years ago
- Tags deleted (qe-yast-refinement)
- Tracker changed from coordination to action
- Subject changed from Evaluate a way to snapshot and upload the entire system at the point of failure, that could be bootable as VM. to [Research][timebox: 24] Evaluate a way to snapshot and upload the entire system at the point of failure, that could be bootable as VM.
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz almost 3 years ago
There is a variable FORCE_PUBLISH_HDD_$i, see https://github.com/os-autoinst/os-autoinst/blob/d466a0ee2b2b12f0a3abb60013eefe756ce67fa1/bmwqemu.pm#L141, that can be set to force publishing an HDD image even if the job fails. It is meant exactly for this kind of investigation. In theory one could set this variable for a complete openQA instance. We don't do that for our production systems because the overhead would be massive: thousands of failing jobs would needlessly upload images that nobody ever looks at. It can of course be set on demand.
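Setting the variable on demand could be sketched as a rerun of the failed job with one extra setting. The helper below only assembles an `openqa-clone-job` command line. It assumes that FORCE_PUBLISH_HDD_$i takes a file name in the same way as PUBLISH_HDD_$i; the job ID, image name, target host, and the helper name `clone_command` are hypothetical.

```python
import shlex


def clone_command(job_id, image_name, source, target, disk_index=1):
    """Build an openqa-clone-job call that forces publishing one HDD image."""
    if disk_index < 1:
        raise ValueError("disk_index starts at 1")
    # Assumption: the value is the file name to publish, as for PUBLISH_HDD_$i.
    setting = f"FORCE_PUBLISH_HDD_{disk_index}={image_name}"
    return [
        "openqa-clone-job",
        "--from", source,   # instance that holds the failed job
        "--host", target,   # instance that should run the clone
        str(job_id),
        setting,
    ]


if __name__ == "__main__":
    # Hypothetical job ID, image name, and target host.
    argv = clone_command(
        1234567,
        "investigate-1234567.qcow2",
        source="https://openqa.opensuse.org",
        target="localhost",
    )
    print(shlex.join(argv))
```

Passing the setting per job keeps the upload cost limited to the jobs someone actually intends to inspect.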
Updated by JERiveraMoya almost 3 years ago
- Status changed from Workable to Rejected
Updated by okurz almost 3 years ago
- Status changed from Rejected to Workable
You might be confusing things here. #90347 is about intermediate snapshots which are recorded while a test is running. The ticket here is about uploading the entire system instead. So "snapshot" is maybe ambiguous in that context. Maybe better call it "complete system image" :)
Updated by JERiveraMoya almost 3 years ago
- Status changed from Workable to Rejected
We initially thought it was all the same thing; I pasted that ticket only to clarify the difference between overlays and the snapshots supported by qcow2.
With the setting you pointed to, we cover the main need and this ticket is no longer necessary. We can reject it.