action #10884

Save image on filesystem kernel bugs

Added by okurz about 4 years ago. Updated 3 months ago.

Status: Resolved
Start date: 23/02/2016
Priority: Normal
Due date:
Assignee: okurz
% Done: 0%
Category: Feature requests
Target version: QA - future
Difficulty:
Duration:

Description

User story

As a kernel filesystem developer, I want access to the qcow2 images of test runs that hit filesystem kernel bugs, so that I can debug what actually happened on the filesystem without subsequent test modules tainting it.

acceptance criteria

  • AC1: If a filesystem-related kernel bug occurs on the qemu backend, the test run is aborted and the qcow2 image is stored
  • AC2: A cleanup strategy for the images exists

tasks

  • Research the current post_fail_hook implementations in os-autoinst-distri-opensuse
  • Grep the serial output for kernel bug notices, e.g. via a hook into "wait_serial", something like an "infinite parallel wait serial"
  • If the bug is in "filesystem", immediately abort the test run and tag the image for saving
  • Adapt openQA gru cleanup task
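
The detection step in the tasks above could look roughly like the following. This is only a sketch in Python, not os-autoinst code: the function name and both regular expressions are assumptions made for illustration, and the patterns are far from an exhaustive list of kernel bug notices.

```python
import re
from typing import Iterable, Optional

# Assumed markers of a kernel bug notice on the serial console (illustrative only).
KERNEL_BUG = re.compile(r"kernel BUG at (\S+?):\d+!|BUG: .+|Oops: .+")

# Assumed hints that the notice concerns a filesystem: a source path under fs/
# or the name of a common filesystem driver.
FS_HINT = re.compile(r"\bfs/|\b(btrfs|xfs|ext[234])\b", re.IGNORECASE)


def find_filesystem_kernel_bug(serial_lines: Iterable[str]) -> Optional[str]:
    """Return the first serial line that looks like a filesystem-related
    kernel bug notice, or None if no such line is found."""
    for line in serial_lines:
        if KERNEL_BUG.search(line) and FS_HINT.search(line):
            return line.strip()
    return None


if __name__ == "__main__":
    log = [
        "[   12.3] systemd[1]: Started Journal Service.",
        "[   42.0] kernel BUG at fs/btrfs/ctree.c:3231!",
    ]
    hit = find_filesystem_kernel_bug(log)
    if hit is not None:
        # A real hook would abort the test run and tag the image for saving here.
        print(f"filesystem kernel bug detected: {hit}")
```

A caller would feed this the serial log line by line and, on a match, abort the run and mark the disk image to be kept. How that hooks into "wait_serial" is left open here.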

further details

Initial idea was mentioned in https://bugzilla.suse.com/show_bug.cgi?id=963567#c7

History

#1 Updated by okurz over 3 years ago

  • Category set to Enhancement to existing tests

#2 Updated by okurz over 2 years ago

  • Subject changed from save image on filesystem kernel bugs to [sle][functional][kernel][opensuse]save image on filesystem kernel bugs

#3 Updated by sebchlad over 2 years ago

Adding the target version "future" as I would like to better plan it with the SLE Functional team.

#4 Updated by sebchlad over 2 years ago

  • Target version set to future

#5 Updated by okurz over 1 year ago

  • Target version changed from future to future

#6 Updated by okurz over 1 year ago

  • Subject changed from [sle][functional][kernel][opensuse]save image on filesystem kernel bugs to [kernel] save image on filesystem kernel bugs

By now I think this is something for QSK?

#7 Updated by jlausuch 4 months ago

  • Subject changed from [kernel] save image on filesystem kernel bugs to Save image on filesystem kernel bugs

Does it make sense to start publishing qcow2 images of failed tests in filesystems? There are plenty and we could run into storage resource problems...

I guess proper logging is more important than this. Anyone can actually re-run the tests and stop it when necessary to troubleshoot manually.

In any case, if the test fails because of a kernel crash or anything, PUBLISH_HDD won't work, so we would need additional work for these cases in post_fail_hook or something.
So, I remove the tag [kernel] there, because this feature is more generic than just kernel or filesystems.

#8 Updated by szarate 3 months ago

  • Status changed from New to Rejected

I think we can safely reject this.

#9 Updated by okurz 3 months ago

  • Project changed from openQA Tests to openQA Project
  • Category changed from Enhancement to existing tests to Feature requests
  • Status changed from Rejected to New

Why would you want to reject it? Do you think it's done or that the original use case wasn't relevant in the first place? If it was relevant then, it should still be relevant today. Might be ok to track it within a broader scope though, as I understand that both of your teams are not interested. I recently merged https://github.com/os-autoinst/os-autoinst/pull/897 adding the variable "FORCE_PUBLISH_HDD_" which could probably even be set during the test run.

@jlausuch maybe you are willing to pick it back now? :)

#10 Updated by jlausuch 3 months ago

@jlausuch maybe you are willing to pick it back now? :)

Pick back what exactly?

#11 Updated by okurz 3 months ago

back into the backlog of "[kernel]" in response to #10884#note-7

#12 Updated by jlausuch 3 months ago

No, we won't put this into our backlog unless we see a real need for this. Besides, you already implemented the FORCE_PUBLISH_HDD_ parameter so it's available if anyone wants to use it...
I doubt kernel folks will want to publish qcow2 for every failed test we have :)

#13 Updated by szarate 3 months ago

  • Status changed from New to Rejected

okurz wrote:

Why would you want to reject it? Do you think it's done or that the original use case wasn't relevant in the first place? If it was relevant then, it should still be relevant today. Might be ok to track it within a broader scope though, as I understand that both of your teams are not interested. I recently merged https://github.com/os-autoinst/os-autoinst/pull/897 adding the variable "FORCE_PUBLISH_HDD_" which could probably even be set during the test run.


@jlausuch maybe you are willing to pick it back now? :)

The use case is not relevant. QSFU doesn't work with filesystems, and afaict since this wasn't picked up by anybody in 3 years, it's very likely that there is no interest.

#14 Updated by okurz 3 months ago

  • Status changed from Rejected to Resolved
  • Assignee set to okurz

Fine, but let's not give the impression nothing has been done. So let's call it "done" as described in http://open.qa/docs/#_asset_handling : One just needs to set "FORCE_PUBLISH_HDD_1=…" and will receive an image if possible. This can be done case by case on individual jobs or also within test code, e.g. after detecting a bug situation.
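
To illustrate the "case by case" usage, here is a small Python sketch that adds the variable to a job's settings and renders them as KEY=VALUE pairs. Only the variable name "FORCE_PUBLISH_HDD_1" comes from this ticket. The helper names, the example settings and the image file name are hypothetical, and how the resulting settings are handed to openQA is not shown.

```python
from typing import Dict, List


def with_forced_publish(settings: Dict[str, str], image_name: str, disk: int = 1) -> Dict[str, str]:
    """Return a copy of the job settings with FORCE_PUBLISH_HDD_<disk> set.

    The original settings dict is left untouched.
    """
    updated = dict(settings)
    updated[f"FORCE_PUBLISH_HDD_{disk}"] = image_name
    return updated


def as_key_value_args(settings: Dict[str, str]) -> List[str]:
    """Render settings as sorted KEY=VALUE strings."""
    return [f"{key}={value}" for key, value in sorted(settings.items())]


if __name__ == "__main__":
    # Hypothetical job settings and image name, for illustration only.
    job_settings = {"BACKEND": "qemu", "TEST": "filesystem_stress"}
    debug_settings = with_forced_publish(job_settings, "fs-bug-debug.qcow2")
    print(" ".join(as_key_value_args(debug_settings)))
```

Copying the settings instead of modifying them in place keeps the original job definition reusable for runs that should not publish an image.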
