action #129340

closed

[regression] openqa cannot start jobs with symlinked assets size:M

Added by ph03nix 11 months ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2023-05-15
Due date: 2023-07-07
% Done: 0%
Estimated time:

Description

Observation

When starting a job with an asset that is a symlink, openQA dies with the following error message:

qemu-img: Could not open '/var/lib/openqa/pool/4/ignition.qcow2': Failed to lock byte 201

ph03nix wrote:

cdywan wrote:

Can you clarify, e.g. is this a recent regression? Maybe you have a job that used to work? If it hasn't been cleaned up in the meantime, adding this ticket would stop it from being removed.

This is a recent regression. I have an automated script that creates symlinks to assets. It used to work until recently, when it stopped working with exactly this issue.

I only have failures on my own openQA instance, but it's trivial to reproduce the issue (use a symlinked asset)

Steps to reproduce

  • Create an asset that is a symlink
  • Start a job with this asset as the HDD_1 variable (or HDD_2, ...)

Impact

  • Regression: symlinks cannot be used as asset files

Problem

  • No hypothesis

Suggestion

  • Investigate what the actual problem is
  • Try to reproduce the problem with os-autoinst only. If it is not reproducible there, check whether it needs an openQA worker with or without the worker cache enabled

Workarounds

  • Use hard-links or copies of asset files
Actions #1

Updated by okurz 11 months ago

  • Description updated (diff)
  • Category set to Feature requests
  • Target version set to future

As explained in chat there are workarounds to consider. I included them now in the description.

Actions #2

Updated by ph03nix 10 months ago

I'd argue that this is not a feature request but rather a bug because this was working before.

Actions #3

Updated by livdywan 10 months ago

ph03nix wrote:

I'd argue that this is not a feature request but rather a bug because this was working before.

Can you clarify, e.g. is this a recent regression? Maybe you have a job that used to work? If it hasn't been cleaned up in the meantime, adding this ticket would stop it from being removed.

Actions #4

Updated by ph03nix 10 months ago

cdywan wrote:

Can you clarify, e.g. is this a recent regression? Maybe you have a job that used to work? If it hasn't been cleaned up in the meantime, adding this ticket would stop it from being removed.

This is a recent regression. I have an automated script that creates symlinks to assets. It used to work until recently, when it stopped working with exactly this issue.

I only have failures on my own openQA instance, but it's trivial to reproduce the issue (use a symlinked asset)

Actions #5

Updated by okurz 10 months ago

  • Subject changed from openqa cannot start jobs with symlinked assets to [regression] openqa cannot start jobs with symlinked assets
  • Category changed from Feature requests to Regressions/Crashes
  • Target version changed from future to Ready
Actions #6

Updated by okurz 9 months ago

  • Subject changed from [regression] openqa cannot start jobs with symlinked assets to [regression] openqa cannot start jobs with symlinked assets size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #7

Updated by tinita 9 months ago

  • Status changed from Workable to In Progress
  • Assignee set to tinita
Actions #8

Updated by tinita 9 months ago

@ph03nix is there a difference between absolute and relative symlinks?
E.g. does the link under HDD_1 point to another file with a relative or an absolute path?
I just tested it and can confirm that a relative path does not work (although it results in a different error), but an absolute path works.

The reason is that we create a (hard) link in the pool directory to the path in HDD_1, basically doing the equivalent of ln /path/to/hdd_filename /pool/hdd_filename. If /path/to/hdd_filename -> relfile, that results in /pool/hdd_filename -> relfile, because it just hard-links the symlink itself, and relfile is then looked up relative to the pool directory.
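The effect described above can be reproduced outside openQA. The following is a minimal sketch in Python, not openQA's actual (Perl) code; the directory layout and the file names real.qcow2 and hdd.qcow2 are made up for illustration, and it assumes a POSIX system where hard links to symlinks are permitted.

```python
import os
import tempfile


def hardlink_relative_symlink_demo():
    """Hard-link a relative symlink into another directory and inspect the result."""
    with tempfile.TemporaryDirectory() as root:
        assets = os.path.join(root, "assets")
        pool = os.path.join(root, "pool")
        os.mkdir(assets)
        os.mkdir(pool)

        # Hypothetical asset layout: hdd.qcow2 -> real.qcow2 (relative target).
        with open(os.path.join(assets, "real.qcow2"), "w") as f:
            f.write("placeholder")
        asset = os.path.join(assets, "hdd.qcow2")
        os.symlink("real.qcow2", asset)

        # Link without following the symlink: the new name in the pool
        # directory refers to the symlink itself, not to real.qcow2.
        pooled = os.path.join(pool, "hdd.qcow2")
        os.link(asset, pooled, follow_symlinks=False)

        return {
            "is_symlink": os.path.islink(pooled),
            "target": os.readlink(pooled),
            "resolves": os.path.exists(pooled),
        }


print(hardlink_relative_symlink_demo())
```

The entry in the pool directory is still a symlink with the target real.qcow2, but no such file exists next to it, so the link dangles.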

I can't see any recent changes related to that.

So using absolute symlinks on your side should be a workaround for now.
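For a script that already creates relative symlinks, the workaround could look roughly like the sketch below. It is only an illustration in Python; the helper name make_symlink_absolute is hypothetical and not part of openQA.

```python
import os


def make_symlink_absolute(link_path):
    """Rewrite a relative symlink in place so that it has an absolute target."""
    target = os.readlink(link_path)
    if os.path.isabs(target):
        return target  # nothing to do

    # A relative target is interpreted relative to the directory
    # that contains the symlink.
    link_dir = os.path.dirname(os.path.abspath(link_path))
    absolute = os.path.normpath(os.path.join(link_dir, target))

    # Create the new link under a temporary name, then rename it over the
    # old one so the asset name never disappears in between.
    tmp = link_path + ".tmp"
    os.symlink(absolute, tmp)
    os.replace(tmp, link_path)
    return absolute
```

Note that normpath collapses ".." components textually, which may give a different result than the kernel's resolution if intermediate directories are themselves symlinks.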

Meanwhile I'm thinking about what would be best to do on our side.
We fall back to symlinks if the hard link fails anyway, so we could just check whether the asset file is a symlink.
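A rough sketch of that idea, written in Python for illustration only: the function name link_asset_into_pool is hypothetical, and pointing the new symlink at the fully resolved file is an assumption of this sketch, not necessarily what openQA does.

```python
import os


def link_asset_into_pool(asset_path, pool_dir):
    """Make an asset available in pool_dir; returns "hardlink" or "symlink"."""
    pooled = os.path.join(pool_dir, os.path.basename(asset_path))

    if os.path.islink(asset_path):
        # Assumption: link to the resolved file so that relative symlink
        # targets keep working from inside the pool directory.
        os.symlink(os.path.realpath(asset_path), pooled)
        return "symlink"

    try:
        os.link(asset_path, pooled)
        return "hardlink"
    except OSError:
        # For example, the asset and the pool directory are on
        # different filesystems.
        os.symlink(os.path.abspath(asset_path), pooled)
        return "symlink"
```

Regular files keep the cheap hard-link path, and only assets that are symlinks take the new branch.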

Actions #9

Updated by openqa_review 9 months ago

  • Due date set to 2023-07-07

Setting due date based on mean cycle time of SUSE QE Tools

Actions #10

Updated by tinita 9 months ago

  • Status changed from In Progress to Feedback

https://github.com/os-autoinst/openQA/pull/5220 Do not hardlink symlink assets

Actions #11

Updated by ph03nix 9 months ago

Hey Tina! Good job and thanks for the fix! AFAICS this should resolve it as a whole. Do you still need something from me?

Actions #12

Updated by tinita 9 months ago

@ph03nix I would only be curious if you can confirm my conclusion:

  • Absolute symlinks have been working before my fix, only relative symlinks didn't
  • I can't see any recent changes related to this, so it's not a regression

Then I can turn this into a feature request retroactively :)

Actions #13

Updated by ph03nix 9 months ago

tinita wrote:

@ph03nix I would only be curious if you can confirm my conclusion:

  • Absolute symlinks have been working before my fix, only relative symlinks didn't

AFAICS my script always created relative symlinks and I'm still using relative symlinks there. Only recently I had to replace them with hard links.

  • I can't see any recent changes related to this, so it's not a regression

I share your observation that there have not been recent changes. However, this contradicts my observation that an automated script which I haven't touched in a year stopped working at some point in the last months.

From my observation, this was a regression, but that's IMHO also a minor and irrelevant taxonomical detail, as long as it's fixed ;-)

Then I can turn this into a feature request retroactively :)

No objections from my side, and thanks for the fix :-)

Actions #14

Updated by tinita 9 months ago

  • Status changed from Feedback to Resolved

ph03nix wrote:

I share your observation that there have not been recent changes. However, this contradicts my observation that an automated script which I haven't touched in a year stopped working at some point in the last months.

Ok. One last question: How many months could "last months" be roughly?
We didn't have tests for relative symlinks, so it's at least possible that there is some other place that could have influenced that behaviour, so I would be curious, but only if I have a more concrete time frame to look into :)

I will resolve this now in any case. Will keep it categorized as a bug.

Actions #15

Updated by ph03nix 9 months ago

tinita wrote:

Ok. One last question: How many months could "last months" be roughly?

3-6 months. I wish I could give you a more accurate time window but I can't :-(

We didn't have tests for relative symlinks, so it's at least possible that there is some other place that could have influenced that behaviour, so I would be curious, but only if I have a more concrete time frame to look into :)

I fully understand. Unfortunately I can't tell, because I haven't used the affected symlinked assets on my test instance in a long time.

Actions #16

Updated by tinita 9 months ago

Ok, I see. Thanks anyway!

Actions #17

Updated by ph03nix 9 months ago

Thanks to you! :-)
