action #123175


o3 fails to download images resulting in zero sized disk images/isos

Added by dancermak almost 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Regressions/Crashes
Target version:
Start date: 2023-01-16
Due date:
% Done: 0%
Estimated time:

Description

I have restarted our testing efforts for kiwi which involve downloading a lot of images from OBS and booting them on o3. Unfortunately, o3 has been highly unreliable and often will "fetch" empty disk images or isos which then result in test failures.

Examples of such test failures from the kiwi builds:

These are just the failures from one full test run, which unfortunately makes o3 very unreliable for testing at the moment. Also, due to the nature of the issue, restarting jobs is no help, as o3 will simply restart the test with the same broken image.

Workaround

  • Remove the corrupted/zero-sized asset from disk (on the web UI host) and try again; just restarting the job doesn't work (see the sketch after this list)
  • Avoid relying on decompression as supposedly only compressed assets were affected
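
A minimal sketch of that manual workaround, assuming shell access to the o3 webUI host and openqa-cli with its --o3 shortcut; the asset name and job ID are only examples taken from this ticket:

# on the webUI host: confirm the asset is really zero-sized, then remove it
ls -l /var/lib/openqa/share/factory/hdd/kiwi-test-image-MicroOS.x86_64-16.0.0-Build46.6.qcow2
sudo rm /var/lib/openqa/share/factory/hdd/kiwi-test-image-MicroOS.x86_64-16.0.0-Build46.6.qcow2

# then restart the affected openQA job so the asset gets downloaded again
openqa-cli api --o3 -X POST jobs/3050041/restart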

Suggestions

  • Better log messages by the downloader (to confirm whether it really downloaded a zero-length file with a 200 status)
  • Ask when those builds are normally scheduled to be able to investigate the behavior without delay
  • Check https://openqa.opensuse.org/group_overview/85 for new builds of relevant jobs
  • Check the Minion dashboard for relevant download jobs (https://openqa.opensuse.org/minion/jobs?task=download_asset) after jobs have been scheduled
  • Run xzgrep -i -P 'Downloading.*kiwi' /var/log/openqa_gru* on o3 to find relevant log messages
  • Try to reproduce download problems using curl (see the sketch below) or by restarting relevant download jobs in the Minion dashboard
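
For the curl suggestion, a minimal sketch comparing the size announced by the server with what actually arrives; the URL is just an example of a kiwi asset from the gru logs quoted later in this ticket:

url=http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images/kiwi-test-image-MicroOS.x86_64-16.0.0-Build46.6.qcow2

# size announced by the server (following redirects to the mirror)
curl -sIL "$url" | grep -i '^content-length'

# download and compare; a zero-sized or truncated result points at the download, not the extraction
curl -sL -o /tmp/asset.qcow2 "$url"
stat -c %s /tmp/asset.qcow2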

Related issues 2 (0 open, 2 closed)

Related to openSUSE admin - tickets #120067: Downloading of GNOME_Next.x86_64-43.1-Build22.189.iso differs, expected 1.4 GiB but downloaded 74 MiB, aborts prematurely often (Resolved, bmwiedemann, 2022-11-08)

Related to openQA Project (public) - action #123649: Error message in gru logs: Could not chdir back to start dir '' size:M (Resolved, mkittler, 2023-01-25 to 2023-02-09)

Actions #1

Updated by tinita almost 2 years ago

  • Project changed from QA (public) to openQA Project (public)
  • Category set to Regressions/Crashes
  • Target version set to Ready
Actions #2

Updated by okurz almost 2 years ago

  • Related to tickets #120067: Downloading of GNOME_Next.x86_64-43.1-Build22.189.iso differs, expected 1.4 GiB but downloaded 74 MiB, aborts prematurely often added
Actions #3

Updated by okurz almost 2 years ago

Could this be related to #120067?

Actions #4

Updated by okurz almost 2 years ago

  • Priority changed from Normal to Urgent
Actions #5

Updated by mkittler almost 2 years ago

I had a look yesterday but unfortunately I'm not sure how to get further information, e.g. xzgrep -i -P 'Downloading.*kiwi' /var/log/openqa_gru* on o3 returns no results. (There are generally logs for these downloads, e.g. just xzgrep -i -P 'Downloading.*' /var/log/openqa_gru* returns many results.)

Actions #6

Updated by mkittler almost 2 years ago

It isn't clear whether the problem is within openQA's downloader itself or on the remote end. If the problem is on the remote end, maybe our downloader could at least do some sanity checks, e.g. whether the size is zero. If such a sanity check failed, the job would be incompleted with a meaningful reason and maybe we could even retry the download a few times before it comes to that.
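
A rough shell illustration of that idea (retry a few times and refuse an empty file); $url and $target are placeholders, and the real implementation would of course live in openQA's downloader code:

curl --fail --retry 3 --retry-delay 5 -L -o "$target" "$url"
if [ ! -s "$target" ]; then
    echo "sanity check failed: $target is empty" >&2
    rm -f "$target"
    exit 1    # would translate into an incompleted job with a meaningful reason
fi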

Actions #7

Updated by dancermak almost 2 years ago

mkittler wrote:

It isn't clear whether the problem is within openQA's downloader itself or on the remote end. If the problem is on the remote end, maybe our downloader could at least do some sanity checks, e.g. whether the size is zero. If such a sanity check failed, the job would be incompleted with a meaningful reason and maybe we could even retry the download a few times before it comes to that.

I find it unlikely to be on the remote end, as I never see any of these issues on my local machine. I am also quite close to o3 (geographically speaking), so I'd guess that we would use the same mirrors.

Actions #8

Updated by okurz almost 2 years ago

Two ideas for what one could do to check (see the sketch after this list):

  1. Try to reproduce manually with wget/curl from o3 webui host and/or workers
  2. Try to reproduce with manually spawned minion controlled download jobs
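A sketch covering both ideas; the asset URL is an example, the Minion job ID is a placeholder taken from the dashboard, and the wrapper script path and user are assumptions for o3:

# 1. manual download from the webUI host or a worker, showing server headers and the final size
wget -S -O /tmp/kiwi-test.qcow2 \
  http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images/kiwi-test-image-MicroOS.x86_64-16.0.0-Build46.6.qcow2
stat -c %s /tmp/kiwi-test.qcow2

# 2. retry an existing download_asset Minion job by the ID shown on the Minion dashboard
sudo -u geekotest /usr/share/openqa/script/openqa minion job -R 2182254
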
Actions #9

Updated by mkittler almost 2 years ago

I've already got a 404 response for the very first job I picked (https://openqa.opensuse.org/tests/3050041, http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images/kiwi-test-image-luks.x86_64-1.15.1-Build44.6.raw).

The same goes for the passing jobs on https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=20230118&groupid=85 (the latest build as of now). They have been passing, so presumably openQA could even download the assets at the time, but now they're all gone. That makes me wonder whether the problem is simply that openQA is often just too late.

Actions #10

Updated by mkittler almost 2 years ago

Due to the new build there are now also logs present when searching via xzgrep -i -P 'Downloading.*kiwi' /var/log/openqa_gru*. However, there's not much of interest being logged as the latest build didn't actually run into download errors anymore¹. I'll try to start such a download job locally to check the behavior in case of a 404 error. I wouldn't be surprised if that left a zero-sized asset.


¹ The log just contains messages like

[2023-01-18T07:36:38.019960Z] [debug] Process 5710 is performing job "2182254" with task "download_asset"
[2023-01-18T07:36:38.113624Z] [debug] [#2182254] Downloading "http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images/kiwi-test-image-MicroOS.x86_64-16.0.0-Build46.6.qcow2" to "/var/lib/openqa/share/factory/hdd/kiwi-test-image-MicroOS.x86_64-16.0.0-Build46.6.qcow2"
[2023-01-18T07:36:38.114237Z] [info] [#2182254] Downloading "kiwi-test-image-MicroOS.x86_64-16.0.0-Build46.6.qcow2" from "http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images/kiwi-test-image-MicroOS.x86_64-16.0.0-Build46.6.qcow2"
Actions #11

Updated by mkittler almost 2 years ago

  • Assignee set to mkittler
Actions #12

Updated by mkittler almost 2 years ago

In case of a 404 error the job's reason is set to Reason: preparation failed: … and we would see a clear log message about it:

[debug] Process 16123 is performing job "1820" with task "download_asset"
[debug] [#1820] Downloading "http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images/iso/kiwi-test-image-disk-legacy.x86_64-1.42.1-Build49.5.install.iso" to "/hdd/openqa-devel/openqa/share/factory/iso/kiwi-test-image-disk-legacy.x86_64-1.42.1-Build49.5.install.iso"
[info] [#1820] Downloading "kiwi-test-image-disk-legacy.x86_64-1.42.1-Build49.5.install.iso" from "http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images/iso/kiwi-test-image-disk-legacy.x86_64-1.42.1-Build49.5.install.iso"
[debug] Process 16125 is performing job "1821" with task "download_asset"
[info] [#1820] Download of "/hdd/openqa-devel/openqa/share/factory/iso/kiwi-test-image-disk-legacy.x86_64-1.42.1-Build49.5.install.iso" failed: 404 Not Found
[error] [#1820] Downloading "http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images/iso/kiwi-test-image-disk-legacy.x86_64-1.42.1-Build49.5.install.iso" failed with: Download of "/hdd/openqa-devel/openqa/share/factory/iso/kiwi-test-image-disk-legacy.x86_64-1.42.1-Build49.5.install.iso" failed: 404 Not Found
[debug] [pid:16123] _carry_over_candidate(3488): _failure_reason=GRU:failed
[debug] Process 16135 is performing job "1822" with task "finalize_job_results"

This also leaves no empty file in the file system. The same goes if the asset needs to be extracted.

That means:

  • The issue we've seen on o3 was definitely not caused by assets simply being gone on the remote end (even though the quick cleanup could be problematic as well).
  • The error handling in our HTTP downloading/extraction code seems to work in general (and does not just leave us with zero-sized files and then continue the test execution). Judging by the code this is also true for errors other than 404.

To follow the suggestions from @okurz I can download some other assets on http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images from o3 (because the URLs referenced by current jobs are already 404).

Actions #13

Updated by mkittler almost 2 years ago

Looks like some files on http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images also give a 404 error despite still being listed, e.g. for http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images/kiwi-test-image-custom-partitions.x86_64.raw.xz I get a 404 page after being redirected to the mirror. However, 404 errors that happen only after a redirect are also handled well by openQA's downloader, so that cannot be our case either.

Due to the mirroring error I couldn't do that many successful downloads. However, the downloads that didn't run into that problem worked without any other issues. So I guess at least downloading via wget from o3 (as suggested in #123175#note-8) works normally.

I have also checked the downloader code more closely. It even has a check to see whether the size of the actually downloaded file matches the size specified in the HTTP header. I couldn't spot any mistake except that it would likely run into the error Size of "$target" differs, expected $header_size but downloaded $actual_size if the Content-Length header is not present in the HTTP response. However, that kind of error would be handled and not lead to the behavior we saw.

Actions #14

Updated by mkittler almost 2 years ago

  • Status changed from New to Feedback

As mentioned before, when I first looked into this issue, no relevant logs were present. We started too late with the investigation to tell whether the remote end was at fault.

So since I don't know what has happened, I'll wait until the problem occurs again, keeping an eye on https://openqa.opensuse.org/group_overview/85.

Actions #15

Updated by mkittler almost 2 years ago

  • Subject changed from o3 fails to download images resulting in zero sized disk images/isos to o3 fails to download images resulting in zero sized disk images/isos size:M
  • Description updated (diff)
Actions #16

Updated by mkittler almost 2 years ago

  • Description updated (diff)
Actions #17

Updated by mkittler almost 2 years ago

A new build has just been scheduled and so far all download jobs shown on the Minion dashboard are either finished (which means nothing went totally wrong, although download errors might still have occurred¹) or still waiting/active. There are no failures so far (except old/unrelated jobs), so nothing went completely wrong (e.g. an unhandled exception in download jobs). I've checked the size of some downloaded assets and couldn't find any suspiciously small files. So it doesn't look like we'll be able to reproduce the problem mentioned in the ticket description, but I'll keep tracking what's going on. (None of the openQA jobs in https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=20230124&groupid=85 have been started yet at this point.)

I'm also following the gru logs via tail --follow /var/log/openqa and so far everything looks good (except some unrelated problems saving needles).


¹ Download errors will not result in failing Minion jobs anymore but should lead to incomplete jobs with the corresponding reason - something that did not happen with the jobs mentioned in the ticket description.

Actions #18

Updated by mkittler almost 2 years ago

  • Priority changed from Urgent to Normal

The most recent build turned out exactly like the one before. So there are only passing/softfailed jobs and the download problems mentioned in the ticket description could not be reproduced.

So there's definitely not a general problem here blocking those kinds of tests. Hence I'm lowering the priority of this ticket. However, this also means I'm not sure how to proceed: the issue isn't easily reproducible anymore and I've already ruled out several possibilities, so I don't know what the source of this problem is (and thus also not how to fix it).

Actions #19

Updated by okurz almost 2 years ago

One workaround for restarting affected job scenarios including a re-download of assets is to go to https://openqa.opensuse.org/admin/assets, copy-paste the asset name from job details into that page and click the "delete asset" button. Then retrigger tests which should trigger a re-download.
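
A hedged CLI equivalent of that web UI workflow, assuming the delete-asset API route accepts type/name and using an asset name from this ticket as the example:

# delete the (possibly zero-sized) asset via the API so restarted jobs download it again
openqa-cli api --o3 -X DELETE assets/hdd/kiwi-test-image-MicroOS.x86_64-16.0.0-Build46.6.qcow2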

Following ideas:

@mkittler consider the above points for this ticket with quick implementations, otherwise I suggest putting this into #65271

Actions #20

Updated by mkittler almost 2 years ago

I have created a PR for the last suggestion as it is very easy to implement: https://github.com/os-autoinst/openQA/pull/4986

I don't think we can provide a link to the GRU job because when I initially investigated this ticket I couldn't find any related GRU jobs anymore. Likely those jobs had already been cleaned up (as they are usually cleaned up quite fast unless they are failures). Note that this should normally not be required anyway, because if such a job fails the information should be present as the reason in the job's infobox (regardless of whether the GRU job has been deleted in the meantime). Why this didn't work in this case I don't know. Maybe the jobs couldn't even be created in the first place. Until we can pin down the problem more closely I don't think this idea is feasible/useful.

About adding a specific test module: @dancermak do you think it is useful? From my perspective it shouldn't be too hard to figure out that the test failed due to a zero-sized asset so I don't see much benefit in adding an extra check for that. However, maybe depending on your experience this might not be generally true.

Actions #21

Updated by dancermak almost 2 years ago

mkittler wrote:

About adding a specific test module: @dancermak do you think it is useful? From my perspective it shouldn't be too hard to figure out that the test failed due to a zero-sized asset so I don't see much benefit in adding an extra check for that. However, maybe depending on your experience this might not be generally true.

This looks very useful!

I am not sure if we need a general check for zero-sized disk images. Once you know that this is a potential issue, you tend to spot it immediately. Before I was made aware that this could happen, it was quite the puzzle though.

Actions #22

Updated by mkittler almost 2 years ago

  • Related to action #123649: Error message in gru logs: Could not chdir back to start dir '' size:M added
Actions #23

Updated by mkittler almost 2 years ago

The overview https://openqa.opensuse.org/tests/overview?build=20230124 actually shows many more kiwi jobs, so some jobs of the last build did fail. A few jobs even show a similar symptom (e.g. https://openqa.opensuse.org/tests/3063931) but I haven't seen any zero-sized assets. (I have checked all jobs in that build that haven't passed/softfailed.)

Note that #123649 is supposedly really related considering my findings in #123649#note-11.

Actions #24

Updated by mkittler almost 2 years ago

The PR for UI improvements in openQA has been merged.

The ticket #123649 was only related in the sense that the kiwi jobs mentioned here triggered the error message mentioned in that ticket. However, that was a different problem and not the reason for zero-sized downloads. At least I'd say so because when reproducing the Could not chdir back to start dir '' error the files themselves were actually downloaded and extracted (and had a size; I also verified that by computing the sha256 checksum).

Since all cases mentioned in the ticket description use the decompression feature, I'll have a closer look at it. I learned from #123649 that the Archive::Extract module we're using doesn't return a negative return code in all error cases. Maybe the error handling of that module is broken in other ways, too. That would explain why the error handling on our side doesn't do its job (as from our side it looks like Archive::Extract succeeded).
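
Since the theory now points at the extraction step, a quick way to verify a downloaded .xz asset by hand, independently of Archive::Extract (file names are examples from this ticket):

# integrity check of the compressed asset itself
xz -t kiwi-test-image-custom-partitions.x86_64.raw.xz && echo "archive OK"

# real uncompressed size; should be non-zero and match the extracted file
xz -dc kiwi-test-image-custom-partitions.x86_64.raw.xz | wc -c
stat -c %s kiwi-test-image-custom-partitions.x86_64.raw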

About adding a test module: @dancermak Since you find it useful, do you want to implement that as maintainer of those tests? Otherwise I can also have a look. Maybe I can also add a sanity check in openQA itself at least for the extraction part (e.g. an uncompressed file should be at least as big as the compressed version¹).


¹ Although, I'm not sure whether that generally makes sense. Supposedly, in unfortunate cases, compression can make things worse.

Actions #25

Updated by okurz almost 2 years ago

  • Status changed from Feedback to In Progress

mkittler is checking Archive::Extract

Actions #26

Updated by mkittler almost 2 years ago

To answer the footnote from my previous comment: Compression can make it worse, e.g. running xz on a file containing a sha256sum (base64 encoded + newline) makes the size go up from 65 bytes to 124 bytes. So a sanity check for compressed size < uncompressed size doesn't generally make sense.
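
A quick way to demonstrate the effect on the command line (using the hex digest here; exact numbers vary slightly with the xz version and check type):

# 64 hex characters + newline = 65 bytes ...
sha256sum /etc/os-release | cut -d' ' -f1 | wc -c

# ... but after xz the container overhead dominates and the file grows
sha256sum /etc/os-release | cut -d' ' -f1 | xz | wc -c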

Actions #27

Updated by mkittler almost 2 years ago

  • Status changed from In Progress to Feedback

Since I couldn't find any obvious mistake in Archive::Extract I dug a little bit deeper. Unfortunately Perl's compression/decompression "ecosystem" is very over-engineered and the code doesn't look very good in general (many uses of caller that make it hard to follow the code, big blocks of commented-out code, different helpers for error handling sprinkled all over the different modules instead of a common approach, and so on). If there's a bug in the error handling somewhere then I couldn't spot it, but it is really hard to review all this code. I suppose that's the mess one simply has to deal with when using Perl in the year 2023.

Maybe it is beneficial to tell Archive::Extract to invoke the unxz binary instead of relying on the Perl implementation. This can be done by setting a global variable and would also make it use an external binary for other archive formats. One would have to take care to allow all of these binaries in AppArmor and of course also add the corresponding packages as dependencies.

Maybe it is best to abandon Archive::Extract (https://github.com/jib/archive-extract) completely as it also lacks support for zstd (which we likely want to support sooner or later). The Perl module https://metacpan.org/pod/Archive::Libarchive (https://github.com/uperl/Archive-Libarchive) seems like a promising alternative. It simply uses the C library libarchive under the hood, which is likely the most comprehensive archiving library available, and we'd supposedly get support for newer formats via libarchive without even having to touch any Perl code. This Perl module is currently only available on https://build.opensuse.org/package/show/devel:languages:perl:CPAN-A/perl-Archive-Libarchive so we would need to fix its dependencies first (it is currently unresolvable), then submit it to DLP, then to Factory, and then link it in our devel repos for Leap.

Not sure what the best approach is.


Note that it is also just a theory that the extraction code is the problem as I could not pin down the problem exactly. However, this theory is based on:

  • All jobs mentioned in the ticket description use the extraction feature.
  • @dancermak stated: "I find it unlikely to be on the remote end, as I never see any of these issues on my local machine. I am also quite close to o3 (geographically speaking), so I'd guess that we would use the same mirrors."
  • The error handling in the download code itself seems more reliable than the error handling of the extraction code. (I have tested the error handling of the download code for various cases and could never reproduce an error being ignored.)
Actions #28

Updated by okurz almost 2 years ago

mkittler wrote:

Maybe it is best to abandon Archive::Extract [...]

I quickly looked into what I could find regarding the dependency chain. It looks like we would need to start with https://build.opensuse.org/package/live_build_log/devel:languages:perl:CPAN-F/perl-FFI-Platypus/openSUSE_Tumbleweed/x86_64 failing with

[   13s] +/usr/bin/perl -Iinc -MAlien::Base::Wrapper=Alien::FFI,Alien::psapi -e cc -- -D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -c -o blib/lib/auto/share/dist/FFI-Platypus/probe/src/dlrun.o blib/lib/auto/share/dist/FFI-Platypus/probe/src/dlrun.c
[   13s] Can't locate Alien/FFI.pm in @INC (you may need to install the Alien::FFI module) (@INC contains: inc /usr/lib/perl5/site_perl/5.36.0/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.36.0 /usr/lib/perl5/vendor_perl/5.36.0/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.36.0 /usr/lib/perl5/5.36.0/x86_64-linux-thread-multi /usr/lib/perl5/5.36.0 /usr/lib/perl5/site_perl) at inc/Alien/Base/Wrapper.pm line 68.
[   13s] BEGIN failed--compilation aborted.

which looks like something that we should be able to fix by ensuring that https://metacpan.org/pod/Alien::FFI is packaged. Maybe https://build.opensuse.org/package/show/devel%3Alanguages%3Aperl%3ACPAN-A/perl-Alien already provides it and I couldn't see that. Then the dependency chain might resolve and we can submit the dependency which otherwise doesn't look too bad.

Note that it is also just a theory that the extraction code is the problem as I could not pin-down the problem exactly. [...]

Neither do I have another lead. I suggest that if the above approach does not look feasible within a reasonable time, we do not follow up with a full problem resolution until we have learned more.

Actions #29

Updated by livdywan almost 2 years ago

We discussed it in the unblock meeting. Why not use bsdtar and keep it simple?
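
A sketch of what "keep it simple" could look like: shelling out to bsdcat (from the libarchive tools) and treating a non-zero exit code or an empty result as a hard failure; $compressed and $target are placeholders:

if ! bsdcat "$compressed" > "$target"; then
    rm -f "$target"
    echo "extraction of $compressed failed" >&2
    exit 1
fi
# refuse to accept an empty result
[ -s "$target" ] || { rm -f "$target"; echo "extracted file $target is empty" >&2; exit 1; }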

Actions #31

Updated by mkittler almost 2 years ago

The PR has been merged and deployed. Let's see how well it behaves in production.

The last kiwi images build is from 24 days ago so meanwhile nothing has happened in production that would give us further clues. So when the switch to bsdtar generally works I'd leave it at that and resolve the ticket.

Actions #32

Updated by dancermak almost 2 years ago

mkittler wrote:

The PR has been merged and deployed. Let's see how well it behaves in production.

The last kiwi images build is from 24 days ago so meanwhile nothing has happened in production that would give us further clues. So when the switch to bsdtar generally works I'd leave it at that and resolve the ticket.

I have triggered a new round of builds today and got a ton of new failures with either zero sized image files (e.g. https://openqa.opensuse.org/tests/3135908) or workers that fail to start (e.g. https://openqa.opensuse.org/tests/3135926). So unfortunately it looks like switching to bsdtar did not solve the issue.

Actions #33

Updated by mkittler almost 2 years ago

  • Status changed from Feedback to In Progress
Actions #34

Updated by mkittler almost 2 years ago

The workers that failed to start are likely a different issue. The jobs have been automatically restarted, right?

I tried to reproduce this again locally via

openqa-clone-job … https://openqa.opensuse.org/tests/3135908 HDD_1_DECOMPRESS_URL=http://download.opensuse.org/repositories/Virtualization:/Appliances:/Images:/Testing_x86:/tumbleweed/images/kiwi-test-image-custom-partitions.x86_64-1.15.1-Build45.35.raw.xz

but the extraction works just fine (and the test also passed). I had to specify a different URL because the one from the original job is a 404 page now. When keeping the URL as-is I'm getting an incomplete with the Reason: preparation failed: Downloading "http://download.opensuse.org/…/kiwi-test-image-custom-partitions.x86_64-1.15.1-Build45.28.raw.xz" failed with: … and there are no zero-sized leftovers on disk. So the error handling works in this case.

I've already investigated the error handling on our side before. I couldn't find anything, so I concluded the problem must be in the archiving code. Apparently I must have missed something. I guess I can go through the code and tests again. Maybe I can spot a mistake this time.

Actions #35

Updated by openqa_review almost 2 years ago

  • Due date set to 2023-03-15

Setting due date based on mean cycle time of SUSE QE Tools

Actions #36

Updated by okurz almost 2 years ago

  • Subject changed from o3 fails to download images resulting in zero sized disk images/isos size:M to o3 fails to download images resulting in zero sized disk images/isos

With the incomplete description I suggest re-estimating. As discussed with mkittler, they will "try just one or two more things" :) But if that fails, as we are unclear about a path for how to continue, we should put a better description into the ticket, e.g. a "Workaround" section, and unassign and remove it from our backlog, as in: "Yes, the issue is there but we don't know how to reproduce or fix it further and we don't plan further work".

Actions #37

Updated by mkittler almost 2 years ago

  • Description updated (diff)
  • Status changed from In Progress to New
  • Assignee deleted (mkittler)
  • Target version changed from Ready to future

I tested what happens on a connection error, but it is handled as expected. I also tested what happens with a file that is recognized as xz-compressed (magic number / header correct) but is still damaged. This is also handled correctly: there's no zero-sized leftover and the test isn't even started but instead ends up incomplete with:

Reason: Downloading "http://localhost:8000/test.xz" failed with: Command "bsdcat '/hdd/openqa-devel/openqa/share/factory/tmp/test.xz' 2>&1 1>'/hdd/openqa-devel/openqa/share/factory/hdd/kiwi-test-image-custom-partitions.x86_64-1.15.1-Build45.28.raw'" exited with non-zero return code 1: bsdcat: /hdd/openqa-devel/openqa/share/factory/tmp/test.xz: Lzma library error: Corrupted input data

(The error is quite lengthy but also clearly shows what was attempted and why it failed.)
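
For reference, one way to build such a damaged-but-recognizable xz file locally (a sketch; some-image.raw is a placeholder and the Python one-liner just serves the file to the downloader):

xz -k some-image.raw                       # produces some-image.raw.xz
head -c 4096 some-image.raw.xz > test.xz   # magic bytes stay intact, payload is truncated

# serve it so it can be used as HDD_1_DECOMPRESS_URL in a cloned job
python3 -m http.server 8000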


Since it is unclear how to continue I'm doing what @okurz mentioned in his previous comment. I've added a "Workaround" section although I suppose @dancermak is already aware of it.

Actions #38

Updated by dancermak almost 2 years ago

mkittler wrote:

I've added a "Workaround" section although I suppose @dancermak is already aware of it.

Unfortunately, the workarounds are not really an option for my intended use-case:

  • Manually removing the files from the webUI host has a few issues. First, this doesn't scale at all, given that I've often seen as many as half of the tests of a full run fail. Additionally, it requires someone with very high permissions to comb through o3, find the correct files for the failed tests, remove them, restart all the tests, report back on GitHub…
  • Not using compression is not an option either. Last time we turned off compression of the disk images, we filled up OBS and killed it.

Wouldn't it be possible for openQA to check, after a test failed, whether one of the disk/iso assets has zero size and, if so, delete the asset and retry? This would have the advantage that this issue gets worked around in one place and not by every single user over and over again.
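
Until something like that exists in openQA itself, the check could at least be scripted on the webUI host, e.g. (a sketch; paths as on o3, deletion intentionally left out so the list can be reviewed first):

# list all zero-sized hdd/iso assets openQA currently has on disk
find /var/lib/openqa/share/factory/hdd /var/lib/openqa/share/factory/iso \
     -maxdepth 1 -type f -size 0 -printf '%T+ %p\n'
# reviewing and deleting these (plus restarting the affected jobs) covers a whole failed run in one go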

Actions #39

Updated by okurz almost 2 years ago

  • Due date deleted (2023-03-15)

@dancermak AFAIK the problem is that we don't have steps to reproduce the problem. If you could help us with that by adding a corresponding section to https://progress.opensuse.org/projects/openqav3/wiki/#Defects and defining how we could find a sufficient number of affected jobs, then we will likely have something we can work with to fix the underlying issue or apply said "workaround" automatically.

