action #108281

closed

test fails in svirt_upload_assets - can not upload qcow, error "File is too big"

Added by coolgw almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: Support
Target version:
Start date: 2022-03-14
Due date:
% Done: 0%
Estimated time:

Description

Observation

openQA test in scenario sle-15-SP4-Continuous-Migration-SLE15SP4-s390x-offline_sle12sp3_sles15sp3_sles15sp4_media_all_full_s390x_ph0@s390x-kvm-sle15 fails in svirt_upload_assets

Test suite description

SLE12SP3 -> SLES15SP3 - SLES15SP4
It is unclear why the "File is too big" message is shown when uploading the qcow image.

Reproducible

Fails since (at least) Build 101.1

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest

Actions #1

Updated by mkittler almost 3 years ago

How big is the file? It must not be bigger than UPLOAD_MAX_MESSAGE_SIZE_GB which is by default 20 GiB.
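One way to answer the size question is to compare the image size in bytes with the limit. The following is a minimal sketch, assuming the limit is counted in GiB (1024^3 bytes); the function names are hypothetical and not part of os-autoinst.

```shell
# Hypothetical helpers, not part of os-autoinst.

# Convert a limit given in GiB to bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Succeed (exit status 0) if the file is no larger than the limit in GiB.
fits_upload_limit() {
    file=$1
    limit_gib=${2:-20}   # 20 GiB is the default mentioned above
    # Strip padding that some wc implementations add around the byte count.
    size=$(wc -c < "$file" | tr -d ' ')
    [ "$size" -le "$(gib_to_bytes "$limit_gib")" ]
}
```

For example, `fits_upload_limit image.qcow2 20 || echo "too big for the default limit"` would report an image that exceeds 20 GiB (`image.qcow2` is a placeholder name).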

Actions #2

Updated by okurz almost 3 years ago

  • Due date set to 2022-03-29
  • Category set to Support
  • Status changed from New to Feedback
  • Assignee set to mkittler
  • Priority changed from Normal to Low
  • Target version set to Ready

Actions #3

Updated by coolgw almost 3 years ago

@mkittler I have no idea how big it is; it may already be bigger than the default value. So, as I understand it, I can set UPLOAD_MAX_MESSAGE_SIZE_GB=30 for a cloned job?

Actions #4

Updated by mkittler almost 3 years ago

Yes, setting UPLOAD_MAX_MESSAGE_SIZE_GB to a value bigger than the actual file size (e.g. via `openqa-clone-job`) should work. Considering the default is already quite big, I'm nevertheless wondering whether it would make more sense to try to keep the file smaller.
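For illustration, such an override could be passed as a KEY=VALUE argument when cloning. The sketch below only prints the command instead of running it. The helper name `clone_with_limit` is hypothetical, `JOB_ID` is a placeholder, and using `--within-instance` is an assumption about how the clone would be invoked here.

```shell
# Print (not run) an openqa-clone-job call that overrides the upload limit.
# JOB_ID is a placeholder; replace it with the ID of the failing job.
clone_with_limit() {
    host=$1 job_id=$2 limit_gb=$3
    echo openqa-clone-job --within-instance "$host" "$job_id" \
        "UPLOAD_MAX_MESSAGE_SIZE_GB=$limit_gb"
}

clone_with_limit https://openqa.suse.de JOB_ID 30
```

Dropping the `echo` would run the clone for real, which requires API credentials for the target instance.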

Actions #5

Updated by coolgw almost 3 years ago

Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.

Actions #6

Updated by mkittler almost 3 years ago

This should have worked: the value is also in vars.json, and I don't see a local override via MOJO_MAX_MESSAGE_SIZE.

Btw, that's the code responsible for setting the limit: https://github.com/os-autoinst/os-autoinst/blob/9baeb0bfe086b40dec131d78a5f5b29778fac8c5/commands.pm#L228

Actions #7

Updated by okurz over 2 years ago

coolgw wrote:

Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.

I only saw that the job is incomplete, where did you see the message about "too big"?

Actions #8

Updated by mkittler over 2 years ago

I only saw that the job is incomplete, where did you see the message about "too big"?

https://openqa.nue.suse.com/tests/8322675#step/svirt_upload_assets/9


PR for removing the limit completely and printing a more specific error message: https://github.com/os-autoinst/os-autoinst/pull/1998

Actions #9

Updated by coolgw over 2 years ago

okurz wrote:

coolgw wrote:

Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.

I only saw that the job is incomplete, where did you see the message about "too big"?

The above job did not capture the last screen because it is incomplete. See this job instead:
https://openqa.suse.de/tests/8338790#

BTW: when will this PR be deployed on OSD so I can give it a try?

Actions #10

Updated by mkittler over 2 years ago

It has been deployed as of 24.03.22 06:28 CET.

Actions #11

Updated by livdywan over 2 years ago

I wonder if this should be blocking on #109046 since the tests can't be verified?

Actions #12

Updated by okurz over 2 years ago

  • Due date deleted (2022-03-29)
  • Status changed from Feedback to Resolved

cdywan wrote:

I wonder if this should be blocking on #109046 since the tests can't be verified?

I think this is not necessary. As decided in the daily, the change was introduced and deployed, and we at least ensured that no related regressions have been introduced so far. As there has been no feedback and the due date is reached, I suggest we resolve here and call it done. In case any problems are encountered, please reopen.

Actions #13

Updated by coolgw over 2 years ago

Based on the result of the following job, the issue still happens. Since I saw that the PR sets the default value to infinity, I skipped setting UPLOAD_MAX_MESSAGE_SIZE_GB, but now I get the error message "Maximum message size exceeded":
https://openqa.suse.de/tests/8434449#live

Actions #14

Updated by mkittler over 2 years ago

  • Status changed from Resolved to In Progress

I'll try reproducing this with unit tests (using a very big file).

Actions #15

Updated by mkittler over 2 years ago

It works for me locally via:

truncate -s 30G t/data/big-file
curl --progress-bar --form upload=@t/data/big-file --form target=assets_public http://localhost:49019/Hallo/upload_asset/foo
OK: foo

On http://localhost:49019 a test command server is running.

Note that temporary files end up (by default) under /tmp so this directory needs to be big enough. Of course the pool directory needs to be big enough as well. Maybe it makes most sense to have the temporary directory somewhere within the pool directory so the final move is just a move within the same file system and we don't risk using too much RAM (since /tmp is often tmpfs).
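Whether the final move stays within one file system can be checked by comparing device numbers. This is a minimal sketch, assuming GNU coreutils `stat`; the function name `same_filesystem` is hypothetical.

```shell
# Succeed if both paths live on the same file system (same device number).
# Uses GNU coreutils `stat -c %d`; on BSD/macOS the flags differ.
same_filesystem() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}
```

For example, `same_filesystem "$pool_dir" /tmp || echo "move would cross file systems"`, where `pool_dir` stands for the worker's pool directory.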

Actions #16

Updated by mkittler over 2 years ago

If I give it a temp dir that is too small¹, the command server just gets stuck writing the file and shows increasing memory usage (up to 2 GiB) until it finally resets the connection:

curl --progress-bar --form upload=@t/data/big-file --form target=assets_public http://localhost:50263/Hallo/upload_asset/foo
curl: (56) Recv failure: Connection reset by peer

Far from ideal but not really the issue I wanted to reproduce.

By the way, when giving the command server a MOJO_TMPDIR it cannot write to at all (e.g. I've tried /dev/full and a directory where it has no permissions) then it seems to stick to an in-memory asset. At least the memory usage increased so fast on my system that I had to stop the processes before running out of memory.
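Both failure modes described above (a temp dir that is too small, and one that cannot be written to) could be caught by a check before starting the upload. The following is a sketch only; `tmpdir_has_room` is a hypothetical helper and not something the command server provides.

```shell
# Succeed if directory $1 is writable and has at least $2 KiB available.
# `df -Pk` prints POSIX-format output in 1024-byte blocks;
# the fourth column of the second line is the available space.
tmpdir_has_room() {
    dir=$1 needed_kib=$2
    [ -d "$dir" ] && [ -w "$dir" ] || return 1
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge "$needed_kib" ]
}
```

For a 30 GiB upload one might call `tmpdir_has_room "${MOJO_TMPDIR:-/tmp}" $((30 * 1024 * 1024))`. A path such as /dev/full fails the directory test right away.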


¹ e.g.

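# Create a 1 GiB image file and format it as ext4
dd if=/dev/zero of=tmp-1gib bs=1M count=1024
/usr/sbin/mkfs.ext4 tmp-1gib
# Mount it via a loop device and hand it to the local user
mkdir tmp-1gib.d
sudo mount -o loop tmp-1gib tmp-1gib.d
sudo chown martchus:users tmp-1gib.d
# Point the command server's temp dir at the small file system
export MOJO_TMPDIR=tmp-1gib.d

Actions #17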

Updated by mkittler over 2 years ago

A PR to avoid relying on /tmp being big enough: https://github.com/os-autoinst/os-autoinst/pull/2009

But judging from my experiments it won't help with this issue (which I still couldn't reproduce).

Actions #18

Updated by mkittler over 2 years ago

  • Status changed from In Progress to Feedback

Apparently we set MOJO_MAX_MESSAGE_SIZE in the start script of the worker. I hadn't expected that, but it easily explains why setting UPLOAD_MAX_MESSAGE_SIZE_GB and MOJO_MAX_MESSAGE_SIZE didn't help. So the following PR should fix the issue: https://github.com/os-autoinst/openQA/pull/4586
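The effect can be illustrated with a small sketch. It assumes the limit is only derived from UPLOAD_MAX_MESSAGE_SIZE_GB when MOJO_MAX_MESSAGE_SIZE is not already present in the environment. This precedence and the function name are assumptions made for illustration, not a copy of the os-autoinst code.

```shell
# Illustrative only: a value already present in the environment
# shadows the per-job setting.
effective_limit_bytes() {
    if [ -n "${MOJO_MAX_MESSAGE_SIZE:-}" ]; then
        # Pre-set by the environment (e.g. a start script): used as-is.
        echo "$MOJO_MAX_MESSAGE_SIZE"
    else
        # Otherwise derived from the job setting; 20 is the original
        # default in GiB mentioned earlier in this ticket.
        echo $(( ${UPLOAD_MAX_MESSAGE_SIZE_GB:-20} * 1024 * 1024 * 1024 ))
    fi
}
```

Under this assumption, a job that sets UPLOAD_MAX_MESSAGE_SIZE_GB=50 still ends up with the smaller limit whenever the worker's environment already exports MOJO_MAX_MESSAGE_SIZE.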

Actions #19

Updated by kraih over 2 years ago

Tried to replicate the issue with a small Mojolicious unit test, but have been unable to unfortunately. The underlying Mojo::Asset::* modules throw an exception. https://github.com/mojolicious/mojo/commit/3ea094e3073f8e4d72adbc0d47fb3505b96e90a9

Actions #20

Updated by mkittler over 2 years ago

The mentioned PR has been merged and is expected to be deployed on Monday morning (CEST).

Actions #21

Updated by kraih over 2 years ago

I did manage to fix the memory leak in Mojolicious though. So memory usage should not be able to exceed the limit anymore. https://github.com/mojolicious/mojo/commit/ba9d3c2bceb9b1524e1ce07994b33da5ea9b8ce0

Actions #22

Updated by okurz over 2 years ago

https://github.com/os-autoinst/openQA/pull/4586 is deployed on OSD. I triggered https://openqa.suse.de/tests/8463395# in the scenario "sle-15-SP4-Continuous-Migration-SLE15SP4-s390x-offline_sle12sp3_sles15sp3_sles15sp4_media_all_full_s390x_ph0@s390x-kvm-sle15" which previously failed. Let's see the effect.

Actions #23

Updated by mkittler over 2 years ago

  • Status changed from Feedback to Resolved

@okurz Thanks - and it works now. (The job passed and it says "SLES-15-SP4-s390x-Build101.1-12SP3-15-SP3-ph0.qcow2 (23 GiB)" under assets.)
