action #108281

test fails in svirt_upload_assets - can not upload qcow, error "File is too big"

Added by coolgw 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: Low
Assignee:
Category: Support
Target version:
Start date: 2022-03-14
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP4-Continuous-Migration-SLE15SP4-s390x-offline_sle12sp3_sles15sp3_sles15sp4_media_all_full_s390x_ph0@s390x-kvm-sle15 fails in
svirt_upload_assets

Test suite description

SLE12SP3 -> SLES15SP3 - SLES15SP4
It is unclear why the "File is too big" message is shown when uploading the qcow image.

Reproducible

Fails since (at least) Build 101.1

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest

History

#1 Updated by mkittler 3 months ago

How big is the file? It must not be bigger than UPLOAD_MAX_MESSAGE_SIZE_GB which is by default 20 GiB.
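One way to answer this on the machine that holds the image is to compare the file size with the limit. This is only a minimal sketch: `fits_upload_limit` is a hypothetical helper, not part of openQA, and the 20 GiB default is the value quoted above.

```shell
# Hypothetical helper: succeed if FILE is no bigger than LIMIT_GIB (default 20).
fits_upload_limit() {
    local file=$1 limit_gib=${2:-20}
    local size_bytes limit_bytes
    # wc -c prints the size in bytes; reading via stdin avoids the file name in the output
    size_bytes=$(wc -c < "$file") || return 2
    limit_bytes=$((limit_gib * 1024 * 1024 * 1024))
    (( size_bytes <= limit_bytes ))
}
```

For example, `fits_upload_limit disk.qcow2 20 && echo fits || echo "too big"`, where `disk.qcow2` is a placeholder name. The HTTP upload adds some multipart overhead on top of the file itself, so a file right at the limit should probably be treated as too big.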

#2 Updated by okurz 3 months ago

  • Due date set to 2022-03-29
  • Category set to Support
  • Status changed from New to Feedback
  • Assignee set to mkittler
  • Priority changed from Normal to Low
  • Target version set to Ready

#3 Updated by coolgw 3 months ago

mkittler, I have no idea how big it is; maybe it is already bigger than the default value. So, as I understand it, I can set UPLOAD_MAX_MESSAGE_SIZE_GB=30 for a cloned job?

#4 Updated by mkittler 3 months ago

Yes, setting UPLOAD_MAX_MESSAGE_SIZE_GB to a value bigger than the actual file size (e.g. via `openqa-clone-job`) should work. Considering the default is already quite big, I nevertheless wonder whether it would make more sense to try to keep the file smaller.
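As a rough illustration of such a clone, the sketch below only prints the command instead of running it, because a real run needs API credentials for the openQA instance. `clone_cmd_with_limit` is a hypothetical helper; the job ID is the failing job linked elsewhere in this ticket, and the value 30 is the one proposed above.

```shell
# Hypothetical helper: print an openqa-clone-job call that overrides the upload limit.
# Settings are passed as KEY=VALUE arguments after the job ID.
clone_cmd_with_limit() {
    local host=$1 job_id=$2 limit_gb=$3
    printf '%s\n' "openqa-clone-job --within-instance $host $job_id UPLOAD_MAX_MESSAGE_SIZE_GB=$limit_gb"
}

clone_cmd_with_limit https://openqa.suse.de 8322675 30
```

Printing first makes it easy to review the override before copying the command into a shell that has the API key configured.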

#5 Updated by coolgw 3 months ago

Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.

#6 Updated by mkittler 3 months ago

This should have worked - the value is also in vars.json and I don't see a local override via MOJO_MAX_MESSAGE_SIZE.

Btw, that's the code responsible for setting the limit: https://github.com/os-autoinst/os-autoinst/blob/9baeb0bfe086b40dec131d78a5f5b29778fac8c5/commands.pm#L228

#7 Updated by okurz 3 months ago

coolgw wrote:

Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.

I only saw that the job is incomplete, where did you see the message about "too big"?

#8 Updated by mkittler 3 months ago

I only saw that the job is incomplete, where did you see the message about "too big"?

https://openqa.nue.suse.com/tests/8322675#step/svirt_upload_assets/9


PR for removing the limit completely and printing a more specific error message: https://github.com/os-autoinst/os-autoinst/pull/1998

#9 Updated by coolgw 3 months ago

okurz wrote:

coolgw wrote:

Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.

I only saw that the job is incomplete, where did you see the message about "too big"?

The job above did not capture the last screen because it is incomplete. See this job instead:
https://openqa.suse.de/tests/8338790#

BTW: when will this PR be deployed on OSD so that I can give it a try?

#10 Updated by mkittler 3 months ago

It has been deployed as of 24.03.22 06:28 CET.

#11 Updated by cdywan 3 months ago

I wonder if this should be blocking on #109046 since the tests can't be verified?

#12 Updated by okurz 3 months ago

  • Due date deleted (2022-03-29)
  • Status changed from Feedback to Resolved

cdywan wrote:

I wonder if this should be blocking on #109046 since the tests can't be verified?

I think this is not necessary. As decided in the daily, the change was introduced and deployed, and we at least ensured that no related regressions have been introduced so far. As there has been no feedback so far and the due date is reached, I suggest we resolve here and call it done. In case any problems are encountered, please reopen.

#13 Updated by coolgw 3 months ago

Based on the result of the following job, the issue still happens. Since I saw that the PR sets the default value to infinity, I skipped setting UPLOAD_MAX_MESSAGE_SIZE_GB, but now the error message is "Maximum message size exceeded":
https://openqa.suse.de/tests/8434449#live

#14 Updated by mkittler 3 months ago

  • Status changed from Resolved to In Progress

I'll try reproducing this with unit tests (using a very big file).

#15 Updated by mkittler 3 months ago

It works for me locally via:

truncate -s 30G t/data/big-file
curl --progress-bar --form upload=@t/data/big-file --form target=assets_public http://localhost:49019/Hallo/upload_asset/foo
OK: foo

On http://localhost:49019 a test command server is running.

Note that temporary files end up (by default) under /tmp so this directory needs to be big enough. Of course the pool directory needs to be big enough as well. Maybe it makes most sense to have the temporary directory somewhere within the pool directory so the final move is just a move within the same file system and we don't risk using too much RAM (since /tmp is often tmpfs).
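Whether the temporary directory is big enough can be checked up front. A minimal sketch, assuming a POSIX `df`; `dir_has_room` is a hypothetical helper, and the 30 GiB figure matches the test file created above.

```shell
# Hypothetical helper: succeed if the file system holding DIR has
# at least NEEDED_BYTES available.
dir_has_room() {
    local dir=$1 needed_bytes=$2 avail_kib
    # df -P guarantees one line per file system; column 4 is the available space in KiB
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }') || return 2
    (( avail_kib * 1024 >= needed_bytes ))
}

# Mojolicious honours MOJO_TMPDIR; fall back to /tmp otherwise
tmpdir=${MOJO_TMPDIR:-/tmp}
if dir_has_room "$tmpdir" $((30 * 1024 * 1024 * 1024)); then
    echo "$tmpdir has room for a 30 GiB upload"
else
    echo "$tmpdir is too small for a 30 GiB upload"
fi
```

The same check can be pointed at the pool directory, since the uploaded file has to fit there as well.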

#16 Updated by mkittler 3 months ago

If I give it a temp dir that is too small¹, the command server just gets stuck writing the file and shows increasing memory usage (up to 2 GiB) until it finally resets the connection:

curl --progress-bar --form upload=@t/data/big-file --form target=assets_public http://localhost:50263/Hallo/upload_asset/foo
curl: (56) Recv failure: Connection reset by peer

Far from ideal but not really the issue I wanted to reproduce.

By the way, when giving the command server a MOJO_TMPDIR it cannot write to at all (e.g. I've tried /dev/full and a directory where it has no permissions) then it seems to stick to an in-memory asset. At least the memory usage increased so fast on my system that I had to stop the processes before running out of memory.


¹ e.g.

dd if=/dev/zero of=tmp-1gib bs=1M count=1024
/usr/sbin/mkfs.ext4 tmp-1gib
mkdir tmp-1gib.d
sudo mount -o loop tmp-1gib tmp-1gib.d
sudo chown martchus:users tmp-1gib.d
export MOJO_TMPDIR=tmp-1gib.d

#17 Updated by mkittler 3 months ago

A PR to avoid relying on /tmp being big enough: https://github.com/os-autoinst/os-autoinst/pull/2009

But judging from my experiments it won't help with this issue (which I still couldn't reproduce).

#18 Updated by mkittler 3 months ago

  • Status changed from In Progress to Feedback

Apparently we set MOJO_MAX_MESSAGE_SIZE in the start script of the worker. I didn't expect that, but it is an easy explanation for why setting UPLOAD_MAX_MESSAGE_SIZE_GB and MOJO_MAX_MESSAGE_SIZE didn't help. So the following PR should fix the issue: https://github.com/os-autoinst/openQA/pull/4586
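To spot this kind of override, it can help to look at the environment a running process was actually started with. This is a minimal sketch, assuming a Linux /proc file system. `environ_value` and `effective_limit_bytes` are hypothetical helpers, and the precedence shown (an already set MOJO_MAX_MESSAGE_SIZE wins over UPLOAD_MAX_MESSAGE_SIZE_GB) is inferred from this thread, not copied from the os-autoinst code.

```shell
# Hypothetical helper: print the value of NAME from a NUL-separated
# environ file such as /proc/<pid>/environ (prints nothing if unset).
environ_value() {
    local environ_file=$1 name=$2
    tr '\0' '\n' < "$environ_file" | sed -n "s/^${name}=//p"
}

# Assumed precedence: a non-empty environment value wins, otherwise
# the job setting in GiB is converted to bytes.
effective_limit_bytes() {
    local env_value=$1 setting_gib=$2
    if [ -n "$env_value" ]; then
        printf '%s\n' "$env_value"
    else
        printf '%s\n' "$((setting_gib * 1024 * 1024 * 1024))"
    fi
}
```

Usage would look like `environ_value /proc/<pid>/environ MOJO_MAX_MESSAGE_SIZE`, with `<pid>` being the worker process. Reading the environ file of another user's process requires root or the same user.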

#19 Updated by kraih 3 months ago

I tried to replicate the issue with a small Mojolicious unit test, but unfortunately I have been unable to. The underlying Mojo::Asset::* modules throw an exception. https://github.com/mojolicious/mojo/commit/3ea094e3073f8e4d72adbc0d47fb3505b96e90a9

#20 Updated by mkittler 3 months ago

The mentioned PR has been merged and is expected to be deployed on Monday morning (CEST).

#21 Updated by kraih 3 months ago

I did manage to fix the memory leak in Mojolicious though. So memory usage should not be able to exceed the limit anymore. https://github.com/mojolicious/mojo/commit/ba9d3c2bceb9b1524e1ce07994b33da5ea9b8ce0

#22 Updated by okurz 3 months ago

https://github.com/os-autoinst/openQA/pull/4586 is deployed on OSD. I triggered https://openqa.suse.de/tests/8463395# in the scenario "sle-15-SP4-Continuous-Migration-SLE15SP4-s390x-offline_sle12sp3_sles15sp3_sles15sp4_media_all_full_s390x_ph0@s390x-kvm-sle15" which previously failed. Let's see the effect.

#23 Updated by mkittler 3 months ago

  • Status changed from Feedback to Resolved

okurz Thanks - and it works now. (The job passed and it says "SLES-15-SP4-s390x-Build101.1-12SP3-15-SP3-ph0.qcow2 (23 GiB)" under assets.)
