action #108281
closed
test fails in svirt_upload_assets - cannot upload qcow, error "File is too big"
Description
Observation
openQA test in scenario sle-15-SP4-Continuous-Migration-SLE15SP4-s390x-offline_sle12sp3_sles15sp3_sles15sp4_media_all_full_s390x_ph0@s390x-kvm-sle15 fails in
svirt_upload_assets
Test suite description
SLE12SP3 -> SLES15SP3 - SLES15SP4
No idea why the "File is too big" message shows up when uploading the qcow image.
Reproducible
Fails since (at least) Build 101.1
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
Updated by mkittler almost 3 years ago
How big is the file? It must not be bigger than UPLOAD_MAX_MESSAGE_SIZE_GB, which is 20 GiB by default.
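The size question can be answered on the machine that holds the image by comparing the file size against the limit. The following is only a minimal sketch: the helper name `exceeds_limit_gib` is hypothetical, it assumes GNU `stat`, and it assumes the limit is counted in GiB (1024^3 bytes) as stated above.

```shell
# Hypothetical helper: succeeds (exit 0) if FILE is larger than LIMIT_GIB GiB.
# Assumes GNU stat; the limit is interpreted as GiB (1024^3 bytes).
exceeds_limit_gib() {
    file=$1
    limit_gib=$2
    size_bytes=$(stat -c %s "$file") || return 2
    limit_bytes=$((limit_gib * 1024 * 1024 * 1024))
    [ "$size_bytes" -gt "$limit_bytes" ]
}

# Usage (the path is a placeholder):
# exceeds_limit_gib /path/to/image.qcow2 20 && echo "larger than 20 GiB"
```

The check uses the apparent file size, which is what an upload has to transfer, so sparse files are counted in full.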
Updated by okurz almost 3 years ago
- Due date set to 2022-03-29
- Category set to Support
- Status changed from New to Feedback
- Assignee set to mkittler
- Priority changed from Normal to Low
- Target version set to Ready
Updated by coolgw almost 3 years ago
@mkittler I have no idea how big it is; maybe it is already bigger than the default value. So, as I understand it, I can set UPLOAD_MAX_MESSAGE_SIZE_GB=30 for a cloned job?
Updated by mkittler almost 3 years ago
Yes, setting UPLOAD_MAX_MESSAGE_SIZE_GB to a value bigger than the actual file size (e.g. via `openqa-clone-job`) should work. Considering the default is already quite big, I'm nevertheless wondering whether it would make more sense to try to keep the file smaller.
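For illustration, overriding the variable when cloning could look roughly like this. It is a sketch under assumptions: the job ID is a made-up placeholder, the host is the OSD instance mentioned in this ticket, and the job is cloned within the same instance.

```shell
# Placeholder values: replace the job ID with the job to clone.
host=https://openqa.suse.de
job_id=1234567   # hypothetical job ID
limit_gb=30

# Build the command first so it can be inspected before running it.
clone_cmd="openqa-clone-job --within-instance $host $job_id UPLOAD_MAX_MESSAGE_SIZE_GB=$limit_gb"
echo "$clone_cmd"

# Uncomment to actually clone the job (needs API credentials for the host):
# $clone_cmd
```

Trailing KEY=VALUE arguments are applied as job settings of the clone, so the original job stays untouched.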
Updated by coolgw almost 3 years ago
Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.
Updated by mkittler almost 3 years ago
This should have been working - the value is also in vars.json and I don't see a local override via MOJO_MAX_MESSAGE_SIZE.
Btw, that's the code responsible for setting the limit: https://github.com/os-autoinst/os-autoinst/blob/9baeb0bfe086b40dec131d78a5f5b29778fac8c5/commands.pm#L228
Updated by okurz over 2 years ago
coolgw wrote:
Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.
I only saw that the job is incomplete, where did you see the message about "too big"?
Updated by mkittler over 2 years ago
I only saw that the job is incomplete, where did you see the message about "too big"?
https://openqa.nue.suse.com/tests/8322675#step/svirt_upload_assets/9
PR for removing the limit completely and printing a more specific error message: https://github.com/os-autoinst/os-autoinst/pull/1998
Updated by coolgw over 2 years ago
okurz wrote:
coolgw wrote:
Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.
I only saw that the job is incomplete, where did you see the message about "too big"?
The job above did not capture the last screen since it is incomplete.
See this job:
https://openqa.suse.de/tests/8338790#
BTW: when will this PR be deployed on OSD so I can give it a try?
Updated by mkittler over 2 years ago
It has been deployed as of 24.03.22 06:28 CET.
Updated by livdywan over 2 years ago
I wonder if this should be blocking on #109046 since the tests can't be verified?
Updated by okurz over 2 years ago
- Due date deleted (2022-03-29)
- Status changed from Feedback to Resolved
cdywan wrote:
I wonder if this should be blocking on #109046 since the tests can't be verified?
I think this is not necessary. As decided in daily the change was introduced and deployed and we at least ensured that no related regressions had been introduced so far. As there was no feedback so far and due date is reached I suggest we resolve here and call it done. In case any problems are encountered please reopen.
Updated by coolgw over 2 years ago
Based on the result of the following job, the issue still happens. Since I saw that the PR sets the default value to infinity, I skipped setting UPLOAD_MAX_MESSAGE_SIZE_GB, but I got the error message "Maximum message size exceeded":
https://openqa.suse.de/tests/8434449#live
Updated by mkittler over 2 years ago
- Status changed from Resolved to In Progress
I'll try reproducing this with unit tests (using a very big file).
Updated by mkittler over 2 years ago
It works for me locally via:
truncate -s 30G t/data/big-file
curl --progress-bar --form upload=@t/data/big-file --form target=assets_public http://localhost:49019/Hallo/upload_asset/foo
OK: foo
On http://localhost:49019 a test command server is running.
Note that temporary files end up under /tmp by default, so this directory needs to be big enough. Of course, the pool directory needs to be big enough as well. It probably makes most sense to put the temporary directory somewhere within the pool directory, so that the final move stays within the same file system and we don't risk using too much RAM (since /tmp is often a tmpfs).
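Whether the final move would cross file systems can be checked by comparing device numbers. The following is a minimal sketch, not what the command server does: the helper name `same_filesystem` and the pool directory path are assumptions, and it relies on GNU `stat`.

```shell
# Hypothetical helper: succeeds if both paths live on the same file system.
# Assumes GNU stat (%d prints the device number).
same_filesystem() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Usage (the pool directory path is an assumption; adjust to the worker's setup):
# pool_dir=/var/lib/openqa/pool/1
# if ! same_filesystem /tmp "$pool_dir"; then
#     mkdir -p "$pool_dir/tmp"
#     export MOJO_TMPDIR="$pool_dir/tmp"   # Mojolicious writes temporary files here
# fi
```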
Updated by mkittler over 2 years ago
If I give it a temp dir that is too small¹, the command server just gets stuck writing the file and shows increasing memory usage (up to 2 GiB) until it finally resets the connection:
curl --progress-bar --form upload=@t/data/big-file --form target=assets_public http://localhost:50263/Hallo/upload_asset/foo
curl: (56) Recv failure: Connection reset by peer
Far from ideal but not really the issue I wanted to reproduce.
By the way, when giving the command server a MOJO_TMPDIR it cannot write to at all (e.g. I've tried /dev/full and a directory where it has no permissions), it seems to stick to an in-memory asset. At least the memory usage increased so fast on my system that I had to stop the processes before running out of memory.
¹ e.g.
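# Create a 1 GiB image file and format it as ext4
dd if=/dev/zero of=tmp-1gib bs=1M count=1024
/usr/sbin/mkfs.ext4 tmp-1gib
# Mount it via a loop device and hand it to the user running the command server
mkdir tmp-1gib.d
sudo mount -o loop tmp-1gib tmp-1gib.d
sudo chown martchus:users tmp-1gib.d
# Let Mojolicious write its temporary files to the small file system
export MOJO_TMPDIR=tmp-1gib.d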
Updated by mkittler over 2 years ago
A PR to avoid relying on /tmp being big enough: https://github.com/os-autoinst/os-autoinst/pull/2009
But judging from my experiments it won't help with this issue (which I still couldn't reproduce).
Updated by mkittler over 2 years ago
- Status changed from In Progress to Feedback
Apparently we set MOJO_MAX_MESSAGE_SIZE in the start script of the worker. I hadn't expected that, but it easily explains why setting UPLOAD_MAX_MESSAGE_SIZE_GB and MOJO_MAX_MESSAGE_SIZE didn't help. So the following PR should fix the issue: https://github.com/os-autoinst/openQA/pull/4586
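On Linux, one way to see what a running process actually has in its environment is to read its /proc/<pid>/environ file. The following is a minimal sketch; the helper name `env_var_from` is hypothetical, and how to find the worker's PID is left open.

```shell
# Hypothetical helper: print the value of variable NAME from a NUL-separated
# environment dump such as /proc/<pid>/environ. Prints nothing if it is unset.
env_var_from() {
    environ_file=$1
    name=$2
    tr '\0' '\n' < "$environ_file" | sed -n "s/^$name=//p"
}

# Usage (replace <pid> with the PID of the worker process; reading the file
# requires being the same user or root):
# env_var_from /proc/<pid>/environ MOJO_MAX_MESSAGE_SIZE
```

An empty result means the variable is not set for that process. A value here would indicate that the limit comes from the process environment and not from the job settings.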
Updated by kraih over 2 years ago
I tried to replicate the issue with a small Mojolicious unit test, but unfortunately was unable to. The underlying Mojo::Asset::* modules throw an exception. https://github.com/mojolicious/mojo/commit/3ea094e3073f8e4d72adbc0d47fb3505b96e90a9
Updated by mkittler over 2 years ago
The mentioned PR has been merged and is expected to be deployed on Monday morning (CEST).
Updated by kraih over 2 years ago
I did manage to fix the memory leak in Mojolicious though. So memory usage should not be able to exceed the limit anymore. https://github.com/mojolicious/mojo/commit/ba9d3c2bceb9b1524e1ce07994b33da5ea9b8ce0
Updated by okurz over 2 years ago
https://github.com/os-autoinst/openQA/pull/4586 is deployed on OSD. I triggered https://openqa.suse.de/tests/8463395# in the scenario "sle-15-SP4-Continuous-Migration-SLE15SP4-s390x-offline_sle12sp3_sles15sp3_sles15sp4_media_all_full_s390x_ph0@s390x-kvm-sle15" which previously failed. Let's see the effect.
Updated by mkittler over 2 years ago
- Status changed from Feedback to Resolved
@okurz Thanks - and it works now. (The job passed and it says "SLES-15-SP4-s390x-Build101.1-12SP3-15-SP3-ph0.qcow2 (23 GiB)" under assets.)