action #108281
test fails in svirt_upload_assets - can not upload qcow, error "File is too big"
0%
Description
Observation
openQA test in scenario sle-15-SP4-Continuous-Migration-SLE15SP4-s390x-offline_sle12sp3_sles15sp3_sles15sp4_media_all_full_s390x_ph0@s390x-kvm-sle15 fails in svirt_upload_assets
Test suite description
SLE12SP3 -> SLES15SP3 - SLES15SP4
It is unclear why the "File is too big" message is shown when uploading the qcow image.
Reproducible
Fails since (at least) Build 101.1
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
History
#1
Updated by mkittler about 1 year ago
How big is the file? It must not be bigger than UPLOAD_MAX_MESSAGE_SIZE_GB, which is 20 GiB by default.
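For a sense of scale, the size check implied here can be sketched in shell. This is only an illustration, assuming the limit is the setting multiplied by 1024^3 bytes; the helper names gib_to_bytes and fits_limit are made up for this example and are not part of os-autoinst.

```shell
# Hypothetical helpers for illustration; not part of os-autoinst.

# Convert a limit given in GiB into bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# fits_limit FILE LIMIT_GIB: succeeds if FILE is no larger than the limit.
fits_limit() {
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le "$(gib_to_bytes "$2")" ]
}

gib_to_bytes 20   # prints 21474836480
```

With these helpers, a 24 GiB image would fail a check against a limit of 20 but pass against a limit of 30.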
#2
Updated by okurz about 1 year ago
- Due date set to 2022-03-29
- Category set to Support
- Status changed from New to Feedback
- Assignee set to mkittler
- Priority changed from Normal to Low
- Target version set to Ready
#3
Updated by coolgw about 1 year ago
mkittler I don't know how big it is; it may already be bigger than the default value. So, as I understand it, I can set UPLOAD_MAX_MESSAGE_SIZE_GB=30 for a cloned job?
#4
Updated by mkittler about 1 year ago
Yes, setting UPLOAD_MAX_MESSAGE_SIZE_GB to a value bigger than the actual file size (e.g. via `openqa-clone-job`) should work. Considering the default is already quite big, I'm nevertheless wondering whether it would make more sense to try to keep the file smaller.
#5
Updated by coolgw about 1 year ago
Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.
#6
Updated by mkittler about 1 year ago
This should have been working - the value is also in vars.json and I don't see a local override via MOJO_MAX_MESSAGE_SIZE.
Btw, that's the code responsible for setting the limit: https://github.com/os-autoinst/os-autoinst/blob/9baeb0bfe086b40dec131d78a5f5b29778fac8c5/commands.pm#L228
#7
Updated by okurz about 1 year ago
coolgw wrote:
Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.
I only saw that the job is incomplete, where did you see the message about "too big"?
#8
Updated by mkittler about 1 year ago
I only saw that the job is incomplete, where did you see the message about "too big"?
https://openqa.nue.suse.com/tests/8322675#step/svirt_upload_assets/9
PR for removing the limit completely and printing a more specific error message: https://github.com/os-autoinst/os-autoinst/pull/1998
#9
Updated by coolgw about 1 year ago
okurz wrote:
coolgw wrote:
Based on the result of http://openqa.suse.de/tests/8349502#live, it seems UPLOAD_MAX_MESSAGE_SIZE_GB does not take effect: I set it to 50 and the qcow size is 24, but I still see the "too big" message.
I only saw that the job is incomplete, where did you see the message about "too big"?
The above job did not capture the last screen because it is incomplete. See this job instead:
https://openqa.suse.de/tests/8338790#
BTW: when will this PR be deployed on OSD so I can give it a try?
#10
Updated by mkittler about 1 year ago
It has been deployed as of 24.03.22 06:28 CET.
#11
Updated by cdywan about 1 year ago
I wonder if this should be blocking on #109046 since the tests can't be verified?
#12
Updated by okurz about 1 year ago
- Due date deleted (2022-03-29)
- Status changed from Feedback to Resolved
cdywan wrote:
I wonder if this should be blocking on #109046 since the tests can't be verified?
I think this is not necessary. As decided in the daily, the change was introduced and deployed, and we at least ensured that no related regressions have been introduced so far. As there has been no feedback so far and the due date is reached, I suggest we resolve here and call it done. In case any problems are encountered, please reopen.
#13
Updated by coolgw about 1 year ago
Based on the result of the following job, the issue still happens. Since I saw that the PR sets the default value to infinity, I skipped setting UPLOAD_MAX_MESSAGE_SIZE_GB, but now the error message is "Maximum message size exceeded":
https://openqa.suse.de/tests/8434449#live
#14
Updated by mkittler about 1 year ago
- Status changed from Resolved to In Progress
I'll try reproducing this with unit tests (using a very big file).
#15
Updated by mkittler about 1 year ago
It works for me locally via:
truncate -s 30G t/data/big-file
curl --progress-bar --form upload=@t/data/big-file --form target=assets_public http://localhost:49019/Hallo/upload_asset/foo
OK: foo
On http://localhost:49019 a test command server is running.
Note that temporary files end up (by default) under /tmp, so this directory needs to be big enough. Of course the pool directory needs to be big enough as well. Maybe it makes most sense to have the temporary directory somewhere within the pool directory so the final move is just a move within the same file system and we don't risk using too much RAM (since /tmp is often tmpfs).
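Whether the final move stays within one file system can be checked up front. A minimal sketch, assuming GNU stat and POSIX df are available: the function names and the POOL_DIR variable are hypothetical, MOJO_TMPDIR is the Mojolicious environment variable for the temporary directory, and /tmp is used as the fallback named above.

```shell
# Hypothetical helpers for illustration; assumes GNU stat (Linux).

# same_fs DIR_A DIR_B: succeeds if both directories exist on the same device.
same_fs() {
    [ -d "$1" ] && [ -d "$2" ] &&
        [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# free_kib DIR: prints the free space of DIR's file system in KiB.
free_kib() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

tmp_dir=${MOJO_TMPDIR:-/tmp}   # /tmp is the default mentioned in this thread
pool_dir=${POOL_DIR:-.}        # POOL_DIR is a made-up variable for this sketch

if same_fs "$tmp_dir" "$pool_dir"; then
    echo "same file system: the final move is a rename"
else
    echo "different file systems: the final move copies the data"
fi
echo "free space in $tmp_dir: $(free_kib "$tmp_dir") KiB"
```

If the two directories are on different file systems, both need enough free space for the whole asset, which is the situation the suggestion above tries to avoid.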
#16
Updated by mkittler about 1 year ago
If I give it a too small temp dir¹ the command server gets just stuck writing the file and shows increasing memory usage (up to 2 GiB) until it finally resets the connection:
curl --progress-bar --form upload=@t/data/big-file --form target=assets_public http://localhost:50263/Hallo/upload_asset/foo
curl: (56) Recv failure: Connection reset by peer
Far from ideal but not really the issue I wanted to reproduce.
By the way, when giving the command server a MOJO_TMPDIR it cannot write to at all (e.g. I've tried /dev/full and a directory where it has no permissions), it seems to stick to an in-memory asset. At least the memory usage increased so fast on my system that I had to stop the processes before running out of memory.
¹ e.g.
dd if=/dev/zero of=tmp-1gib bs=1M count=1024
/usr/sbin/mkfs.ext4 tmp-1gib
mkdir tmp-1gib.d
sudo mount -o loop tmp-1gib tmp-1gib.d
sudo chown martchus:users tmp-1gib.d
export MOJO_TMPDIR=tmp-1gib.d
#17
Updated by mkittler about 1 year ago
A PR to avoid relying on /tmp being big enough: https://github.com/os-autoinst/os-autoinst/pull/2009
But judging from my experiments it won't help with this issue (which I still couldn't reproduce).
#18
Updated by mkittler about 1 year ago
- Status changed from In Progress to Feedback
Apparently we set MOJO_MAX_MESSAGE_SIZE in the start script of the worker. I hadn't expected that, but it's an easy explanation why setting UPLOAD_MAX_MESSAGE_SIZE_GB and MOJO_MAX_MESSAGE_SIZE didn't help. So the following PR should fix the issue: https://github.com/os-autoinst/openQA/pull/4586
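The precedence described here can be sketched as follows. This is an illustration under stated assumptions rather than the actual os-autoinst or worker code: a MOJO_MAX_MESSAGE_SIZE that is already present in the environment wins, and the job setting is only consulted otherwise, with 0 standing for "no limit". The function name effective_limit and the example value 21474836480 are made up.

```shell
# Hypothetical sketch of the assumed precedence; not the real implementation.

# effective_limit: prints the byte limit that would end up in effect.
effective_limit() {
    if [ -n "${MOJO_MAX_MESSAGE_SIZE+set}" ]; then
        # Already set (e.g. by a start script): the job setting is ignored.
        echo "$MOJO_MAX_MESSAGE_SIZE"
    else
        # Not set: derive the limit from the job setting (0 = no limit).
        echo $(( ${UPLOAD_MAX_MESSAGE_SIZE_GB:-0} * 1024 * 1024 * 1024 ))
    fi
}

# Job setting alone: 50 GiB expressed in bytes.
( unset MOJO_MAX_MESSAGE_SIZE; UPLOAD_MAX_MESSAGE_SIZE_GB=50; effective_limit )
# prints 53687091200

# Same job setting, but the environment already carries a (made-up) limit.
( MOJO_MAX_MESSAGE_SIZE=21474836480; UPLOAD_MAX_MESSAGE_SIZE_GB=50; effective_limit )
# prints 21474836480
```

Under this assumption, raising UPLOAD_MAX_MESSAGE_SIZE_GB on a cloned job has no effect as long as the worker's environment already defines the variable.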
#19
Updated by kraih about 1 year ago
Tried to replicate the issue with a small Mojolicious unit test, but have unfortunately been unable to. The underlying Mojo::Asset::* modules throw an exception. https://github.com/mojolicious/mojo/commit/3ea094e3073f8e4d72adbc0d47fb3505b96e90a9
#20
Updated by mkittler about 1 year ago
The mentioned PR has been merged and is expected to be deployed on Monday morning (CEST).
#21
Updated by kraih about 1 year ago
I did manage to fix the memory leak in Mojolicious though. So memory usage should not be able to exceed the limit anymore. https://github.com/mojolicious/mojo/commit/ba9d3c2bceb9b1524e1ce07994b33da5ea9b8ce0
#22
Updated by okurz about 1 year ago
https://github.com/os-autoinst/openQA/pull/4586 is deployed on OSD. I triggered https://openqa.suse.de/tests/8463395# in the scenario "sle-15-SP4-Continuous-Migration-SLE15SP4-s390x-offline_sle12sp3_sles15sp3_sles15sp4_media_all_full_s390x_ph0@s390x-kvm-sle15" which previously failed. Let's see the effect.
#23
Updated by mkittler about 1 year ago
- Status changed from Feedback to Resolved
okurz Thanks - and it works now. (The job passed and it says "SLES-15-SP4-s390x-Build101.1-12SP3-15-SP3-ph0.qcow2 (23 GiB)" under assets.)