action #129955
closed coordination #103938: [saga][epic] Scale up: Efficient handling of large storage on o3
openQA Infrastructure - coordination #68923: [epic] Use external videoencoder in production auto_review:"External encoder not accepting data"
Second attempt to try out AV1 video codec as potential new default as of 2023 size:M
Description
Motivation
According to https://en.m.wikipedia.org/wiki/AV1#Software_implementations AV1 seems to have good support by now in current decoders, browsers and encoders. We should explore using the codec as a potential new default to make efficient use of storage space, considering the encoding performance and time impact. We already tried two to three years ago. It's time to try again.
Acceptance criteria
- AC1: AV1 is used as new default video encoder where supported in os-autoinst or we researched why AV1 is not suitable for our purposes
- AC2: Older products running os-autoinst potentially not supporting AV1 still run the old encoder or an alternative as fallback
Suggestions
- Read what has been done in the predecessor ticket
- Research how AV1 can be supported for us, e.g. by using ffmpeg as external video encoder as supported by #67342
- Either replace the internal encoder or use an AV1 capable one as "external" default (and switch off the internal one in this case)
- Note that the internal encoder is also responsible for producing the PNG for the live view so it cannot be replaced completely.
- Check if AV1 provides better performance for our needs than the old internal encoder
- Consider also comparing against its predecessor VP9
- Consider fallback handling for older products potentially not supporting AV1
- Also see suggestions in #68923
Updated by okurz over 1 year ago
- Copied from action #75256: Try out AV1 video codec as potential new default added
Updated by livdywan over 1 year ago
- Subject changed from Second attempt to try out AV1 video codec as potential new default as of 2023 to Second attempt to try out AV1 video codec as potential new default as of 2023 size:M
- Status changed from New to Workable
Same as the previous ticket. Evaluate the state of affairs.
Updated by okurz over 1 year ago
- Priority changed from Low to High
Actually it would be good to do this earlier, even before we add the external video encoder to OSD. Increasing prio.
Updated by mkittler over 1 year ago
- Status changed from Workable to In Progress
- Assignee set to mkittler
Updated by mkittler over 1 year ago
I put a summary two years ago into comment #75256#note-6. This is an updated version.
I have done all encodings with the same source video for comparability. I picked a random openQA video from my disk to do all the test encodings. Only halfway through did I notice that the video is actually 1920x1080 (it must have been from some experiments), but for consistency I kept using it. It should not make a big difference except that the encoding speeds are a bit lower than they would normally have been. Since the speed also depends a lot on the CPU, it will differ between our various machines in production anyway.
- SVT-1
- the package is now in Factory (https://build.opensuse.org/package/show/openSUSE:Factory/SVT-AV1) but outdated and supposedly not updated very often
- available in ffmpeg-4 as provided by TW/packman (--enable-libsvtav1)
- not available in Leap 15.4 and 15.5, I get Unknown encoder 'libsvtav1' despite --enable-encoder='…,libsvtav1,… showing up in the banner.
- installation guide: https://github.com/AOMediaCodec/SVT-AV1#svt-av1-ffmpeg-plugin-installation
- documentation: https://ffmpeg.org/ffmpeg-codecs.html#libsvtav1
- the speed is great with the version provided by TW (libSvtAv1Enc1 package) via ffmpeg -i video.ogv -c:v libsvtav1 -crf 35 video-svtav1.mkv
- The speed was around 11.5x but it also used up to 7 CPU cores. That would presumably be around 1.64x with just one core.
- The Theora video shrank from 19 MiB to 2.9 MiB and the quality was still acceptable.
- libaom
- available in ffmpeg-4 as provided by TW/packman
- not available in Leap 15.4 and 15.5, I get Unknown encoder 'libaom_av1' despite --enable-encoder='…,libaom_av1,… showing up in the banner (same for just libaom).
- documentation: https://trac.ffmpeg.org/wiki/Encode/AV1 and https://ffmpeg.org/ffmpeg-codecs.html#libaom_002dav1
- the speed is better than it was two years ago but still not ideal, at least with the version provided by TW via ffmpeg -i video.ogv -c:v libaom-av1 -crf 35 -b:v 1500k -cpu-used 8 video-aom-av1.mkv (-cpu-used 8 is already the fastest setting)
- I get around 1.5x but the encoder is using around three CPU cores. We likely cannot afford to spend that much CPU time in production.
- The Theora video shrank from 19 MiB to 2.9 MiB and the quality was still acceptable.
- rav1e
- available in ffmpeg-4 as provided by TW/packman
- not available in Leap 15.4 and 15.5, I get Unknown encoder 'librav1e' despite --enable-encoder='…,librav1e,… showing up in the banner.
- documentation: https://ffmpeg.org/ffmpeg-codecs.html#librav1e
- the speed is bad, at least with the version provided by TW (rav1e package, only 2 patch releases behind upstream) via ffmpeg -i video.ogv -c:v librav1e -qp 128 -speed 10 video-rav1e.mkv (-speed 10 seems to be the fastest)
- I only get around 0.191x in the beginning but then it reached at least 0.485x. It was only using a single core, though.
- The video was 3.7 MiB. That is bigger than with the other encoders and the quality is worse nonetheless. So not a good result in comparison.
- I've also tried ffmpeg -i video.ogv -c:v librav1e -b:v 1500k -speed 10 video-rav1e-cbr.mkv to see whether bitrate mode is any better. The encoding speed was even slightly lower. The file size was bigger at 11 MiB (as expected with that bitrate setting) but at least the quality was also on par with the other encoders.
- libgav1
- only decoder, so not really relevant here
- the package is now in Factory (https://build.opensuse.org/package/show/openSUSE:Factory/libgav1)
- not supported by ffmpeg at this point
- dav1d
- only decoder, so not really relevant here
- available in ffmpeg-4 as provided by TW/packman/Leap15.4
- documentation: https://ffmpeg.org/ffmpeg-codecs.html#libdav1d
For comparison:
- vp9
- The speed is not great either; we currently use ffmpeg -i video.ogv -c:v libvpx-vp9 -crf 35 -b:v 1500k -cpu-used 0 video-vp9.mkv in production (-cpu-used 0 means it is as slow as possible)
- Locally I only get 0.6x; I'm wondering how this can work in production. At least it is just using one core (unlike libaom, which used at least 2 cores).
- The Theora video shrank from 19 MiB to 2.7 MiB and the quality was still acceptable. So VP9 with -cpu-used 0 is a bit more efficient than libaom with -cpu-used 8.
- I tried with -cpu-used 4 as well because -cpu-used 0 seems a little extreme¹. With that I get 3.06x but also 3.7 MiB.
- I have also tried with -cpu-used 3, 2 and 1 and with that I get 3.09x/3.6 MiB, 2.71x/3.6 MiB, and 1.71x/3.3 MiB respectively.
- SVT-1 is fast and produces small files with acceptable quality. It is better than all other encoders although VP9 does quite well, too.
- VP9 only produced a better quality/size ratio than SVT-1 when using more CPU time. That would mean SVT-1 is better. However, I'm not sure whether I have actually picked comparable quality levels. So maybe one should not read too much into these figures and just assume SVT-1 and VP9 are very comparable for typical openQA footage.
- I could have tried tweaking further settings. Maybe e.g. rav1e would perform better with different settings. Considering SVT-1 and VP9 are far ahead, I doubt it, though.
- Considering that AV1 is the future and the encoder supposedly still has more room for optimizations, I suppose SVT-1 is the winner.
- Considering VP9 is the best we can do in Leap 15.4 and 15.5 without a custom ffmpeg build, we will likely nevertheless keep using VP9 for the time being. Maybe with -cpu-used 1, though.
¹ When I recently enabled VP9 on all o3 workers I just copied the settings from workers where it had already been enabled. So I'm not sure why we use -cpu-used 0 in production.
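Since the configure line in the ffmpeg banner lists encoders that then turn out to be unknown on Leap, a more direct check is the encoder list ffmpeg reports with ffmpeg -hide_banner -encoders, which only contains what was actually compiled in. A minimal sketch, assuming a POSIX shell and awk; the helper name encoder_listed is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the encoder name given as $1 appears in the
# second column of an "ffmpeg -encoders" listing read from stdin.
# Listing lines look like: " V....D libsvtav1   SVT-AV1 ... (codec av1)"
encoder_listed() {
    awk -v want="$1" '$2 == want { found = 1 } END { exit !found }'
}

# Report which of the encoders compared above this ffmpeg build provides.
# If ffmpeg is not installed, the listing is empty and everything is "missing".
for enc in libsvtav1 libaom-av1 librav1e libvpx-vp9; do
    if ffmpeg -hide_banner -encoders 2>/dev/null | encoder_listed "$enc"; then
        echo "$enc: available"
    else
        echo "$enc: missing"
    fi
done
```

Matching the exact name in the second column avoids false positives from the description text, e.g. "libaom" appearing in other encoders' descriptions.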
Updated by okurz over 1 year ago
cool.
So what are your plans?
mkittler wrote:
- SVT-1
- available in ffmpeg-4 as provided by TW/packman/Leap15.4 (--enable-libsvtav1)
- installation guide: https://github.com/AOMediaCodec/SVT-AV1#svt-av1-ffmpeg-plugin-installation
is that something you could test and compare against libaom?
- libaom
- available in ffmpeg-4 as provided by TW/packman/Leap15.4
- documentation: https://trac.ffmpeg.org/wiki/Encode/AV1 and https://ffmpeg.org/ffmpeg-codecs.html#libaom_002dav1
- the speed is too bad, at least with the version provided by TW via ffmpeg -i video.ogv -c:v libaom-av1 -crf 35 -b:v 1500k -cpu-used 8 video-aom-av1.mkv (-cpu-used 8 is already the setting where it is as fast as possible)
- I get around 1.5x but the encoder is hammering the CPU. Likely we cannot afford to spend that much CPU time in production.
It is probably to be expected that libaom is not as quick as SVT-1 or others, but do you have any idea why it is that bad?
- rav1e
- available in ffmpeg-4 as provided by TW/packman/Leap15.4
- documentation: https://ffmpeg.org/ffmpeg-codecs.html#librav1e
Plans to test it then?
EDIT: I tested multiple different variants and also container images. The results are very comparable. This means at least that we are ok to use ffmpeg from openSUSE; there is no benefit from other OS bases. I also tried podman run --pull=newer --rm -it --privileged -v "$(pwd):/videos" docker.io/masterofzen/av1an:latest -i video.ogv but that crashed my computer, so I don't have the detailed notes anymore. But then I finally found one thing that was impressive: podman run --pull=newer --rm -it -v "$PWD:/data" ghcr.io/tamara-schmitz/ffmpeg-docker-container -i /data/video.ogv -c:v libsvtav1 -preset 10 -crf 35 -c:a copy /data/video-svtav1-preset10_crf35.mkv
->
frame= 1478 fps= 98 q=35.0 Lsize= 4777kB time=00:01:33.41 bitrate= 418.9kbits/s speed=6.18x
video:4769kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.157033%
This is using ffmpeg 5.1.3 in a Tumbleweed container with SVT-AV1 encoder lib 1.4.1.
Updated by mkittler over 1 year ago
- Status changed from In Progress to Feedback
I guess these findings are good enough to talk about it together on the next occasion.
@okurz I have frequently updated the comment you're quoting from. I suppose all your questions should be answered in the most recent version. Unfortunately this means that the Leap 15.4 support is not like I initially stated (the ffmpeg banner was misleading).
Updated by okurz over 1 year ago
mkittler wrote:
I guess these findings are good enough to talk about it together on the next occasion.
@okurz I have frequently updated the comment you're quoting from. I suppose all your questions should be answered in the most recent version. Unfortunately this means that the Leap 15.4 support is not like I initially stated (the ffmpeg banner was misleading).
Yes, thank you. Very thorough report. So I suggest we do the following:
1. Suggest in the openQA documentation that SVT-1 can be used with an example command line that would work on Tumbleweed when put into the external video encoder setting
2. Change o3 production settings to -cpu-used X with X in the range [1:8]
3. Change one production o3 or OSD worker to encode to AV1 with SVT-1 and a podman command line, e.g. based on what I wrote in #129955-7
Updated by mkittler over 1 year ago
- Status changed from Feedback to In Progress
PR for 1. https://github.com/os-autoinst/os-autoinst/pull/2326
For 2. I have invoked the following on ariel:
for i in aarch64 openqaworker4 openqaworker7 qa-power8-3 rebel; do echo $i && ssh root@$i "sed -i -e 's|-cpu-used 0|-cpu-used 1|g' /etc/openqa/workers.ini" ; done
I've excluded openqaworker19 and openqaworker20 because they are very powerful and not maxing out their CPU.
I'm going to try out SVT-1 on a production machine tomorrow.
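The loop above only rewrites a literal -cpu-used 0. A slightly more general sketch for a single workers.ini, assuming GNU sed (for -i with a backup suffix and the \+ operator in basic regular expressions); the function name set_cpu_used and the .bak backup copy are illustrative additions, not what was run on ariel:

```shell
#!/bin/sh
# Hypothetical helper: replace every "-cpu-used N" in the given file with the
# requested value, keeping the previous version as <file>.bak.
# Assumes GNU sed: -i.bak edits in place with a backup, \+ means "one or more".
set_cpu_used() {
    file=$1
    value=$2
    sed -i.bak -e "s|-cpu-used [0-9]\+|-cpu-used $value|g" "$file"
}

# Usage on a worker, followed by a check of the result:
#   set_cpu_used /etc/openqa/workers.ini 1
#   grep -- -cpu-used /etc/openqa/workers.ini
```

Keeping the backup makes it easy to compare the old and new setting (or revert) on each host.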
Updated by openqa_review over 1 year ago
- Due date set to 2023-06-28
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler over 1 year ago
- Status changed from In Progress to Feedback
Unfortunately I see no way to make a containerized ffmpeg invocation work without modifying os-autoinst first to avoid using an absolute path. So I've created https://github.com/os-autoinst/os-autoinst/pull/2327 for that. It works locally and I'll try it in production once it has been deployed.
Updated by okurz over 1 year ago
https://github.com/os-autoinst/os-autoinst/pull/2327 is merged and most certainly deployed. Would be great to see some AV1 video on o3 by tomorrow's workshop, which is about the topic "More efficient video encoder used on o3 - how to work with videos" :)
Updated by mkittler over 1 year ago
Ok, then I'll pick one of the new overpowered o3 workers for this.
Updated by mkittler over 1 year ago
I've just created a test job on a special worker slot on the o3 worker openqaworker19: https://openqa.opensuse.org/tests/3358984 - To make this work I installed podman, configured user/group IDs so _openqa-worker can use it and configured the command from https://github.com/os-autoinst/os-autoinst/pull/2327.
If it works I'll enable SVT-1 on openqaworker19 for all regular slots. So far it looks good:
ffprobe /var/lib/openqa/pool/200/video.webm
ffprobe version 4.4 Copyright (c) 2007-2021 the FFmpeg developers
…
[libdav1d @ 0x55dd49e6c7c0] libdav1d 0.9.2
Input #0, matroska,webm, from '/var/lib/openqa/pool/200/video.webm':
Metadata:
ENCODER : Lavf60.3.100
Duration: 00:00:34.63, start: 0.000000, bitrate: 772 kb/s
Stream #0:0: Video: av1 (Main), yuv420p(tv, progressive), 1024x768 [SAR 1:1 DAR 4:3], 24 fps, 24 tbr, 1k tbn, 1k tbc
Metadata:
ENCODER : Lavc60.3.100 libsvtav1
DURATION : 00:00:34.625000000
[libdav1d @ 0x55dd49e71780] libdav1d 0.9.2
Updated by okurz over 1 year ago
Great to see this working in production. For comparison, AV1 is currently bigger than VP9:
- AV1: https://openqa.opensuse.org/tests/3358984/video?filename=video.webm 3.34MB 772 kbps
- VP9: https://openqa.opensuse.org/tests/3358974/video?filename=video.webm 1.54MB 348 kbps
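As a quick consistency check on these numbers: a video's size is roughly its average bitrate times its duration divided by 8. With the values ffprobe reported above for the AV1 video (772 kb/s over 34.63 s) this gives back the 3.34 MB listed for the download, assuming decimal units (1 kb = 1000 bits, 1 MB = 10^6 bytes). A small sketch; the function name size_mb is hypothetical:

```shell
#!/bin/sh
# Approximate file size in MB (decimal) from an average bitrate in kb/s and a
# duration in seconds: bits = kbps * 1000 * seconds, bytes = bits / 8.
size_mb() {
    awk -v kbps="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", kbps * 1000 * secs / 8 / 1000000 }'
}

size_mb 772 34.63   # AV1 job 3358984
```

Since the two jobs have slightly different durations, comparing bitrates (772 vs. 348 kb/s) is the fairer way to compare the encoders here.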
Updated by mkittler over 1 year ago
So this generally works. However, it relied on giving every user access to /var/lib/empty/, which is nothing we can do as it prevents login via SSH. Without this access we're running into the following podman error: Error: creating runtime static files directory: mkdir /var/lib/empty/.local: permission denied
It is expected that SVT-1 is less efficient compared to the VP9 encoding, which has been done with -cpu-used 0. However, I wouldn't have thought it makes such a big difference. So I'll refrain from enabling this everywhere before I do some further local testing. (The tests I've conducted so far are 2.7 MiB from VP9 vs. 2.9 MiB from SVT-1 for the same video. Likely the -preset 10 option I picked up from #129955#note-7 made things worse.)
Updated by mkittler over 1 year ago
The podman setup was fixed after just giving _openqa-worker a proper home directory. So I basically just did mkdir /var/lib/openqa/worker && chown _openqa-worker:users /var/lib/openqa/worker and changed /etc/passwd accordingly.
I've also just removed -preset 10 and started another test job: https://openqa.opensuse.org/tests/3359054
Let's see how big the video will be now.
EDIT: It reduced the file size from 3.34 MiB to 3.01 MiB. This is still bigger so I'm trying -crf 45 now: https://openqa.opensuse.org/tests/3359063
EDIT: With -crf 45 we're at 2.7 MiB. Maybe it makes sense to increase the CRF further but I'll have to compare the quality first.
EDIT: I think reducing the quality to -crf 50 is still acceptable. Decreasing it further would definitely be worse than the VP9 encoding we're comparing with (and I guess even 50 is already worse). With 50 we still get 2.3 MiB by default. By using a lower preset one can make it more efficient. I think the lowest preset we can use in production is 6, otherwise it gets quite slow. With that we're comparable¹ to VP9:
1,7M video-2-svt50-preset-6.mkv
1,8M video-2-svt50-preset-7.mkv
2,1M video-2-svt50-preset-8.mkv
2,3M video-2-svt50-preset-default.mkv
3,1M video-2-vp9-crf35-cpu-used-1.mkv
It is hard to compare because the CRF parameter's scale is not identical, so the videos have slightly different quality. From a brief comparison, SVT's 50 is somewhere between VP9's 35 and 45. The speed is also hard to compare because it matters how many CPU cores were kept busy (and SVT-1 seems to be faster but at the cost of utilizing more cores).
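One rough way to account for the different number of busy cores is to divide the reported speed by the number of cores kept busy, which is how the 1.64x single-core estimate for SVT-1 in the earlier comment was derived. A sketch using the figures from that comment (11.5x on up to 7 cores for SVT-1, 1.5x on about three cores for libaom, 0.6x on one core for VP9 with -cpu-used 0); it assumes speed scales linearly with cores, which is only an approximation, and the function name is hypothetical:

```shell
#!/bin/sh
# Rough per-core encoding speed: reported ffmpeg speed factor divided by the
# number of CPU cores the encoder kept busy. Assumes linear scaling with cores.
per_core_speed() {
    awk -v speed="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", speed / cores }'
}

# Figures from the local test encodings (1920x1080 source) earlier in this ticket.
echo "SVT-AV1 (crf 35):     $(per_core_speed 11.5 7)x"
echo "libaom (cpu-used 8):  $(per_core_speed 1.5 3)x"
echo "VP9 (cpu-used 0):     $(per_core_speed 0.6 1)x"
```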
Overall I think -c:v libsvtav1 -crf 50 -preset 6 would be good parameters for SVT-1 (I've configured that on openqaworker19 now for all regular slots¹), maybe also -preset 7 for slower workers.
¹ If you see any problems with that, feel free to activate the VP9 encoder config in /etc/openqa/workers.ini on that worker again. The worker services are supposed to restart automatically after editing the config.
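For reference, a sketch that prints the two workers.ini settings for SVT-AV1 with the parameters suggested above. The input options mirror the VP9 setting in use on the o3 workers, and no output file is given because os-autoinst appends it to the command. The function name svt_encoder_settings is hypothetical, and the sketch assumes an ffmpeg with libsvtav1 installed directly on the worker rather than the podman setup:

```shell
#!/bin/sh
# Hypothetical helper: print workers.ini lines for encoding with SVT-AV1.
# $1: SVT-AV1 preset (6 as suggested above, 7 for slower workers); default 6.
svt_encoder_settings() {
    preset=${1:-6}
    # Input options as in the VP9 setting: PPM frames piped in at 24 fps.
    printf 'EXTERNAL_VIDEO_ENCODER_CMD=ffmpeg -y -hide_banner -nostats -r 24 -f image2pipe -vcodec ppm -i - -pix_fmt yuv420p -c:v libsvtav1 -crf 50 -preset %s\n' "$preset"
    printf 'EXTERNAL_VIDEO_ENCODER_OUTPUT_FILE_EXTENSION=webm\n'
}

svt_encoder_settings 7
```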
Updated by mkittler over 1 year ago
It still looks good on o3, e.g. https://openqa.opensuse.org/tests/3361186/video?filename=video.webm and https://openqa.opensuse.org/tests/3361570/video?filename=video.webm - bitrate and quality are comparable with VP9 encodings.
As discussed, isotovideo should probe by default whether ffmpeg is installed and what it can do. It would then use SVT-1 (or VP9 as fallback) with reasonable default parameters. Only if neither is possible would the normal (internal) video encoder be used. PR for this: https://github.com/os-autoinst/os-autoinst/pull/2328
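The probing described above could look roughly like the following sketch. It is not the code from the PR (os-autoinst itself is written in Perl); the function name pick_encoder_args is hypothetical and the parameters are simply the SVT-AV1 and VP9 settings discussed in this ticket. It reads an ffmpeg -encoders listing from stdin and prints the codec arguments to use, or nothing if the internal encoder should be kept:

```shell
#!/bin/sh
# Hypothetical sketch of the encoder selection: prefer SVT-AV1, fall back to
# VP9, and print nothing (meaning: keep the internal encoder) otherwise.
# Reads the output of "ffmpeg -hide_banner -encoders" from stdin; the second
# column of each listing line is the encoder name.
pick_encoder_args() {
    names=$(awk '{ print $2 }')
    if printf '%s\n' "$names" | grep -qx 'libsvtav1'; then
        printf '%s\n' '-c:v libsvtav1 -crf 50 -preset 6'
    elif printf '%s\n' "$names" | grep -qx 'libvpx-vp9'; then
        printf '%s\n' '-c:v libvpx-vp9 -crf 35 -b:v 1500k -cpu-used 1'
    fi
}

# Usage on a worker: an empty result means ffmpeg is missing or has neither encoder.
args=$(ffmpeg -hide_banner -encoders 2>/dev/null | pick_encoder_args)
printf '%s\n' "${args:-internal encoder}"
```

Probing the encoder list instead of the banner matters here because, as seen on Leap, the banner can mention encoders that are not actually usable.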
Updated by dimstar over 1 year ago
OW19 workers currently fail all jobs with
Reason: backend died: External encoder not accepting data: Broken pipe at /usr/lib/os-autoinst/backend/baseclass.pm line 141.
In the logs, I can find
Launching external video encoder: podman run --workdir /pool --pull=newer --rm -i -v .:/pool ghcr.io/tamara-schmitz/ffmpeg-docker-container-free -hide_banner -nostats -r 24 -f image2pipe -vcodec ppm -i - -pix_fmt yuv420p -c:v libsvtav1 -crf 50 -preset 7 'video.webm'
Error: opening database /var/lib/openqa/worker/.local/share/containers/storage/libpod/bolt_state.db: open /var/lib/openqa/worker/.local/share/containers/storage/libpod/bolt_state.db: permission denied
The permission denied error seems to be caused by AppArmor blocking the access:
type=AVC msg=audit(1687185372.888:3644): apparmor="DENIED" operation="open" profile="/usr/share/openqa/script/worker" name="/var/lib/openqa/worker/.local/share/containers/storage/libpod/bolt_state.db" pid=14267 comm="podman" requested_mask="wrc" denied_mask="wrc" fsuid=103 ouid=103
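To see at a glance which program and path such denials refer to, the AVC records can be reduced to their comm, name and profile fields. A sketch, assuming the records are available as text lines, e.g. in /var/log/audit/audit.log when auditd is running (otherwise in the kernel log); the function name summarize_denials is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read audit records from stdin and print one summary line
# per AppArmor denial in the form "<comm>: <name> denied by <profile>".
summarize_denials() {
    grep 'apparmor="DENIED"' | while IFS= read -r line; do
        profile=$(printf '%s\n' "$line" | sed -n 's/.* profile="\([^"]*\)".*/\1/p')
        name=$(printf '%s\n' "$line" | sed -n 's/.* name="\([^"]*\)".*/\1/p')
        comm=$(printf '%s\n' "$line" | sed -n 's/.* comm="\([^"]*\)".*/\1/p')
        printf '%s: %s denied by %s\n' "$comm" "$name" "$profile"
    done
}

# Usage (path is an assumption, see above):
#   summarize_denials < /var/log/audit/audit.log | sort -u
```

Each field is extracted separately so the summary does not depend on the exact order of the fields in the record.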
Updated by mkittler over 1 year ago
I have enabled VP9 again on openqaworker19.
I don't think it is worth making podman/AppArmor work together. As we've seen issues might not be immediately obvious and we'd likely only be able to find them one-by-one when running this over a longer period of time.
Updated by okurz over 1 year ago
Still waiting for second approval of https://github.com/os-autoinst/os-autoinst/pull/2328
Updated by okurz over 1 year ago
https://github.com/os-autoinst/os-autoinst/pull/2328 is merged, meaning that all openQA workers using that version of os-autoinst should by default try AV1 and fall back to VP9. So far all o3 machines have the external video encoder set explicitly, so that setting overrides the new internal default. And OSD workers failed deployment today so the new version is not yet installed. I suggest to await the deployment on OSD and wait a night to see if there is an immediate regression.
Updated by mkittler over 1 year ago
Done:
martchus@ariel:~> for i in aarch64 openqaworker4 openqaworker7 qa-power8-3 rebel; do echo $i && ssh root@$i "sed -i -e 's|^EXTERNAL_VIDEO_ENCODER_CMD=|#EXTERNAL_VIDEO_ENCODER_CMD=|g' /etc/openqa/workers.ini" ; done
aarch64
openqaworker4
openqaworker7
qa-power8-3
rebel
martchus@ariel:~> for i in aarch64 openqaworker4 openqaworker7 qa-power8-3 rebel; do echo $i && ssh root@$i "grep -i external /etc/openqa/workers.ini" ; done
aarch64
#EXTERNAL_VIDEO_ENCODER_CMD=ffmpeg -y -hide_banner -nostats -r 24 -f image2pipe -vcodec ppm -i - -pix_fmt yuv420p -c:v libvpx-vp9 -crf 35 -b:v 1500k -cpu-used 1
EXTERNAL_VIDEO_ENCODER_OUTPUT_FILE_EXTENSION=webm
openqaworker4
#EXTERNAL_VIDEO_ENCODER_CMD=ffmpeg -y -hide_banner -nostats -r 24 -f image2pipe -vcodec ppm -i - -pix_fmt yuv420p -c:v libvpx-vp9 -crf 35 -b:v 1500k -cpu-used 1
EXTERNAL_VIDEO_ENCODER_OUTPUT_FILE_EXTENSION=webm
openqaworker7
#EXTERNAL_VIDEO_ENCODER_CMD=ffmpeg -y -hide_banner -nostats -r 24 -f image2pipe -vcodec ppm -i - -pix_fmt yuv420p -c:v libvpx-vp9 -crf 35 -b:v 1500k -cpu-used 1
EXTERNAL_VIDEO_ENCODER_OUTPUT_FILE_EXTENSION=webm
qa-power8-3
#EXTERNAL_VIDEO_ENCODER_CMD=ffmpeg -y -hide_banner -nostats -r 24 -f image2pipe -vcodec ppm -i - -pix_fmt yuv420p -c:v libvpx-vp9 -crf 35 -b:v 1500k -cpu-used 1
EXTERNAL_VIDEO_ENCODER_OUTPUT_FILE_EXTENSION=webm
rebel
#EXTERNAL_VIDEO_ENCODER_CMD=ffmpeg -y -hide_banner -nostats -r 24 -f image2pipe -vcodec ppm -i - -pix_fmt yuv420p -c:v libvpx-vp9 -crf 35 -b:v 1500k -cpu-used 1
EXTERNAL_VIDEO_ENCODER_OUTPUT_FILE_EXTENSION=webm
I kept the line on worker19/20 as those are powerful enough for -cpu-used 0. I added a corresponding remark.
I'll have a look tomorrow to check a few jobs.
Updated by okurz over 1 year ago
Please ensure that worker instances on worker19 are online again before closing the ticket
Updated by mkittler over 1 year ago
Strange, looks like someone disabled them. I'll enable them again.
Updated by mkittler over 1 year ago
- Status changed from Feedback to Resolved
I enabled all slots again. It looks good (e.g. https://openqa.opensuse.org/tests/3375665#downloads and https://openqa.opensuse.org/tests/3375894#downloads) so I'm considering this ticket resolved.
Updated by okurz 4 months ago
- Related to action #163601: Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters - incompatible ffmpeg size:S added