This post assumes you have a basic understanding of how video encoding works and what the popular codecs are. There are many great guides on this if you want to learn about it in detail. In short, CRF (constant rate factor) compresses a video based on visual quality rather than targeting a specific bitrate or file size. x264 is an encoder for H.264, an older and incredibly widely supported codec for viewing video. x265 is its successor, an encoder for H.265/HEVC, and is supported by most modern video players. AV1 is a codec that was announced in 2015, and consumer hardware encoding support for it was released to the public in June 2022. It is the new hotness to x264’s old reliable.
I recently found myself wanting to compare various compression levels for the two most popular video codecs against the most popular up-and-coming one. I’m a simple guy, and there are already a couple of good research papers on this specific topic, so my test setup is not going to be rigorous. At best, it will be representative. I wanted to see just how much difference a codec can make to a video’s file size at comparable quality levels. On to the setup!
The source file is a 4K x264 encode I had lying around, with a runtime of 1 hour and 33 minutes and a bitrate of 95.5 Mbps. I mostly chose it because it was filmed with a real camera. That makes it similar enough to 4K security camera output that I could reasonably expect similar or smaller bitrates for security footage archival purposes, since security cameras capture much less dynamic motion. If the results work out well, I may end up grabbing an Intel Arc video card, so I am not worried about encode speed for these tests.
The source video is being converted from 4K to 1080p. The goal isn’t to see how small I can get the file; it’s to see how the three results compare to each other. To that end, I will also be excluding audio and subtitles from the conversion.
One website I found says that x264 is considered visually lossless at crf 18, and x265 is visually lossless at crf 25. The ffmpeg page for av1 says that a crf of 23 matches the visual quality of x264 at 19. Since I’m looking for a quick and dirty comparison, this is good enough, especially since various sources disagree on which crf equals which quality level anyway.
Here’s my x264 ffmpeg command:
ffmpeg -i source.2160p.mkv \
-map 0:v:0 \
-map -0:a -map -0:s -map_metadata -1 \
-c:v libx264 \
-preset slower \
-vf scale=w=1920:h=-2 \
-crf 18 \
dest.1080p.x264.mkv
Here’s my x265 ffmpeg command:
ffmpeg -i source.2160p.mkv \
-map 0:v:0 \
-map -0:a -map -0:s -map_metadata -1 \
-c:v libx265 \
-preset slower \
-vf scale=w=1920:h=-2 \
-crf 25 \
dest.1080p.x265.mkv
And finally, here’s my av1 command:
ffmpeg -i source.2160p.mkv \
-map 0:v:0 \
-map -0:a -map -0:s -map_metadata -1 \
-c:v libsvtav1 \
-preset 3 \
-vf scale=w=1920:h=-2 \
-crf 23 \
dest.1080p.av1.mkv
On to the results! After an overnight encode session (or two), I ended up with the following sizes:
* the x264 encode ended up being 10.8 gigabytes with an average bitrate of 16,666kbps.
* the x265 encode ended up being 1.08 gigabytes with an average bitrate of 1,656kbps.
* the av1 encode ended up being 1.6 gigabytes with an average bitrate of 2,458kbps.
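As a quick sanity check, average bitrate multiplied by runtime should roughly reproduce each file size. This minimal sketch makes three assumptions: the runtime is exactly 93 minutes, "kbps" means 1,000 bits per second, and the reported "gigabytes" are binary gigabytes (GiB, 1024³ bytes). Under those assumptions, all three reported sizes line up with their bitrates.

```python
RUNTIME_S = 93 * 60  # 1 hour 33 minutes, assumed exact (seconds weren't reported)


def size_gib(avg_kbps: float, duration_s: float) -> float:
    """Estimate a video-only file size in GiB from its average bitrate.

    Assumes 1 kbps = 1,000 bits per second and ignores container overhead.
    """
    total_bytes = avg_kbps * 1000 * duration_s / 8
    return total_bytes / 1024**3


if __name__ == "__main__":
    for name, kbps in [("x264", 16_666), ("x265", 1_656), ("av1", 2_458)]:
        # Prints 10.83, 1.08 and 1.60 GiB respectively.
        print(f"{name}: {size_gib(kbps, RUNTIME_S):.2f} GiB")
```

The encodes contain no audio, so container overhead is the only thing this estimate leaves out.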
Hold on, the av1 codec is supposed to be more efficient than the other two, yet its file came out larger than the x265 one. What gives? A relatively long check with FFMetrics later, and we have results. Here’s a great article explaining the numbers further, but in short, these metrics compare the encode against the source material to produce a number that tracks how much visual fidelity the encoding lost.
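PSNR (peak signal-to-noise ratio) measures how much the encode differs from the source by taking the mean squared error between their pixel values and expressing it on a logarithmic decibel scale. This roughly approximates how humans perceive reconstruction quality. For lossy video the score is typically between 30 and 50 dB, with higher being better.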
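To make the PSNR definition concrete, here is a minimal sketch for a single 8-bit plane stored as a flat list of pixel values. It is a simplification. Real tools compute this per frame and per plane (Y, U, V) and then average the results, so don't expect it to reproduce FFMetrics numbers.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], distorted: Sequence[int], max_value: int = 255) -> float:
    """PSNR in dB between two equally sized 8-bit planes (flattened pixel lists)."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("planes must be non-empty and the same size")
    # Mean squared error: the average of the squared per-pixel differences.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical planes: no noise at all
    # 10 * log10(MAX^2 / MSE): higher means closer to the source.
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Every pixel off by 10 gives MSE = 100, which prints 28.13 (dB).
    print(round(psnr([100] * 4, [110] * 4), 2))
```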
SSIM (structural similarity index) compares brightness, contrast, and structure against the original source. Its values are typically between 0 and 1, with 1 meaning identical to the source video.
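SSIM's formula folds those three comparisons into one expression. As a simplification, the sketch below computes it once over a whole plane. The standard metric (Wang et al., 2004) instead slides a small Gaussian-weighted window across the image and averages the local scores, so real tools will report different numbers. The constants use the usual K1 = 0.01 and K2 = 0.03 for 8-bit video.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], max_value: float = 255.0) -> float:
    """SSIM computed over one whole plane (no sliding window)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("planes must be the same size with at least 2 pixels")
    c1 = (0.01 * max_value) ** 2  # stabilizes the brightness (luminance) term
    c2 = (0.03 * max_value) ** 2  # stabilizes the contrast/structure term
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return numerator / denominator


if __name__ == "__main__":
    plane = [10, 50, 90, 130]
    print(global_ssim(plane, plane))                     # 1.0 for identical planes
    print(global_ssim(plane, [p + 40 for p in plane]))   # brightened copy scores lower
```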
VMAF (Video Multimethod Assessment Fusion) attempts to predict subjective video quality from a reference and a distorted video sequence. It is a newer algorithm developed by Netflix with the University of Southern California, the University of Nantes IPI/LS2N lab, and The University of Texas at Austin Laboratory for Image and Video Engineering (LIVE). It scores on a 0 to 100 scale, and higher is better.
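If you'd rather compute these scores yourself than go through a GUI, FFmpeg has `psnr` and `ssim` filters, plus a `libvmaf` filter when it is built with libvmaf support. The sketch below only assembles the command line, and the helper name is hypothetical. The encodes here are 1080p while the source is 4K, so it scales the reference down to 1920 pixels wide first so both inputs match. That is an assumption about how to line them up; some workflows upscale the encode instead. Per FFmpeg's filter documentation, the first input is treated as the distorted (main) video and the second as the reference.

```python
import shlex

METRIC_FILTERS = {"psnr", "ssim", "libvmaf"}


def metric_command(distorted: str, reference: str, metric: str = "libvmaf",
                   width: int = 1920) -> list[str]:
    """Build an ffmpeg command that scores `distorted` against `reference`.

    Hypothetical helper: the reference is downscaled to `width` pixels wide
    (height kept proportional and even) so both inputs have the same size.
    """
    if metric not in METRIC_FILTERS:
        raise ValueError(f"unsupported metric: {metric}")
    graph = f"[1:v]scale={width}:-2[ref];[0:v][ref]{metric}"
    # "-f null -" decodes and measures without writing an output file;
    # ffmpeg reports the score in its log output.
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", graph, "-f", "null", "-"]


if __name__ == "__main__":
    print(shlex.join(metric_command("dest.1080p.av1.mkv", "source.2160p.mkv")))
```

To actually run it, pass the returned list to `subprocess.run(cmd, check=True)`. Both inputs also need matching frame rates for the per-frame comparison to make sense.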
All of this to say, my crf values were off by a technically noticeable amount, and I need to redo my work. The av1 file is rated the highest quality by every objective metric I have access to, so its crf needs to be raised until its quality matches the other two codecs. But this brings me to a new problem: what is a good target range for each of these values to be considered “visually lossless”? Instead of trusting that the recommended numbers are right, as they clearly aren’t entirely, I first need to define what bad, good enough, and visually lossless actually mean to me. I also need to learn what a badly compressed video looks like, so I can spot the less obvious but still present compression artifacts. Finally, I need to choose a new source video, because dang, these took way too long to encode on my poor computer.
Fair warning: just like kerning, once you learn what bad encodes look like, you lose the ability to watch what most people consider good enough encodes without hating the video quality. If you enjoy YouTube or Netflix a lot and don’t mind their compression, you are a prime candidate for getting mad at their encoding choices if you keep reading.
So, first, my new video is Tears of Steel, the 2012 Blender open movie project. I grabbed the 4K source file, which is 12 minutes 14 seconds long, 3840 pixels wide, and has an average bitrate of 73,227 kbps. Instead of using the promised visually lossless crf values this time, I’m going to use the worst possible crf value for each encoder to get an idea of what a badly compressed version of this video looks like. My new crf values will be 63, 51, and 51 for av1, x265, and x264 respectively, the maximum each encoder accepts.
The resulting file sizes were actually pretty amazing: the x264 file was 15.7MB, the x265 file was 8.38MB, and the av1 file was 16.1MB. Essentially, the file size dropped to practically nothing for all of them. However, the x264 and x265 files were both unwatchable; you would have gotten more detail from a lossless 240p encode than from these. The real star was the av1 encode. Yes, it had a lot of blur around the edges of movement, and basically all of the details were removed and smeared into gradients of color, but the video itself was still very much understandable. Because the videos are so small, I feel comfortable sharing the results here without using a third-party video host.
To pixel peep the picture quality, I ran the following command on each video:
ffmpeg -i source.video -vf fps=1/60 codec%04d.png
The end result was twelve images from each video, showing the different compression methods each encoder uses. x264 turns everything into blocks, particularly around movement. x265 turns everything into blocks and triangles, also particularly around movement. av1 smooths colors out to cartoon levels and also turns movement into blocks, but to a much lesser extent than the other two codecs.
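To repeat that frame grab for every encode without the PNGs overwriting each other, each run needs its own output prefix. This sketch builds one command per file using the same `fps=1/60` filter, which keeps one frame every 60 seconds. The helper and the file names are hypothetical placeholders.

```python
import shlex
from pathlib import Path


def extract_frames_command(video: str, every_s: int = 60) -> list[str]:
    """One PNG every `every_s` seconds, prefixed with the input's name (hypothetical helper)."""
    prefix = Path(video).stem  # "av1.mkv" -> "av1", so output is av1_0001.png, ...
    return ["ffmpeg", "-i", video, "-vf", f"fps=1/{every_s}", f"{prefix}_%04d.png"]


if __name__ == "__main__":
    for video in ["source.mkv", "x264.mkv", "x265.mkv", "av1.mkv"]:
        print(shlex.join(extract_frames_command(video)))
```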
In short, av1 does a very good job of compressing the video, using methods that hide the compression from the human eye better than the older x264/x265 methods. Now that we know what extreme compression looks like, my next goal is to decide on a crf level that does a ‘good enough’ job of hiding the compression. For me, that means that if I pixel peep I’ll find compression artifacts, but by and large they won’t be obvious when just watching the video. Considering how well av1 has done so far, I won’t be surprised if its crf ends up a lot higher than the other two once I stop chasing better quality. I’ll encode in steps of 3 crf values and try to pick the value that makes the most subjective sense to me.
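A sweep like that is easy to script. The sketch below generates one command per encoder and crf, stepping by 3 from 18 up to each encoder's maximum: 51 for libx264 and libx265, 63 for libsvtav1. These are the same crf values that appear in my results. It reuses the options from the earlier commands. The source and output file names are hypothetical. The 1080p scale filter is also an assumption, since I didn't state whether these Tears of Steel encodes were downscaled.

```python
import shlex

MAX_CRF = {"libx264": 51, "libx265": 51, "libsvtav1": 63}
PRESET = {"libx264": "slower", "libx265": "slower", "libsvtav1": "3"}


def crf_values(encoder: str, start: int = 18, step: int = 3) -> list[int]:
    """crf values from `start` up to the encoder's maximum, inclusive."""
    return list(range(start, MAX_CRF[encoder] + 1, step))


def encode_command(source: str, encoder: str, crf: int) -> list[str]:
    """The earlier ffmpeg command with a variable crf (hypothetical helper)."""
    out = f"tos.{encoder}.crf{crf}.mkv"  # hypothetical naming scheme
    return ["ffmpeg", "-i", source,
            # Mapping only the first video stream already leaves out audio and subtitles.
            "-map", "0:v:0", "-map_metadata", "-1",
            "-c:v", encoder, "-preset", PRESET[encoder],
            "-vf", "scale=w=1920:h=-2",  # assumption: same 1080p downscale as before
            "-crf", str(crf), out]


if __name__ == "__main__":
    for enc in MAX_CRF:
        for crf in crf_values(enc):
            print(shlex.join(encode_command("tears_of_steel.2160p.mkv", enc, crf)))
```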
My choices were as follows:
| crf | av1 (KB) | x265 (KB) | x264 (KB) |
|---|---|---|---|
| 18 | 419,261 | 632,079 | 685,217 (visually lossless) |
| 21 | 352,337 | 390,358 (visually lossless) | 411,439 |
| 24 | 301,517 | 250,426 | 263,524 (good enough) |
| 27 | 245,685 | 165,079 (good enough) | 176,919 |
| 30 | 205,008 | 110,062 | 122,458 |
| 33 | 168,192 | 73,528 | 86,899 |
| 36 | 139,379 (visually lossless) | 48,516 | 63,214 |
| 39 | 116,096 | 31,670 | 47,161 |
| 42 | 97,365 (good enough) | 20,636 | 35,801 |
| 45 | 81,805 | 13,598 | 27,484 |
| 48 | 69,044 | 9,726 | 20,823 |
| 51 | 58,316 | 8,586 (worst possible) | 16,120 (worst possible) |
| 54 | 48,681 | | |
| 57 | 39,113 | | |
| 60 | 29,062 | | |
| 63 | 16,533 (worst possible) | | |
av1 is tricky because its compression adds pixelation along borders, but it also gradually removes detail from objects. At its worst, this turns people into clay and the borders into a janky mess. At more reasonable levels, though, the clay effect is actually pretty hard to notice. I personally noticed compression artifacts while watching the av1 video at a crf of 45, so I consider an av1 crf of 42 good enough for things like security camera footage, my grandmother’s house, or my tablet. Not good for archival, but damn good for less important things. The good enough av1 file size was 97MB.
The x264 and x265 compression artifacts are a lot easier to find than av1’s. Check the borders of moving objects for pixelation; if it’s there, you’re seeing the artifacts. At a crf of 24, I was unable to notice artifacts in the x264 encode without pixel peeping. The file size at that level was 263MB.
I was unable to notice artifacts for x265 without pixel peeping at the crf level of 27. The file size for the video at that level was 165MB.
Here is a slideshow of screenshots from my ‘good enough’ choices. You may notice compression in them, but while just playing the video, it wasn’t something my eyes could pick out easily, if at all.
So then, what do I consider visually lossless? From my results, I feel safe agreeing with ffmpeg’s documentation on x264’s crf of 18 (685MB). With x265, though, you’ll probably want to go at least three lower than the recommended 25. I couldn’t see any artifacts while playing the crf 27 x265 video, but 25 is too close to that for me to confidently call it visually lossless. You’re probably better off going with crf 21 (390MB) or 22 if you want it visually lossless.
For av1, I feel confident saying that even if you pixel peep and compare it to the original frame by frame, you won’t find compression artifacts at a crf of 36, which was just 139MB in size. Here are some screenshots comparing the original to my visually lossless choices. Feel free to check my conclusions yourself.
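Putting the chosen tiers side by side makes the size gap easier to see. This sketch divides the av1 file size by each older encoder's file size at the tier I picked, using the KB values from my results table.

```python
# File sizes in KB at the crf picked for each tier (taken from the results table).
CHOICES_KB = {
    "good enough": {"av1": 97_365, "x265": 165_079, "x264": 263_524},
    "visually lossless": {"av1": 139_379, "x265": 390_358, "x264": 685_217},
}


def av1_share(tier: str) -> dict[str, float]:
    """av1 file size as a fraction of each other encoder's size for one tier."""
    sizes = CHOICES_KB[tier]
    return {enc: sizes["av1"] / kb for enc, kb in sizes.items() if enc != "av1"}


if __name__ == "__main__":
    for tier in CHOICES_KB:
        shares = av1_share(tier)
        print(tier, {enc: f"{share:.0%}" for enc, share in shares.items()})
```

At the visually lossless picks, the av1 file comes out at roughly 20% of the x264 size and 36% of the x265 size. At the good enough picks, it is about 37% and 59% respectively.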
Don’t feel like trusting my eyes? Let’s see what FFMetrics says about my choices:
That’s a lot better than my first attempt. However, just to be certain, I’m going to drop the av1 crf level until it reaches parity on the VMAF test.
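The "drop the crf until VMAF agrees" step boils down to this: score each encode, then keep the highest crf (the smallest file) whose score still clears your target. This is a minimal sketch, assuming you have already collected one VMAF score per crf, for example with ffmpeg's `libvmaf` filter. The target value is your own choice, not something from my results, and the helper name is hypothetical.

```python
from typing import Mapping, Optional


def highest_crf_meeting(scores: Mapping[int, float], target: float) -> Optional[int]:
    """Return the largest crf whose VMAF score is at least `target`, or None.

    `scores` maps crf -> VMAF score (0 to 100). Scores aren't guaranteed to
    fall perfectly monotonically with crf, so every entry is checked.
    """
    passing = [crf for crf, score in scores.items() if score >= target]
    return max(passing) if passing else None
```

Usage looks like `highest_crf_meeting(my_scores, target=95)`, where `my_scores` and the target of 95 are placeholders for your own measurements and threshold.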
Dang, that’s a much different crf level than my novice eyes can detect. Maybe a better comparison method is needed? I recently found out about imgsli, which is a better image comparison tool than what I’ve been using up until now. Let’s keep it simple: my visually lossless choice vs VMAF’s visually lossless number.
| crf | av1 (KB) | x265 (KB) | x264 (KB) |
|---|---|---|---|
| 18 | 419,261 | 632,079 | 685,217 (visually lossless) |
| 21 | 352,337 | 390,358 (visually lossless) | 411,439 |
| 24 | 301,517 (VMAF visually lossless) | 250,426 | 263,524 (good enough) |
| 27 | 245,685 | 165,079 (good enough) | 176,919 |
| 30 | 205,008 | 110,062 | 122,458 |
| 33 | 168,192 | 73,528 | 86,899 |
| 36 | 139,379 (my visually lossless) | 48,516 | 63,214 |
| 39 | 116,096 | 31,670 | 47,161 |
| 42 | 97,365 (good enough) | 20,636 | 35,801 |
| 45 | 81,805 | 13,598 | 27,484 |
| 48 | 69,044 | 9,726 | 20,823 |
| 51 | 58,316 | 8,586 (worst possible) | 16,120 (worst possible) |
| 54 | 48,681 | | |
| 57 | 39,113 | | |
| 60 | 29,062 | | |
| 63 | 16,533 (worst possible) | | |
After comparing them, I still stand by my choice. If you pixel peep the crf 36 result and compare it to the crf 24 result, you can see extremely minor differences in her jacket, but overall, nothing of value is actually lost in translation.
I had a lot of fun playing around with these comparisons. Hopefully, you found it interesting too.