DEV Community

Automating video analysis to cut your streaming bandwidth usage in half

AWS Setup

Using an optimal bitrate ladder can make a big difference to the bandwidth usage of your video streams. Some videos need a high bitrate to deliver a high-quality stream, while others need much less, and the difference between simple animated content and an action movie with lots of smoke and camera movement can be quite large. In this article, we show you how to automatically generate an optimal bitrate ladder that reduces bitrate usage by 46% compared with the example ABR-ladder in Apple's HLS Authoring Specification, without any loss in video quality.

But first of all, why is it important not to waste bandwidth? The obvious reason is that users on poor internet connections get a better experience. For you as a streaming provider, it also saves money and energy, lowering your total CO2-emissions. Everybody wins!

Optimizing ABR-ladders can involve a lot of manual labour, however. The method developed by Netflix involves encoding your source material at a range of bitrates for each resolution, and each variant then has to be analyzed with a video quality metric, which takes further time and effort. To make this process easier, we set out to automate as much of it as possible. All scripts and tools used in this blog post can be found in this GitHub-repository.

Step 1: Transcoding files to test

To determine the "optimal" bitrates, we need to create a variety of videos at different bitrates for each resolution we wish to support. We created a script that automatically generates the bitrate-resolution pairs to evaluate and transcodes them using AWS MediaConvert. Because we are transcoding many files, running this in the cloud is especially useful: the jobs run in parallel. Running the transcoding sequentially on your own machine can take many hours, whereas in the cloud it takes only a few minutes.

We test 10 bitrates for each resolution. Since higher resolutions need higher bitrates, the tested bitrates are scaled by the pixel count of each resolution. The following code generates the variants to analyze.

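# Resolutions (width, height) to test; assumed here to be the ones used in the ladders below
resolutions = [(416, 234), (640, 360), (768, 432), (960, 540), (1280, 720), (1920, 1080)]

bitrate_resolution_pairs = []
for w, h in resolutions:
    pixels = w * h

    # Test bitrates (bit/s) from 0.5x to 9.5x the pixel count in steps of 1x: 10 per resolution
    bitrate_floor = pixels // 2
    bitrate_ceil = pixels * 10
    bitrate_step = pixels

    for bitrate in range(bitrate_floor, bitrate_ceil, bitrate_step):
        # Start transcoding on AWS MediaConvert; here we only record the pair
        bitrate_resolution_pairs.append((w, h, bitrate))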
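The MediaConvert call is left as a comment in the loop above. A minimal sketch of what it could look like with boto3's `create_job`, assuming H.264 at a constant bitrate in an MP4 container. The role ARN, S3 paths and the `NameModifier` naming scheme are placeholders, the helper names are hypothetical, and a real job may need further settings (audio, GOP structure and so on):

```python
def build_job_settings(input_uri, destination, width, height, bitrate):
    """MediaConvert job settings for one bitrate-resolution test variant.

    Assumes H.264 at a constant bitrate (bit/s) in an MP4 container. The output
    file is named after the input's base name plus the NameModifier suffix.
    """
    return {
        "Inputs": [{"FileInput": input_uri}],
        "OutputGroups": [{
            "Name": "File Group",
            "OutputGroupSettings": {
                "Type": "FILE_GROUP_SETTINGS",
                "FileGroupSettings": {"Destination": destination},
            },
            "Outputs": [{
                # Hypothetical naming scheme: <input>_<w>x<h>_<bitrate>.mp4
                "NameModifier": f"_{width}x{height}_{bitrate}",
                "ContainerSettings": {"Container": "MP4"},
                "VideoDescription": {
                    "Width": width,
                    "Height": height,
                    "CodecSettings": {
                        "Codec": "H_264",
                        "H264Settings": {"RateControlMode": "CBR", "Bitrate": bitrate},
                    },
                },
            }],
        }],
    }


def submit_variant(mediaconvert, role_arn, input_uri, destination, width, height, bitrate):
    """Submit one transcoding job; `mediaconvert` is a boto3 MediaConvert client."""
    settings = build_job_settings(input_uri, destination, width, height, bitrate)
    return mediaconvert.create_job(Role=role_arn, Settings=settings)
```

Calling `submit_variant` once per `(w, h, bitrate)` pair lets MediaConvert run all the jobs in parallel; the script only has to submit them.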

Step 2: Video quality analysis

We are looking for the best possible video quality per bitrate, which means we need a way to measure video quality. Traditionally, this has been done using PSNR, but PSNR has well-known flaws, most notably that it does not correlate well with perceived quality. Netflix therefore developed VMAF, a video quality metric that accounts for human perception. VMAF compares a distorted variant against a reference file and gives a score between 0 and 100 that describes how similar they look; the higher, the better. VMAF has several properties that make it well suited for this task. For example, a VMAF score of 93 or higher means that the video is perceptually identical to the source material (source), which suggests that it is unnecessary to provide variants scoring above 93. Netflix has also said that a difference of 6 is a just noticeable difference in quality (source), so spending lots of extra bandwidth to gain less than 6 may also be unnecessary.
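To make the measurement concrete, here is a minimal sketch that calls FFmpeg's `libvmaf` filter directly instead of using easyVmaf (which takes care of more details for you). It assumes an FFmpeg build with libvmaf enabled and a libvmaf 2.x JSON log containing `pooled_metrics`; the helper names are hypothetical. The distorted variant is upscaled to the reference resolution first, because VMAF compares frames of equal size:

```python
import json
import subprocess


def build_vmaf_cmd(distorted, reference, width, height, log_path="vmaf.json"):
    """FFmpeg command that scores `distorted` against `reference` with libvmaf.

    libvmaf takes the distorted video as its first input and the reference as
    its second; the distorted input is scaled to the reference size first.
    """
    graph = (
        f"[0:v]scale={width}:{height}:flags=bicubic[dist];"
        f"[dist][1:v]libvmaf=log_fmt=json:log_path={log_path}"
    )
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", graph, "-f", "null", "-"]


def read_vmaf_mean(log_path):
    """Mean VMAF score from a libvmaf 2.x JSON log (assumes its `pooled_metrics` layout)."""
    with open(log_path) as f:
        return json.load(f)["pooled_metrics"]["vmaf"]["mean"]


def measure_vmaf(distorted, reference, width, height, log_path="vmaf.json"):
    """Run FFmpeg (must be built with libvmaf) and return the mean VMAF score."""
    subprocess.run(build_vmaf_cmd(distorted, reference, width, height, log_path), check=True)
    return read_vmaf_mean(log_path)
```

The 93 and 6 thresholds above are then applied to the mean score returned for each variant.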

We use a tool called easyVmaf, which makes VMAF-analysis very easy. However, just like transcoding, this process takes a long time to run sequentially on your own machine, especially when you need to analyze hundreds of files. Running it in containers in the cloud lets us run all the analyses in parallel. We built a simple Docker-container that runs easyVmaf on files in an S3-bucket, which allows us to run it on ECS and spin up as many tasks as we need. The Dockerfile can be found in the GitHub-repository.
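A minimal sketch of how one VMAF task per variant could be launched with boto3's ECS `run_task`, assuming a Fargate task definition for the easyVmaf container. The cluster, task definition, container name, subnet and the `REFERENCE_KEY`/`DISTORTED_KEY` environment variables are hypothetical, since the container's real interface is defined by the Dockerfile in the repository:

```python
def start_vmaf_task(ecs, reference_key, distorted_key, *,
                    cluster="vmaf-cluster",            # hypothetical cluster name
                    task_definition="easyvmaf-task",   # hypothetical task definition
                    container="easyvmaf",              # hypothetical container name
                    subnets=("subnet-xxxxxxxx",)):     # placeholder subnet ID
    """Launch one Fargate task that computes VMAF for a single variant.

    `ecs` is a boto3 ECS client, e.g. boto3.client("ecs"). Each call starts an
    independent task, so launching one per variant runs the analyses in parallel.
    """
    return ecs.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "assignPublicIp": "ENABLED",
            }
        },
        overrides={
            "containerOverrides": [{
                "name": container,
                # Hypothetical env vars telling the container which S3 objects to compare
                "environment": [
                    {"name": "REFERENCE_KEY", "value": reference_key},
                    {"name": "DISTORTED_KEY", "value": distorted_key},
                ],
            }]
        },
    )
```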

The script waits until the transcoding from the first step is finished by watching the destination directory in S3, as shown below.

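import time

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# `variants`, `bucket_name` and `directory` come from the transcoding step
while variants:
    # List every object under the prefix; pagination handles more than 1000 keys
    objects_in_bucket = {
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket_name, Prefix=directory)
        for obj in page.get("Contents", [])
    }

    # Iterate over a copy so finished variants can be removed safely
    for variant in list(variants):
        object_name = variant.replace(bucket_name + "/", "") + ".mp4"
        if object_name in objects_in_bucket:
            start_vmaf_analysis(variant)  # hypothetical helper that starts the VMAF task on ECS
            variants.remove(variant)

    if variants:
        time.sleep(5)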

Step 3: Finding optimal bitrates for each resolution

Using the scripts we've discussed, we now have VMAF scores for 60 different bitrate-resolution pairs (10 bitrates for each of 6 resolutions). How do we know which of these are the best? Netflix describes this in detail in their blog; to summarize the method, we start by plotting the VMAF score against bitrate for every variant.

Plot of bitrate-resolution pairs

As you can see, there comes a point where each resolution starts falling off and the next resolution becomes the better choice at that bitrate. For example, in the figure below, 1080p is clearly the best resolution around 3000 kbit/s, and offering a 720p option at this bitrate would only give a lower-quality video experience. At lower bitrates, however, 1080p quality drops off steeply: at around 1500 kbit/s, 720p is the better choice.

Close up of 3000 kbit/s
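A minimal sketch of the comparison the figure makes, assuming each resolution's measured curve is a list of (bitrate in kbit/s, VMAF) points and that linear interpolation between measured points is good enough. The helper names are hypothetical:

```python
from bisect import bisect_left


def interpolate_vmaf(curve, bitrate):
    """Linearly interpolate VMAF at `bitrate` on a bitrate-sorted list of (bitrate, vmaf).

    Returns None outside the measured range instead of extrapolating.
    """
    xs = [b for b, _ in curve]
    if bitrate < xs[0] or bitrate > xs[-1]:
        return None
    i = bisect_left(xs, bitrate)
    if xs[i] == bitrate:
        return curve[i][1]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (bitrate - x0) / (x1 - x0)


def best_resolution(curves, bitrate):
    """Resolution with the highest interpolated VMAF at `bitrate`, or None if none covers it."""
    scores = {}
    for resolution, curve in curves.items():
        score = interpolate_vmaf(sorted(curve), bitrate)
        if score is not None:
            scores[resolution] = score
    return max(scores, key=scores.get) if scores else None
```

With toy curves (not measurements) such as `{"720p": [(1000, 70), (2000, 80), (3000, 84)], "1080p": [(1000, 60), (2000, 78), (3000, 88)]}`, `best_resolution(curves, 1500)` returns `"720p"` and `best_resolution(curves, 3000)` returns `"1080p"`, mirroring the crossover described above.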

We can imagine a convex hull that covers the plotted values from above. Ideally, we want to select points as close to this hull as possible, but in practice it is hard to test every possible bitrate-resolution pair to find them. The points on the convex hull are Pareto-efficient: no tested variant gives better video quality for the same or less bandwidth, or the same video quality for less bandwidth.
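A minimal sketch of how the upper convex hull could be computed from the measured points, using Andrew's monotone-chain method. It assumes each variant is a (bitrate, VMAF, resolution) tuple; the `Variant` type and function names are hypothetical. A final pass keeps only points whose VMAF keeps rising, so the result is the Pareto-efficient part of the hull:

```python
from typing import NamedTuple


class Variant(NamedTuple):
    bitrate_kbps: float
    vmaf: float
    resolution: str


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); non-negative when `a` lies on or below the line o->b."""
    return ((a.bitrate_kbps - o.bitrate_kbps) * (b.vmaf - o.vmaf)
            - (a.vmaf - o.vmaf) * (b.bitrate_kbps - o.bitrate_kbps))


def upper_hull(variants):
    """Variants on the upper convex hull of the (bitrate, VMAF) plot, by increasing bitrate."""
    # Keep only the best-scoring variant at each tested bitrate
    best = {}
    for v in variants:
        if v.bitrate_kbps not in best or v.vmaf > best[v.bitrate_kbps].vmaf:
            best[v.bitrate_kbps] = v

    # Monotone chain, left to right: pop points that fall on or below the hull
    hull = []
    for p in sorted(best.values()):
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)

    # Keep only the rising part: extra bandwidth must buy extra quality
    pareto = []
    for p in hull:
        if not pareto or p.vmaf > pareto[-1].vmaf:
            pareto.append(p)
    return pareto
```

Each hull point carries its resolution, which gives a first candidate ladder before the manual adjustments described next.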

However, simply selecting the "optimal" points might not work in practice. Depending on the resolutions tested, the convex hull might result in bitrates that are very close to each other or very far apart. Therefore, some manual work needs to be done to determine the final ladder.

Step 4: Determining the final ladder

We set out some criteria that we use to determine whether an ABR-ladder is reasonable. These are:

  1. Each rung should be between 1.5x and 2x the bitrate of the previous rung.
  2. Each rung should increase the VMAF score by at least 6 over the previous rung.
  3. Each rung should use the resolution with the best video quality for that bitrate.
  4. No rung should have a VMAF score above 93.

To achieve these points, we may need to have multiple rungs per resolution and select bitrates manually.
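A minimal sketch of how criteria 1, 2 and 4 could be checked automatically for a candidate ladder, assuming rungs are (resolution, bitrate in kbit/s, VMAF) tuples sorted by bitrate. The function name and defaults are hypothetical; criterion 3 needs the full test data, so it is not checked here:

```python
def check_ladder(rungs, min_ratio=1.5, max_ratio=2.0, min_vmaf_step=6.0, max_vmaf=93.0):
    """Return (rung index, message) for each violation of criteria 1, 2 and 4.

    `rungs` is a list of (resolution, bitrate_kbps, vmaf), sorted by bitrate.
    """
    problems = []
    for i, (res, bitrate, vmaf) in enumerate(rungs):
        # Criterion 4: no rung above the "perceptually identical" threshold
        if vmaf > max_vmaf:
            problems.append((i, f"{res} @ {bitrate} kbit/s: VMAF {vmaf} above {max_vmaf}"))
        if i == 0:
            continue
        _, prev_bitrate, prev_vmaf = rungs[i - 1]
        # Criterion 1: bitrate step between 1.5x and 2x
        ratio = bitrate / prev_bitrate
        if not min_ratio <= ratio <= max_ratio:
            problems.append((i, f"{res} @ {bitrate} kbit/s: bitrate ratio {ratio:.2f}"))
        # Criterion 2: at least one just noticeable difference (6 VMAF points)
        step = vmaf - prev_vmaf
        if step < min_vmaf_step:
            problems.append((i, f"{res} @ {bitrate} kbit/s: VMAF step {step:.1f}"))
    return problems
```

Applied to the Big Buck Bunny ladder in the example below, the only rung flagged is the 1080p rung at 93.5, just above the 93 threshold; the Apple ladder gets flags on four of its nine rungs.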

Example with Big Buck Bunny

The images above were generated from a clip of Big Buck Bunny. Following the steps above, we generated the following ABR-ladder:

Resolution   Bitrate (kbit/s)   VMAF
416x234      150                15.8
640x360      225                30.8
640x360      350                44.1
768x432      550                56.6
960x540      850                67.2
1280x720     1275               78.1
1280x720     2100               85.3
1920x1080    4100               93.5

As you can see, the bitrates and VMAF-scores are well spaced, and the final rung reaches a VMAF-score of about 93 (93.5). This means that the 1080p variant should give the same visual experience as the source material. Let's compare this to the example ABR-ladder that Apple provides in their HLS Authoring Specification.

Resolution   Bitrate (kbit/s)   VMAF
416x234      145                14.7
640x360      365                46.3
768x432      730                63.3
768x432      1100               70.7
960x540      2000               81.7
1280x720     3000               88.7
1280x720     4500               91.3
1920x1080    6000               96.0
1920x1080    7800               97.1

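This ABR-ladder violates many of the criteria that we set up. The step from 145 to 365 kbit/s is more than 2x, while the two 1080p rungs are each less than 1.5x the bitrate of the rung below. The top three rungs each add less than 6 VMAF points (2.6, 4.7 and 1.1). And as mentioned, providing variants with a VMAF-score above 93 may be unnecessary, yet both 1080p-rungs in this ladder score above 93. Generally, we can see that this ABR-ladder is very wasteful for this type of content.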

Comparing the ladders, we see a 46% decrease in bandwidth with the optimized ladder when summing the average bitrate of each resolution. Since the highest-quality rung reaches a VMAF-score of about 93, we achieve this while delivering a visual experience identical to the source material. We've managed to cut bandwidth usage nearly in half without affecting video quality, and therefore also cut costs and energy usage by almost half.
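The 46% figure can be reproduced from the two tables. A minimal sketch of the arithmetic, averaging the bitrates of each resolution, summing those averages per ladder, and comparing the totals (the helper name is hypothetical; the bitrates are taken from the tables above):

```python
from collections import defaultdict


def mean_bitrate_per_resolution(ladder):
    """Average bitrate (kbit/s) of the rungs at each resolution."""
    groups = defaultdict(list)
    for resolution, bitrate in ladder:
        groups[resolution].append(bitrate)
    return {res: sum(b) / len(b) for res, b in groups.items()}


optimized = [("416x234", 150), ("640x360", 225), ("640x360", 350), ("768x432", 550),
             ("960x540", 850), ("1280x720", 1275), ("1280x720", 2100), ("1920x1080", 4100)]
apple = [("416x234", 145), ("640x360", 365), ("768x432", 730), ("768x432", 1100),
         ("960x540", 2000), ("1280x720", 3000), ("1280x720", 4500),
         ("1920x1080", 6000), ("1920x1080", 7800)]

opt_total = sum(mean_bitrate_per_resolution(optimized).values())   # 7625.0
apple_total = sum(mean_bitrate_per_resolution(apple).values())     # 14075.0
reduction = 1 - opt_total / apple_total
print(f"{reduction:.1%}")  # 45.8%
```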

Tailoring the ABR-ladder to the content is evidently very effective in reducing bandwidth usage, but finding the best ladder can be difficult and time consuming. The scripts we've built to automate this process make it much easier and faster, and they could be integrated into your existing ingest process to determine a content-tailored ABR-ladder with little or no human input. The small cost of transcoding and computing VMAF in the cloud can be insignificant compared with the savings you get from optimizing your ABR-ladder.

Eyevinn Technology is the leading European independent consultancy firm specializing in video technology and media distribution.

If you need assistance with developing and implementing this, our team of video developers is happy to help out. If you have any questions or comments, just drop us a line in the comments section of this post.
