DEV Community

Amal Shaji

Posted on Jan 15, 2021 • Originally published at blog.amalshaji.com

Build a Reddit Video downloader with Python

#python

This article is about building a Reddit video downloader from scratch using Python. Our downloader will be able to download a video along with its audio.

Reddit makes it tough to download posted videos directly. This is because Reddit saves the audio and video separately on their servers. But this is not a big problem, as we will download both audio and video separately and join them using FFmpeg.

How to get the video URL?

Every Reddit post can be fetched as JSON simply by appending .json to its URL (in place of the trailing slash). Thus,

https://www.reddit.com/r/PublicFreakout/comments/kwif5d/insurrectionist_gets_arrested_trying_to_break/

becomes

https://www.reddit.com/r/PublicFreakout/comments/kwif5d/insurrectionist_gets_arrested_trying_to_break.json
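If you want to do this conversion in code, here is a minimal sketch. The function name to_json_url is my own, and dropping the query string is an assumption about what you would want for shared links.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(post_url: str) -> str:
    """Return the .json variant of a Reddit post URL."""
    parts = urlsplit(post_url)
    # Remove the trailing slash so ".json" attaches to the last path segment.
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    # Assumption: any query string or fragment is not needed, so drop it.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(to_json_url(
    "https://www.reddit.com/r/PublicFreakout/comments/kwif5d/"
    "insurrectionist_gets_arrested_trying_to_break/"
))
```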

Go to the JSON URL, and you should see the raw JSON data for the post.

From there, it is easy. Press Ctrl + F and search for .mp4.

That's what we need: the fallback URL. Now we have to figure out how to reach it starting from the top of the JSON.

If you want to experiment with this on your own, either save the JSON to a local file or work in Jupyter or Colab, so that the data is fetched only once. Reddit's rate limiting won't let you fetch the JSON data repeatedly in quick succession.

First, get the JSON data using requests.

import requests
data = requests.get(post_url).json()

where post_url is the JSON URL.
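Because of the rate limiting, it can help to keep a local copy of the JSON while experimenting. The sketch below is one way to do that, not the only one. It uses the standard library instead of requests so it runs without extra installs; the names fetch_json and load_post_json and the User-Agent string are placeholders of mine, and setting a custom User-Agent is an assumption about what Reddit prefers.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any, Callable


def fetch_json(json_url: str) -> Any:
    """Fetch and decode JSON over HTTP with an explicit User-Agent."""
    request = urllib.request.Request(
        json_url,
        headers={"User-Agent": "reddit-video-tutorial/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def load_post_json(
    json_url: str,
    cache_path: Path,
    fetch: Callable[[str], Any] = fetch_json,
) -> Any:
    """Return the post JSON, hitting the network only if no cached copy exists."""
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    data = fetch(json_url)
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Calling load_post_json(post_url, Path("post.json")) a second time reads post.json from disk instead of contacting Reddit again.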

Obtain the URL using,

url = data[0]["data"]["children"][0]["data"]["secure_media"]["reddit_video"][
    "fallback_url"
]

This is what a typical video URL looks like,

https://v.redd.it/jvjih3f894b61/DASH_1080.mp4?source=fallback

This is not the most robust approach: we never check whether the keys exist, so a post without a Reddit-hosted video will raise an error. For now, we'll assume every post follows this format.
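If you would rather not assume, a more defensive lookup could look like the sketch below. The function name get_fallback_url is hypothetical, and treating secure_media as possibly missing or null is an assumption about posts that have no Reddit-hosted video.

```python
from typing import Any, Optional


def get_fallback_url(data: Any) -> Optional[str]:
    """Return the video fallback URL, or None if it cannot be found."""
    try:
        post = data[0]["data"]["children"][0]["data"]
    except (KeyError, IndexError, TypeError):
        return None
    if not isinstance(post, dict):
        return None
    # Assumption: secure_media may be absent or null, so fall back to {}.
    secure_media = post.get("secure_media") or {}
    reddit_video = secure_media.get("reddit_video") or {}
    return reddit_video.get("fallback_url")
```

A caller can then check for None and print a friendly message instead of crashing with a KeyError.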

What about the audio URL?

The audio URL is not available in the JSON data. But it can be obtained using a simple hack.

Take the video URL from above. Open it in a browser and you will see that it is video-only, with no sound.

https://v.redd.it/jvjih3f894b61/DASH_1080.mp4?source=fallback

Here the DASH_1080.mp4 is the filename, where 1080 is the resolution. Replace the resolution with audio, and you get the audio URL.

Thus the audio URL corresponding to our URL is
https://v.redd.it/jvjih3f894b61/DASH_audio.mp4?source=fallback
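The same replacement can be done in code. This is a small sketch that assumes the filename always has the DASH_<resolution>.mp4 shape shown above; the function name audio_url_for is my own.

```python
import re


def audio_url_for(video_url: str) -> str:
    """Swap the resolution in a DASH_<resolution>.mp4 URL for 'audio'."""
    audio_url, replaced = re.subn(
        r"DASH_\d+(?=\.mp4)", "DASH_audio", video_url, count=1
    )
    if not replaced:
        # The URL did not match the assumed naming scheme.
        raise ValueError(f"Unexpected video URL format: {video_url}")
    return audio_url


print(audio_url_for("https://v.redd.it/jvjih3f894b61/DASH_1080.mp4?source=fallback"))
```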

Now that you have both audio and video, download them in a pythonic way.

import urllib.request

urllib.request.urlretrieve(
    url,
    filename=output_path,
)

If you are a big fan of progress bars, add one using the reporthook of urlretrieve.

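import urllib.request
from typing import Optional

from tqdm import tqdm


class DownloadProgressBar(tqdm):
    # Matches urlretrieve's reporthook arguments: blocks so far, block size, total size.
    def update_to(self, b: int = 1, bsize: int = 1024, tsize: Optional[int] = None):
        if tsize is not None:
            self.total = tsize
        # update() expects an increment, so subtract the bytes already counted.
        self.update(b * bsize - self.n)


with DownloadProgressBar(unit="B", unit_scale=True, miniters=1) as d:
    urllib.request.urlretrieve(
        url,
        filename=output_path,
        reporthook=d.update_to,
    )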
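Since the audio URL is only derived from a naming convention, it may not always exist (for example, I would expect a video posted without sound to have no audio file, though that is an assumption). A hedged sketch of a download helper that reports failure instead of crashing follows; the name download_if_available is my own.

```python
import urllib.error
import urllib.request


def download_if_available(url: str, output_path: str) -> bool:
    """Download url to output_path; return False if it cannot be fetched."""
    try:
        urllib.request.urlretrieve(url, filename=output_path)
    except urllib.error.URLError:
        # HTTPError (403, 404, ...) is a subclass of URLError, so it lands here too.
        return False
    return True
```

With this, the caller can decide to keep the video-only file when the audio download returns False.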

Let's merge 'em

Once the files are downloaded to audio.mp4 and video.mp4, let's merge them using FFmpeg.

FYI, you can have any name for audio and video output files.

If you don't have FFmpeg installed, Windows builds are linked from the official FFmpeg download page.

For Debian or Ubuntu based Linux,

sudo apt install ffmpeg

Now run the following to merge them both,

ffmpeg -i video.mp4 -i audio.mp4 -c:v copy -c:a aac output.mp4

It doesn't take more than a few seconds to merge. Once complete, enjoy the video.
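To run the merge from Python instead of the terminal, one option is subprocess. The sketch below builds the same command as above; the function names build_merge_command and merge are hypothetical, and it assumes ffmpeg is on your PATH.

```python
import shutil
import subprocess
from typing import List


def build_merge_command(video_path: str, audio_path: str, output_path: str) -> List[str]:
    """Build the ffmpeg invocation used above as an argument list."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-i", audio_path,
        "-c:v", "copy",  # keep the video stream as-is
        "-c:a", "aac",   # encode the audio stream as AAC
        output_path,
    ]


def merge(video_path: str, audio_path: str, output_path: str) -> None:
    """Run ffmpeg and raise if it is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(
        build_merge_command(video_path, audio_path, output_path),
        check=True,
    )
```

Passing the arguments as a list avoids shell quoting problems when a filename contains spaces.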

Conclusion

In this post, we saw how to download a Reddit video along with its audio using Python. There was nothing fancy in this approach; all we had to do was find the video URL and derive the audio URL from it.
If there's a better approach, please do tell.

I went ahead and created a Python CLI named redl. Run,

redl https://www.reddit.com/r/PublicFreakout/comments/kwif5d/insurrectionist_gets_arrested_trying_to_break/

to download a video.

The whole code can be found at

amalshaji / redl

A Reddit video downloader(with audio) in Python

redl - A Reddit video downloader(with audio)

⚠️ Requires `ffmpeg` installed

Redl scrapes the Reddit post JSON and retrieves both audio and video URLs. Once these files are downloaded, it uses ffmpeg to join them.

Installation

pip install redl --user

Usage

redl https://www.reddit.com/r/Damnthatsinteresting/comments/kwrbde/making_a_grapefruit_dessert/

