What would you say if I told you there's a tool installed on my Ubuntu machine that makes downloading from reddit much easier?
Over the past few days, I was busy building a machine learning project. I needed tons of images, and I haven't found anywhere better than reddit, the front page of the internet, for crawling and downloading the pictures I needed. The pictures are posted by regular people, and most of them are real-world photos rather than fancy advertisements for a luxury restaurant.
I couldn't crawl reddit using BeautifulSoup or Nokogiri. But then I realized something: for another project, I had used the JSON API to get a bunch of pictures. So I wanted to automate the downloads! I opened up VS Code, grabbed a cup of coffee, put on my black metal playlist on Spotify and started coding.
Now, I have this really cool tool which can help me create datasets for my A.I. project!
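To give a rough idea of what working with the JSON API looks like, here is a minimal sketch in Ruby. It is not the gem's actual code: `image_urls` and `IMAGE_EXTENSIONS` are hypothetical names, and the sample payload is made up, trimmed to the general shape of a reddit listing (posts under `data.children`, each with a `data.url`). It assumes that only links ending in a common image extension are worth downloading.

```ruby
require "json"

# Hypothetical helper, not taken from reddit_junkie's source.
# Assumption: a post counts as an image if its URL ends in one of these.
IMAGE_EXTENSIONS = %w[.jpg .jpeg .png .gif].freeze

# Pull direct image links out of a parsed reddit listing hash.
def image_urls(listing)
  children = listing.dig("data", "children") || []
  children
    .map { |child| child.dig("data", "url") }
    .compact
    .select { |url| IMAGE_EXTENSIONS.any? { |ext| url.downcase.end_with?(ext) } }
end

# Made-up sample in the shape of a listing response.
sample = JSON.parse(<<~PAYLOAD)
  {"kind": "Listing", "data": {"after": "t3_abc", "children": [
    {"kind": "t3", "data": {"url": "https://i.redd.it/one.jpg"}},
    {"kind": "t3", "data": {"url": "https://www.reddit.com/r/skyporn/comments/xyz/"}},
    {"kind": "t3", "data": {"url": "https://i.redd.it/two.PNG"}}
  ]}}
PAYLOAD

puts image_urls(sample)
```

In a real run, the hash would come from an HTTP request (for example with httparty) instead of a hard-coded string, and each returned URL would then be downloaded to disk.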
CLI tool
Installing the reddit_junkie tool
On a Linux, BSD, macOS or WSL machine, you need to install Ruby first. My personal preference is always RVM, but as long as whatever you have installed can handle the httparty gem, that's OK.
To install the tool, just run this command:
gem install reddit_junkie
and it'll be available as a command line tool for you.
Downloading 25 images, in the default "images" directory
reddit_junkie --subreddit SUB
For example, if you want the latest images from r/skyporn, just run:
reddit_junkie --subreddit skyporn
Downloading 25 images in a custom directory
reddit_junkie --subreddit SUB --directory DIR
For example, say you've created a folder called sky and you want to save the pictures there. If you haven't created the folder yet, reddit_junkie will create it for you.
reddit_junkie --subreddit skyporn --directory sky
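The "create it for you" behaviour is easy to get in Ruby. The sketch below shows one way to do it, assuming the standard library's `FileUtils.mkdir_p`. The `ensure_directory` helper is a hypothetical name, not something taken from the gem.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: make sure the target directory exists.
# mkdir_p creates missing parent directories and does not fail
# if the directory is already there.
def ensure_directory(path)
  FileUtils.mkdir_p(path)
  path
end

# Try it inside a throwaway temp directory so nothing is left behind.
Dir.mktmpdir do |tmp|
  target = File.join(tmp, "sky")
  ensure_directory(target)
  ensure_directory(target) # calling it again is harmless
  puts Dir.exist?(target)
end
```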
Downloading more than 25 images in default "images" directory
reddit_junkie --subreddit SUB --count COUNT
For example, to download 300 pictures of the sky:
reddit_junkie --subreddit skyporn --count 300
Downloading more than 25 images in a custom directory
reddit_junkie --subreddit SUB --count COUNT --directory DIR
For example, to download 300 pictures of the sky into your sky directory:
reddit_junkie --subreddit skyporn --count 300 --directory sky
Known issues / not tested
- The CLI tool hasn't been tested with the --endpoint flag yet. It seems OK though.
- When downloading more than 100 images, the count has to be divisible by 100, like 300, 1000 or 25000. As I made this tool to help me build a dataset, I haven't spent much time on fixing this issue.
- Reading CLI flags/parameters isn't great. It works just fine, but not strictly in the POSIX way.
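The divisible-by-100 limitation suggests the downloads happen in pages of 100, which matches the maximum number of items reddit returns per listing request. I haven't turned this into a fix, but one possible approach is to split the requested count into full pages plus a smaller last page. The `page_sizes` helper below is a hypothetical sketch of that arithmetic, not code from the gem.

```ruby
# Hypothetical helper: split a requested count into per-request page sizes.
# Assumption: each request can fetch at most page_limit items.
def page_sizes(count, page_limit = 100)
  raise ArgumentError, "count must be positive" unless count.positive?

  full_pages, remainder = count.divmod(page_limit)
  sizes = Array.new(full_pages, page_limit)
  sizes << remainder if remainder.positive?
  sizes
end

p page_sizes(300) # => [100, 100, 100]
p page_sizes(250) # => [100, 100, 50]
p page_sizes(25)  # => [25]
```

With something like this, a count such as 250 would turn into two requests of 100 and one of 50 instead of being rejected.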