What would you say if I told you there's a tool installed on my Ubuntu machine that makes downloading from reddit much easier?
Over the past few days, I was busy building a machine learning project. I needed tons of images, and I haven't found anywhere better than reddit, the front page of the internet, for crawling and downloading the pictures I needed. The pictures are posted by people, and most of them are real-world pictures and not fancy advertisements for a luxury restaurant.
I couldn't crawl reddit using Nokogiri. But then I realized something: for a project, I had used the JSON API to get a bunch of pictures. So I wanted to automate the downloads! I opened up VS Code, grabbed a cup of coffee, put on my black metal playlist on Spotify and started coding.
Now, I have this really cool tool which can help me create datasets for my A.I. project!
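To give a rough idea of the JSON API approach, here is a minimal sketch. It is not the actual source of the tool: it uses Ruby's standard library instead of httparty, the method names and the User-Agent string are made up for illustration, and it assumes the new.json listing of a subreddit, where each post sits under data.children and carries its link in data.url.

```ruby
require "json"
require "net/http"
require "uri"

IMAGE_EXTENSIONS = %w[.jpg .jpeg .png .gif].freeze

# Fetch one page of a subreddit listing as a parsed Hash.
# Assumption: the "new" listing; the real tool may use another endpoint.
def fetch_listing(subreddit, limit: 100)
  uri = URI("https://www.reddit.com/r/#{subreddit}/new.json?limit=#{limit}")
  request = Net::HTTP::Get.new(uri)
  request["User-Agent"] = "reddit-image-sketch/0.1" # hypothetical client name
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  JSON.parse(response.body)
end

# Keep only the post URLs that point directly at an image file.
def image_urls(listing)
  posts = listing.fetch("data", {}).fetch("children", [])
  urls = posts.map { |post| post.dig("data", "url") }.compact
  urls.select do |url|
    path = url[/\A[^?#]*/] # drop any query string or fragment
    IMAGE_EXTENSIONS.include?(File.extname(path).downcase)
  end
end

# Usage (needs network access):
#   puts image_urls(fetch_listing("skyporn"))
```

Splitting the fetch from the filtering keeps the filtering part easy to try out on a saved JSON file without hitting reddit at all.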
On a Linux, BSD, macOS or WSL machine, you need to install Ruby first. My personal preference is always RVM, but as long as what you have installed can handle the httparty gem, that's OK.
To install the tool, just run this command:
gem install reddit_junkie
and it'll be available as a command line tool for you.
reddit_junkie --subreddit SUB
For example, if you want the latest pictures from r/skyporn, you just run:
reddit_junkie --subreddit skyporn
reddit_junkie --subreddit SUB --directory DIR
For example, say you've created a folder called sky and you want to save the pictures there. If you haven't created the folder yet, reddit_junkie will create it for you.
reddit_junkie --subreddit skyporn --directory sky
reddit_junkie --subreddit SUB --count COUNT
For example, to download 300 pictures of the sky:
reddit_junkie --subreddit skyporn --count 300
reddit_junkie --subreddit SUB --count COUNT --directory DIR
For example, to download 300 pictures of the sky into your sky directory:
reddit_junkie --subreddit skyporn --count 300 --directory sky
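If you're curious how flags like these could be read in Ruby, here is a small sketch using OptionParser from the standard library. It only illustrates the three flags shown above and is not how reddit_junkie actually parses its arguments; the parse_flags name is hypothetical.

```ruby
require "optparse"

# Parse the three flags used in the examples above into a Hash.
# Works on a copy of argv so the caller's array is left untouched.
def parse_flags(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on("--subreddit SUB", String) { |value| options[:subreddit] = value }
    opts.on("--directory DIR", String) { |value| options[:directory] = value }
    opts.on("--count COUNT", Integer) { |value| options[:count] = value }
  end
  parser.parse!(argv.dup)
  options
end

p parse_flags(%w[--subreddit skyporn --count 300 --directory sky])
```

OptionParser converts the count to an Integer and raises an error for unknown flags, so there is very little to write by hand.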
- The CLI tool hasn't been tested with the --endpoint flag yet. It seems OK, though.
- For more than 100 images, the download only works for counts divisible by 100, like 300, 1000 or 25000. As I made this tool to help me build a dataset, I haven't spent much time on fixing this issue.
- The reading of CLI flags/parameters isn't really good. It works just fine, but not strictly in the POSIX way.
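About the divisible-by-100 limitation: reddit's listing endpoints return at most 100 posts per request, so larger downloads have to be split into pages. Here is a sketch of how an arbitrary count could be split, with a smaller last page for the remainder. This is not what the tool currently does, and page_sizes is a hypothetical helper name.

```ruby
# Split a requested count into per-request page sizes.
# per_page defaults to 100, the most reddit returns per listing request.
def page_sizes(count, per_page = 100)
  raise ArgumentError, "count must be positive" unless count.positive?

  full_pages, remainder = count.divmod(per_page)
  sizes = Array.new(full_pages, per_page)
  sizes << remainder if remainder.positive?
  sizes
end

p page_sizes(250) # => [100, 100, 50]
p page_sizes(300) # => [100, 100, 100]
```

Each size would then become the limit of one request, with the after value from the previous response passed along to get the next page.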