I'm still working on my side project, where I'm gathering information from around the web. I'm eventually going to use this information in a weekly aggregate newsletter for real estate investing and property management. If you're curious, the newsletter is here. For this part of the project, I'm going to query Reddit's API to find interesting real estate and landlord posts.
The Tooling
There is only one package you need to scrape the Reddit API in Node.js: snoowrap.
Snoowrap is a "fully featured javascript wrapper for the Reddit API" (quote taken from the GitHub repo's index page). Snoowrap is really great, and it allows you to query posts, comments, scores, and more.
All of the responses are wrapped in their own little objects as well, and it's all fairly well documented. Also, if you're using an IDE like WebStorm, you can easily auto-complete the functions and classes thanks to the really great type definitions in the project.
Installing snoowrap
Install Snoowrap just like any other npm package in NodeJS:
npm install snoowrap --save
and require it:
var snoowrap = require('snoowrap');
Setting up Snoowrap
Before making any calls to the Reddit API, you have to go through an initial OAuth2 setup to create an app and generate tokens. This is fairly straightforward, but it requires a few steps.
- Go to https://not-an-aardvark.github.io/reddit-oauth-helper/ and note the redirect URL you must use when creating your Reddit app (the thing you use to call the API). As of this writing, the URL is: https://not-an-aardvark.github.io/reddit-oauth-helper/
- Go to https://www.reddit.com/prefs/apps/ and create a new app, using that redirect URL.
- Go back to https://not-an-aardvark.github.io/reddit-oauth-helper/, select the permissions you want, and generate your tokens.
Now, you can configure the snoowrap object in your script.
const r = new snoowrap({
  userAgent: 'A random string.',
  clientId: 'Client ID from oauth setup',
  clientSecret: 'Client Secret from oauth setup',
  refreshToken: 'Token from the oauth setup'
});
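Hardcoding the client secret and refresh token is fine for a quick test, but it is easy to commit them by accident. A minimal sketch of one alternative is to read them from environment variables. The helper name loadRedditCredentials and the variable names (REDDIT_USER_AGENT and so on) are assumptions made for this example, not something snoowrap requires.

```javascript
// Hypothetical helper: builds the snoowrap options object from environment
// variables. The variable names are assumptions chosen for this sketch.
function loadRedditCredentials(env = process.env) {
  const required = {
    userAgent: 'REDDIT_USER_AGENT',
    clientId: 'REDDIT_CLIENT_ID',
    clientSecret: 'REDDIT_CLIENT_SECRET',
    refreshToken: 'REDDIT_REFRESH_TOKEN'
  };

  const credentials = {};
  const missing = [];
  for (const [option, variable] of Object.entries(required)) {
    if (env[variable]) {
      credentials[option] = env[variable];
    } else {
      missing.push(variable);
    }
  }

  // Fail early with a readable message instead of a confusing API error later.
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return credentials;
}
```

The returned object has the same shape as the options shown above, so it can be passed straight to the constructor, for example `new snoowrap(loadRedditCredentials())`.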
The Script for Querying the RealEstate Subreddit
Now that you're all set up with snoowrap (great job, you smart developer you), you can query Reddit's API in Node.js with a script similar to the one below:
import snoowrap from 'snoowrap';

export async function scrapeSubreddit() {
  // Authenticate with the credentials generated during the OAuth setup.
  const r = new snoowrap({
    userAgent: 'A random string.',
    clientId: 'Client ID from oauth setup',
    clientSecret: 'Client Secret from oauth setup',
    refreshToken: 'Token from the oauth setup'
  });

  // Fetch the three top-scoring posts of the past week. getSubreddit() can be
  // chained directly, so only the final listing needs to be awaited.
  const topPosts = await r.getSubreddit('realEstate').getTop({time: 'week', limit: 3});

  // Keep only the fields the newsletter needs.
  const data = [];
  topPosts.forEach((post) => {
    data.push({
      link: post.url,
      text: post.title,
      score: post.score
    });
  });

  console.log(data);
  return data;
}
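Since the newsletter covers landlord topics as well as real estate, the same query can be repeated over several subreddits. The sketch below is one possible way to do that, not part of the original script: collectTopPosts is a hypothetical helper, and the client is passed in as a parameter so the function can be tried with a stand-in object instead of a live snoowrap instance.

```javascript
// Hypothetical helper: fetches the top posts from several subreddits and
// merges them into one list, highest score first.
// `client` is expected to be a configured snoowrap instance (or any object
// exposing getSubreddit(name).getTop(options) in the same way).
async function collectTopPosts(client, subredditNames, options = {time: 'week', limit: 3}) {
  const data = [];
  for (const name of subredditNames) {
    // Requests run one after another rather than in parallel.
    const topPosts = await client.getSubreddit(name).getTop(options);
    topPosts.forEach((post) => {
      data.push({
        subreddit: name,
        link: post.url,
        text: post.title,
        score: post.score
      });
    });
  }
  // Sort the merged list so the highest-scoring posts come first.
  data.sort((a, b) => b.score - a.score);
  return data;
}
```

With a configured snoowrap object `r`, a call might look like `collectTopPosts(r, ['realEstate', 'Landlord']).then(console.log)`, where 'Landlord' is only an example subreddit name.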
Conclusion
The scrapeSubreddit script above outputs the top 3 posts of the week from Reddit's RealEstate subreddit. Pretty neat, right? I thought this was a fun experience, and I really love how Snoowrap works. Now I can use this data to flesh out the newsletter I'm making. Again, if you're curious, you can check it out here.
Thank you, have a nice day!