DEV Community

Abhijith Neil Abraham


Redditflow: Find data from any timeline, past to future, and feed your ML pipelines

Finding data for your ML models can be cumbersome, and there are many sources to collect it from. Depending on the data domain and task, some of the most suitable sources are social media platforms. At NFFlow, our mission is to simplify the whole process, from data collection to a trained ML model. You can even schedule cron jobs to collect data that will only appear in the future.

USE CASE

Imagine you want to train a model on text or image data, and you don't want to go through all that Python jargon of coding a scraper and an ML model. That is where redditflow, a Reddit API from NFFlow, comes to your rescue!

Let's break down the usage of the API, and how you're gonna benefit from it.

TEXT API

The text API helps you scrape data from any time period. All you need is a config in which you specify your topic of interest and the time period you want to scrape. An ML-enabled classifier helps you filter the scraped data. Optionally, if you want a trained ML model as output from the scraped data, you can specify that in the config.

Here's an example:

from redditflow import TextApi

config = {
    "sort_by": "best",
    "subreddit_text_limit": 50,
    "total_limit": 200,
    # Time period to scrape
    "start_time": "27.03.2021 11:38:42",
    "end_time": "27.03.2022 11:38:42",
    "subreddit_search_term": "healthcare",
    "subreddit_object_type": "comment",
    # Optional: train a model on the scraped text
    "ml_pipeline": {
        "model_name": "distilbert-base-uncased",
        "model_output_path": "healthcare_27.03.2021-27.03.2022_redditflow",
    },
}

TextApi(config)

As promised, we saved you from all the Python jargon!
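
If you run the scraper on a schedule, the `start_time` and `end_time` strings can be generated instead of typed by hand. The following is a minimal sketch, not part of redditflow: `rolling_window` and `output_path` are hypothetical helpers, the `DD.MM.YYYY HH:MM:SS` format is inferred from the sample config, and the output path naming simply mirrors the one in the example.

```python
from datetime import datetime, timedelta

# Assumed format, inferred from the sample value "27.03.2021 11:38:42".
TIME_FORMAT = "%d.%m.%Y %H:%M:%S"


def rolling_window(end: datetime, days: int) -> dict:
    """Return start_time/end_time strings covering the `days` before `end`."""
    start = end - timedelta(days=days)
    return {
        "start_time": start.strftime(TIME_FORMAT),
        "end_time": end.strftime(TIME_FORMAT),
    }


def output_path(term: str, window: dict) -> str:
    """Build a model output name from the search term and the date range.

    Hypothetical convention that mirrors the sample config.
    """
    start_date = window["start_time"].split(" ")[0]
    end_date = window["end_time"].split(" ")[0]
    return f"{term}_{start_date}-{end_date}_redditflow"


# Reproduce the one-year window used in the example config.
window = rolling_window(datetime(2022, 3, 27, 11, 38, 42), days=365)
model_path = output_path("healthcare", window)
```

You could merge `window` into your config with `config.update(window)` before calling `TextApi(config)`, and pass `datetime.now()` as `end` in a scheduled job.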

We have uploaded a few sample models, trained using redditflow, to the Hugging Face Hub. Check them out here!

IMAGE API

Say you want to collect all images of a particular topic over a period of time, e.g. all images of cats from Reddit over a year. Here is how you can do it in a few lines of Python code. The sample config uses a two-day window.

from redditflow import ImageApi

config = {
    "sort_by": "best",
    "subreddit_image_limit": 3,
    "total_limit": 10,
    # Time period to scrape
    "start_time": "13.11.2021 09:38:42",
    "end_time": "15.11.2021 11:38:42",
    "subreddit_search_term": "cats",
    "subreddit_object_type": "comment",
    "client_id": "$CLIENT_ID",  # replace with your PRAW client id
    "client_secret": "$CLIENT_SECRET",  # replace with your PRAW client secret
}

ImageApi(config)


Running the image API requires PRAW, the Python Reddit API Wrapper, so you will need to provide a PRAW client id and client secret.
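
To keep the credentials out of your source code, you could read them from environment variables. This is a minimal sketch, not part of redditflow: `with_praw_credentials` is a hypothetical helper, and the variable names `CLIENT_ID` and `CLIENT_SECRET` are assumptions chosen to match the placeholders in the config.

```python
import os


def with_praw_credentials(config: dict, env=os.environ) -> dict:
    """Return a copy of `config` with client_id/client_secret read from `env`.

    CLIENT_ID and CLIENT_SECRET are assumed variable names, not something
    redditflow itself looks up.
    """
    missing = [name for name in ("CLIENT_ID", "CLIENT_SECRET") if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        **config,
        "client_id": env["CLIENT_ID"],
        "client_secret": env["CLIENT_SECRET"],
    }


# Demo with an explicit mapping; omit `env` to read the real environment.
demo_config = with_praw_credentials(
    {"subreddit_search_term": "cats"},
    env={"CLIENT_ID": "my-id", "CLIENT_SECRET": "my-secret"},
)
```

The helper returns a new dict rather than mutating its input, so the same base config can be reused with different credentials.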

Contributions

Well, there's a lot we can do for the community through open source. We welcome all contributions that help us make the data science process simpler. Check out https://github.com/nfflow/redditflow
