
Data gathering with Python: an example, get the tweets (part 2)

ViksaaSkool

In the previous post, part one, I showed how to automate part of the data-gathering process when you know most of the constraints and want to cut down the manual work of data entry. My challenge was to collect tweets from English-speaking stand-up comedians over a predefined period (most resources for data analysis target English, sorry, other languages 😩).
I managed to produce a .csv file with the name of each stand-up comedian and their respective Twitter handle: everything I need to gather (scrape) their tweets.
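The .csv from part one can be loaded in a few lines. A minimal sketch; the column names `name` and `handle` are my assumption, so adjust them to your actual file:

```python
import csv
import io

# Stand-in for the part-one output: one comedian per row.
# Column names are assumed; match them to your real .csv header.
sample = """name,handle
John Mulaney,mulaney
Ali Wong,aliwong
"""

def load_handles(fileobj):
    """Read (name, handle) pairs from the part-one .csv file."""
    return [(row["name"], row["handle"]) for row in csv.DictReader(fileobj)]

pairs = load_handles(io.StringIO(sample))
print(pairs)  # [('John Mulaney', 'mulaney'), ('Ali Wong', 'aliwong')]
```

With a real file you'd pass `open("comedians.csv")` instead of the `StringIO` stand-in.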

There are a few ways to achieve this:

  1. Use the Twitter API and your preferred Python library (there are a couple of good ones).
    PRO: you're using the official API, so you know what data you'll get regardless of UI/structural changes.
    CON(?): the free tier is limited, and access needs approval: there's a process where you apply and, if the request is processed successfully, receive an API key.

  2. Use GetOldTweets3 (or one of its variations).
    PRO: easy to use for small amounts of data.
    CON: it scrapes the web interface and quickly hits rate limits; just google "GetOldTweets3 Too Many Requests".

  3. Use the nasty library (NASTY Advanced Search Tweet Yielder).
    PRO: easy to use, flexible.
    CON: I haven't come across one yet.

Since I've tried all three approaches, I'll share a Colab notebook using the one I think works best: nasty.
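The core of the approach is building a Twitter advanced-search query per comedian and feeding it to nasty. A minimal sketch: `build_query` is a hypothetical helper of mine, and the commented-out nasty call reflects my reading of the nasty README (it needs network access, and you should check the repo for the current signature):

```python
from datetime import date

def build_query(handle):
    """Twitter advanced-search query for all tweets from one account."""
    return f"from:{handle}"

# Hypothetical usage with nasty (requires network; verify the API
# against the nasty README before relying on it):
# import nasty
# stream = nasty.Search(build_query("somehandle"),
#                       since=date(2020, 1, 1),
#                       until=date(2020, 6, 30)).request()
# for tweet in stream:
#     print(tweet.created_at, tweet.text)

print(build_query("somehandle"))  # from:somehandle
```

Looping this over the (name, handle) pairs from the part-one .csv gives you one search per comedian over the predefined period.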

The result is a .csv file in the following format (one tweet per row):
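Writing the scraped tweets out row by row can be sketched with `csv.DictWriter`. The column layout here is my assumption of the format shown above; swap in whichever fields you actually keep:

```python
import csv
import io

# Assumed columns; adjust to the fields you actually collect.
FIELDS = ["name", "handle", "date", "text"]

rows = [
    {"name": "Ali Wong", "handle": "aliwong",
     "date": "2020-03-01", "text": "example tweet"},
]

buf = io.StringIO()  # use open("tweets.csv", "w", newline="") for a real file
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```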

After this, the data needs to be cleaned and normalized before it can be used for tasks such as sentiment analysis, topic modeling, labeling, and so on.
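A first pass at that cleaning step can be as simple as a few regex substitutions. A minimal sketch (lowercasing, stripping URLs and @-mentions, collapsing whitespace); real pipelines usually do more, e.g. handling emoji, hashtags, and retweet markers:

```python
import re

def clean_tweet(text):
    """Minimal normalization: lowercase, drop URLs and @-mentions,
    collapse whitespace. A starting point, not a full pipeline."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)  # drop links
    text = re.sub(r"@\w+", "", text)          # drop mentions
    return re.sub(r"\s+", " ", text).strip()

print(clean_tweet("Check this out https://t.co/abc @someone  GREAT set!"))
# check this out great set!
```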

Cheers.
