In the previous post, part one, I showed how to automate part of the data-gathering process when you know most of the constraints and want to cut down the manual work of data entry. Namely, my challenge was that I wanted to get tweets from English-speaking stand-up comedians (most resources for data analysis are in English - sorry, other languages 😩) within a predefined period.
I managed to produce a .csv file with the name of each stand-up comedian and their respective Twitter handle - everything I need to gather (scrape) their tweets.
There are a few ways to achieve this:
Use the Twitter API with your preferred Python library (there are a couple of good ones).
PRO: you're using the API, so you know what data you'll get regardless of UI/structural changes.
CON(?): the free tier offers a limited set of options, and usage has to be approved - there's a process where you apply and, once your request is processed successfully, receive an API key.
Use GetOldTweets3 (or one of its variations).
PRO: easy to use for small amounts of data.
CON: google "Too Many Requests"
Use the nasty library (NASTY Advanced Search Tweet Yielder).
PRO: Easy to use, flexible
CON: I haven't come across one.
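For a sense of what the API route looks like, here is a minimal sketch of option 1 using tweepy (one of the good ones). The credentials are placeholders you'd get from the approval process, the 1000-tweet cap is arbitrary, and the date-filter helper is a hypothetical name of mine:

```python
from datetime import datetime

def in_period(created_at, since, until):
    """Keep only tweets whose timestamp falls in the predefined period."""
    return since <= created_at < until

def fetch_timeline(handle, since, until):
    """Pull a user's recent tweets via the Twitter API and filter by date."""
    import tweepy  # pip install tweepy

    # Placeholder credentials - replace with the keys Twitter approves for you.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # Cursor pages through the timeline; tweet_mode="extended" avoids truncation.
    tweets = tweepy.Cursor(
        api.user_timeline, screen_name=handle, tweet_mode="extended"
    ).items(1000)
    return [t.full_text for t in tweets if in_period(t.created_at, since, until)]
```

The main limitation (beyond approval) is that the standard API only reaches a limited slice of a user's timeline, which matters if your predefined period is far in the past.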
Since I've tried all three approaches, I'll present you with the Colab for the one I think works best: nasty.
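The core of the Colab boils down to something like the sketch below. I'm assuming nasty's `Search` API as shown in its README (a query string plus `since`/`until` dates, with `.request()` yielding tweet objects) - check the README for the exact signature, since it may have changed. The `from:<handle>` syntax mirrors Twitter's own advanced search:

```python
import csv
from datetime import date

def user_query(handle):
    """Build the advanced-search query for one comedian's handle."""
    return "from:" + handle.lstrip("@")

def scrape(handles_csv, out_csv, since=date(2020, 1, 1), until=date(2020, 2, 1)):
    """Read (name, handle) rows and write one row per scraped tweet."""
    import nasty  # pip install nasty

    with open(handles_csv) as f_in, open(out_csv, "w", newline="") as f_out:
        writer = csv.writer(f_out)
        writer.writerow(["name", "created_at", "text"])
        for name, handle in csv.reader(f_in):
            # Assumed Search signature - verify against the nasty README.
            for tweet in nasty.Search(user_query(handle),
                                      since=since, until=until).request():
                writer.writerow([name, tweet.created_at, tweet.text])
```

Because nasty mimics the advanced-search page rather than the API, it isn't bound by the timeline depth limits of option 1 - but it is more exposed to breakage when Twitter changes its markup.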
The result is a .csv file in the following format (printed row by row):
After this, the data needs to be cleaned, normalized, and purified before it can be used for various purposes like sentiment analysis, topic modeling, labeling, and so on.
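As a taste of that cleaning step, here is a minimal normalizer of my own (not part of the Colab): it strips URLs and @-mentions, collapses whitespace, and lowercases - roughly the floor of what sentiment analysis or topic modeling pipelines expect:

```python
import re

URL = re.compile(r"https?://\S+")      # t.co links and other URLs
MENTION = re.compile(r"@\w+")          # @-mentions

def normalize(text):
    """Minimal tweet cleanup before sentiment analysis / topic modeling."""
    text = URL.sub("", text)
    text = MENTION.sub("", text)
    text = re.sub(r"\s+", " ", text)   # collapse runs of whitespace
    return text.strip().lower()
```

Depending on the downstream task you'd go further (hashtag handling, emoji, stop words, lemmatization), but even this much makes the rows comparable.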