DEV Community

Sajid Shaikh


Scrape Twitter profiles and hashtags

I came across a project that scrapes Twitter, but it no longer works properly because Twitter has changed its front-end code structure and the way tweets are fetched from the backend. Sending an HTTP request and parsing the returned HTML to extract tweet data no longer works, and I needed more data than Twitter's API offers. So I created this project, which runs a headless web browser to collect tweet data.

What data do we get?

| Key | Type | Description |
|---|---|---|
| tweet_id | String | Tweet identifier (an integer stored as a string) |
| username | String | Username of the profile |
| name | String | Name of the profile |
| profile_picture | String | Link to the profile picture |
| replies | Integer | Number of replies to the tweet |
| retweets | Integer | Number of retweets of the tweet |
| likes | Integer | Number of likes on the tweet |
| is_retweet | Boolean | Whether the tweet is a retweet |
| retweet_link | String | Link to the retweet if the tweet is a retweet, otherwise an empty string |
| posted_time | String | Time the tweet was posted, in ISO 8601 format |
| content | String | Text content of the tweet |
| hashtags | Array | Hashtags present in the tweet, if any |
| mentions | Array | Mentions present in the tweet, if any |
| images | Array | Image links, if any |
| videos | Array | Video links, if any |
| tweet_url | String | URL of the tweet |
| link | String | Link to an external website, if the tweet contains one |
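If you consume these records from your own code, it can help to load them into a typed structure. The following is a minimal Python sketch, assuming each tweet arrives as a plain dict keyed exactly as in the table; the `Tweet` class and its `from_dict` helper are hypothetical and not part of the project's API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Tweet:
    """Typed view of one scraped tweet (hypothetical; keys follow the table)."""
    tweet_id: str
    username: str
    name: str
    profile_picture: str
    replies: int
    retweets: int
    likes: int
    is_retweet: bool
    retweet_link: str
    posted_time: datetime
    content: str
    tweet_url: str
    link: str
    hashtags: List[str] = field(default_factory=list)
    mentions: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)
    videos: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "Tweet":
        """Build a Tweet from a plain dict (assumed scraper output shape)."""
        tweet_id = str(raw["tweet_id"])
        # tweet_id is an integer stored as a string, so reject anything else.
        if not tweet_id.isdigit():
            raise ValueError(f"tweet_id is not numeric: {tweet_id!r}")
        # datetime.fromisoformat() only accepts a trailing "Z" from
        # Python 3.11 on, so rewrite it as an explicit UTC offset first.
        posted = datetime.fromisoformat(raw["posted_time"].replace("Z", "+00:00"))
        return cls(
            tweet_id=tweet_id,
            username=raw["username"],
            name=raw["name"],
            profile_picture=raw["profile_picture"],
            replies=int(raw["replies"]),
            retweets=int(raw["retweets"]),
            likes=int(raw["likes"]),
            is_retweet=bool(raw["is_retweet"]),
            retweet_link=raw.get("retweet_link", ""),
            posted_time=posted,
            content=raw["content"],
            tweet_url=raw["tweet_url"],
            link=raw.get("link", ""),
            # Optional array fields default to empty lists when absent.
            hashtags=list(raw.get("hashtags", [])),
            mentions=list(raw.get("mentions", [])),
            images=list(raw.get("images", [])),
            videos=list(raw.get("videos", [])),
        )
```

Parsing `posted_time` into a timezone-aware `datetime` up front makes sorting and date filtering straightforward, and the `tweet_id` check catches records where the identifier was not captured as a number.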

What can we scrape?

  • Tweets from any profile that exists on Twitter.
  • Tweets matching a keyword, such as "google".
  • Tweets containing a hashtag, such as "#india".
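Profiles, keywords, and hashtags all correspond to public twitter.com pages that a headless browser can open. Here is a minimal sketch, assuming the scraper navigates either to the profile page or to the search page with a `q` query parameter; the function names are hypothetical and the project's actual navigation logic may differ.

```python
from urllib.parse import quote, urlencode

# Assumed public web URLs; the project's own navigation may differ.
BASE_URL = "https://twitter.com"


def profile_url(username: str) -> str:
    """Timeline of a single profile; a leading '@' is stripped."""
    return f"{BASE_URL}/{quote(username.lstrip('@'))}"


def keyword_search_url(keyword: str) -> str:
    """Search results for a plain keyword such as "google"."""
    # urlencode() percent-encodes reserved characters ('#' -> %23, ' ' -> '+').
    return f"{BASE_URL}/search?{urlencode({'q': keyword})}"


def hashtag_search_url(hashtag: str) -> str:
    """Search results for a hashtag; accepts either "india" or "#india"."""
    return keyword_search_url("#" + hashtag.lstrip("#"))
```

Encoding matters for hashtags: an unescaped `#` would be read by the browser as the start of a URL fragment and dropped from the query, so it has to be sent as `%23`.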

What if your IP gets blocked for sending too many requests?

  • The scraper supports proxies, both authenticated and unauthenticated.
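Proxies are commonly written as a URL, with credentials embedded only for authenticated ones. The helper below is a hypothetical sketch of building such a URL; it is not the project's configuration format. Credentials are percent-encoded so that characters such as `@` or `:` in a password do not break the URL.

```python
from typing import Optional
from urllib.parse import quote


def proxy_url(host: str, port: int, username: Optional[str] = None,
              password: Optional[str] = None, scheme: str = "http") -> str:
    """Build an unauthenticated or authenticated proxy URL (hypothetical helper)."""
    # Credentials only make sense as a pair.
    if (username is None) != (password is None):
        raise ValueError("username and password must be given together")
    auth = ""
    if username is not None and password is not None:
        # safe='' forces '@', ':' and '/' inside credentials to be escaped.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    return f"{scheme}://{auth}{host}:{port}"
```

Be aware that Chrome's `--proxy-server` switch ignores credentials embedded in the URL, so authenticated proxies with headless Chrome typically need extra handling, for example a browser extension or a tool such as selenium-wire.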

To learn more about its usage, check out the full repository here.
