DEV Community

Sahil

Originally published at sahilfruitwala.com

A Comprehensive Guide to Extract Tweets using Tweepy

Twitter is one of the most popular sources of data in this age of Artificial Intelligence, and today data is key to almost everything. To let us extract data from the platform, Twitter provides APIs. We could call the API endpoints directly, but in this blog we will use the Tweepy library.

You can do so much with the Twitter API and Tweepy in Python that it is hard to cover everything in one blog post, so I will divide it into two parts. This post covers five topics related to searching for tweets. We will learn:

  1. How to search tweets with keywords
  2. How to search tweets that mention a specific user
  3. How to find tweets with specific hashtags
  4. How to combine all three options
  5. How to paginate, i.e. how to fetch N tweets while working within the API's limits

Before we start, make sure Tweepy is installed on your system. If it isn't, run the following command in your terminal:

pip install Tweepy

How to Search Tweets with Specific Keywords

The Twitter API has some restrictions: the standard search endpoint only covers recent tweets (roughly the past seven days), so in this blog I will show you how to get recent tweets that contain specific keywords. For example, suppose we want tweets that contain either bitcoin or python. To get tweets with either or both of these keywords, we have to form a query, and that query is bitcoin OR python. Note that OR must be written in uppercase.
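If your keyword list grows, building the query by hand gets error-prone. The sketch below uses a hypothetical helper, or_query(), which joins terms with OR and wraps multi-word phrases in quotes so Twitter matches the exact phrase. It only builds the query string; you still pass the result to search_tweets().

```python
def or_query(terms):
    """Join search terms with OR, quoting multi-word phrases (hypothetical helper)."""
    parts = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # skip empty entries
        # Quotes make Twitter match the exact phrase instead of each word separately
        parts.append(f'"{term}"' if " " in term else term)
    return " OR ".join(parts)


print(or_query(["bitcoin", "python"]))           # bitcoin OR python
print(or_query(["machine learning", "python"]))  # "machine learning" OR python
```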

To fetch the tweets we can use the following code:

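from tweepy import OAuth1UserHandler, API

# Replace these placeholders with your own Twitter API credentials
CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"
ACCESS_KEY = "your-access-key"
ACCESS_SECRET = "your-access-secret"

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
api = API(auth)

KEYWORDS = "bitcoin OR python"

# Basic keyword search
tweets = api.search_tweets(KEYWORDS)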

I have written another blog post that shows how to form search queries and how to get your Twitter API keys.

So, how does this code work?
To fetch data using the Twitter API, we first need to authenticate. The version 1.1 endpoints used in this post, such as search_tweets(), are called here with OAuth 1.0a user credentials through OAuth1UserHandler. In version 2, by contrast, most tasks can be done with just a Bearer Token.
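Rather than hard-coding keys in your script, you can read them from environment variables. This is a minimal sketch; the variable names (TWITTER_CONSUMER_KEY and so on) and the load_credentials() helper are my own assumptions, not something Tweepy requires.

```python
import os

# Hypothetical environment-variable names; use whatever names you prefer
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_KEY",
    "TWITTER_ACCESS_SECRET",
)


def load_credentials(env=None):
    """Return the four OAuth 1.0a values, failing loudly if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)
```

With the variables exported, authentication becomes auth = OAuth1UserHandler(*load_credentials()), and your keys never end up in version control.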

After authenticating, we form a query. In our case, we want tweets that contain either bitcoin or python. Then we pass that query to the search_tweets() method, which returns a list of tweet (Status) objects. And that's it!
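search_tweets() gives you Status objects rather than plain strings. As a sketch, the hypothetical format_tweet() helper below turns each result into one line with the author, date, and text. It relies only on the user.screen_name, created_at, and text attributes of v1.1 search results, so here it is tried on a stand-in object instead of a live API call.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def format_tweet(status):
    """Render one search result as '@user (YYYY-MM-DD): text'."""
    # v1.1 Status objects expose user.screen_name, created_at (a datetime) and text
    return f"@{status.user.screen_name} ({status.created_at:%Y-%m-%d}): {status.text}"


# Stand-in object shaped like a Status, so the sketch runs without the API
fake = SimpleNamespace(
    user=SimpleNamespace(screen_name="alice"),
    created_at=datetime(2022, 5, 1, tzinfo=timezone.utc),
    text="Learning python today",
)
print(format_tweet(fake))  # @alice (2022-05-01): Learning python today

# With real results: for status in tweets: print(format_tweet(status))
```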

How to Search Tweets with Full Text

By default, the v1.1 API truncates tweet text to 140 characters. To get the full text, we pass one more argument, tweet_mode="extended", to search_tweets() and read the full_text attribute instead of text. Our new code to extract full tweets is:

from tweepy import OAuth1UserHandler, API

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
api = API(auth)

KEYWORDS = "bitcoin OR python"

# Basic keyword search with full text
tweets = api.search_tweets(KEYWORDS, tweet_mode="extended")
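With tweet_mode="extended", the text lives in full_text instead of text. Retweets need one extra step: a retweet's own full_text starts with "RT @user:" and can still be cut off, while the original tweet in retweeted_status holds the complete text. The get_full_text() helper below is a hypothetical sketch of that logic, exercised on stand-in objects.

```python
from types import SimpleNamespace


def get_full_text(status):
    """Return the untruncated text of a search result, retweets included."""
    retweeted = getattr(status, "retweeted_status", None)
    if retweeted is not None:
        # The retweet's own full_text may be cut off; rebuild it from the original
        return f"RT @{retweeted.user.screen_name}: {get_full_text(retweeted)}"
    # full_text only exists when tweet_mode="extended" was requested
    return getattr(status, "full_text", None) or status.text


original = SimpleNamespace(
    user=SimpleNamespace(screen_name="bob"),
    full_text="A long tweet about python and bitcoin",
)
retweet = SimpleNamespace(full_text="RT @bob: A long tweet about pyt…", retweeted_status=original)
print(get_full_text(retweet))  # RT @bob: A long tweet about python and bitcoin
```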

How to get only English-language Tweets?

We just need to add one more argument, lang, to our search_tweets() method. It takes an ISO 639-1 language code such as "en".

# Basic keyword search with full text and lang
tweets = api.search_tweets(KEYWORDS, lang="en", tweet_mode="extended")

What if you want recent tweets or most popular tweets?
Using the result_type argument, we can ask for recent tweets, popular tweets, or a mix of both. If we don't pass it, we get mixed results by default.

# Basic keyword search with full text and result type
tweets = api.search_tweets(KEYWORDS, result_type="recent", tweet_mode="extended")
tweets = api.search_tweets(KEYWORDS, result_type="popular", tweet_mode="extended")
tweets = api.search_tweets(KEYWORDS, result_type="mixed", tweet_mode="extended")

How to get N number of Tweets?

Good news! You just have to pass another argument, count, to the search_tweets() method.

# Basic keyword search with full text and count
tweets = api.search_tweets(KEYWORDS, count=100, tweet_mode="extended")
print("Total:", len(tweets))

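Since lang, result_type, and count are easy to mistype, you might collect them in one place. The hypothetical search_kwargs() helper below is a sketch that rejects unknown result_type values and caps count at 100, the per-request maximum for standard search; always adding tweet_mode="extended" is my own choice, not a Tweepy default.

```python
VALID_RESULT_TYPES = {"mixed", "recent", "popular"}
MAX_COUNT_PER_REQUEST = 100  # standard search returns at most 100 tweets per request


def search_kwargs(count=100, result_type="mixed", lang=None):
    """Build keyword arguments for api.search_tweets(), rejecting bad values early."""
    if result_type not in VALID_RESULT_TYPES:
        raise ValueError(f"result_type must be one of {sorted(VALID_RESULT_TYPES)}")
    if count < 1:
        raise ValueError("count must be at least 1")
    kwargs = {
        "count": min(count, MAX_COUNT_PER_REQUEST),  # asking for more gains nothing
        "result_type": result_type,
        "tweet_mode": "extended",
    }
    if lang:  # ISO 639-1 code such as "en"
        kwargs["lang"] = lang
    return kwargs


print(search_kwargs(count=500, result_type="recent", lang="en"))
```

In a script, this would be used as tweets = api.search_tweets(KEYWORDS, **search_kwargs(count=50, result_type="recent", lang="en")).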

But, there is a catch here.

[Screenshot: Get N number of Tweets using Tweepy]

You see, even when I asked for 100 tweets, I sometimes got 90, sometimes 89, or even 85. Tweepy's documentation describes count as the number of results to try to retrieve per page, so treat it as an upper bound rather than a guaranteed number.

Don't worry! Tweepy provides a way to fetch N tweets (the Cursor class, covered at the end of this post), although how many you can actually get still depends on the quota and access level of your Twitter API account.

How to Search Tweets with Specific Hashtags

Getting tweets based on hashtags works the same way as keywords. Tweepy doesn't provide a dedicated function for retrieving tweets by hashtag, so we again form a query, just as we did for the keyword search.

Let's extract Tweets based on #javascript and #backend. Here, we want tweets with both hashtags. So, the query will be #javascript AND #backend.

from tweepy import OAuth1UserHandler, API

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
api = API(auth)

print("-" * 15)

QUERY = "#javascript AND #backend -filter:retweets"
tweets = api.search_tweets(QUERY, tweet_mode="extended", count=5, result_type="recent")

for i in tweets:
    print(i.full_text)
    print("-" * 15)

Note: I am showing only three short tweets here on purpose. The others were too long and would only add confusion.

[Screenshot: How to Search Tweets with Specific Hashtags]

As you can see in the output, every tweet contains both the javascript and backend hashtags. I also added -filter:retweets to the query to exclude annoying retweets from the results.
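If you want to double-check the results in code rather than by eye, each Status object carries an entities dictionary whose "hashtags" entry lists the tags found in the tweet. The has_all_hashtags() function below is a hypothetical sketch that checks whether a tweet contains every required tag, ignoring case; it is shown on a stand-in object.

```python
from types import SimpleNamespace


def has_all_hashtags(status, required):
    """True if the tweet's hashtag entities include every tag in `required`."""
    # entities["hashtags"] is a list of dicts such as {"text": "backend", "indices": [..]}
    found = {tag["text"].lower() for tag in status.entities.get("hashtags", [])}
    return all(tag.lstrip("#").lower() in found for tag in required)


tweet = SimpleNamespace(entities={"hashtags": [
    {"text": "JavaScript", "indices": [0, 11]},
    {"text": "backend", "indices": [12, 20]},
]})
print(has_all_hashtags(tweet, ["#javascript", "#backend"]))  # True
print(has_all_hashtags(tweet, ["#javascript", "#python"]))   # False
```

With real results, matching = [t for t in tweets if has_all_hashtags(t, ["#javascript", "#backend"])] keeps only the tweets that really carry both tags.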

How to Search Tweets containing Specific User

So far, we have seen how to search tweets based on keywords and hashtags. Now, let's see how to extract tweets that mention a specific user.

It is really simple: we just form a query that mentions the desired user. For example, to fetch tweets that mention Elon Musk, the query becomes @elonmusk, where elonmusk is Elon Musk's username.

from tweepy import OAuth1UserHandler, API

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
api = API(auth)

print("-" * 15)

QUERY = "@elonmusk -filter:retweets"
tweets = api.search_tweets(QUERY, tweet_mode="extended", result_type="recent")

for i in tweets:
    print(i.full_text)
    print("-" * 15)

Again, I have shown only a few tweets here.

[Screenshot: How to Search Tweets containing Specific User]
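When the username comes from user input, it helps to normalize it before building the query. The hypothetical mention_query() sketch below strips a leading @, checks the handle against Twitter's username rules (1 to 15 letters, digits, or underscores), and optionally appends -filter:retweets as the example query in this section does.

```python
import re

# Twitter usernames: 1 to 15 letters, digits or underscores
HANDLE_RE = re.compile(r"[A-Za-z0-9_]{1,15}")


def mention_query(username, exclude_retweets=True):
    """Build a search query for tweets that mention the given user."""
    handle = username.strip().lstrip("@")
    if not HANDLE_RE.fullmatch(handle):
        raise ValueError(f"not a valid Twitter username: {username!r}")
    query = f"@{handle}"
    return f"{query} -filter:retweets" if exclude_retweets else query


print(mention_query("elonmusk"))                            # @elonmusk -filter:retweets
print(mention_query(" @elonmusk", exclude_retweets=False))  # @elonmusk
```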

How to fetch Tweets based on keyword, hashtags and username?

So far, we have seen the three most common kinds of queries for fetching tweets. To get a taste of forming more complex queries, let's combine all three in a single query: we will fetch tweets that contain the keyword tesla, the hashtag #cryptocurrency, and a mention of @elonmusk.

The query we need to form will be tesla AND #cryptocurrency AND @elonmusk

from tweepy import OAuth1UserHandler, API

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
api = API(auth)

print("-" * 15)

QUERY = "tesla AND #cryptocurrency AND @elonmusk -filter:retweets"
tweets = api.search_tweets(QUERY, tweet_mode="extended", count=7, result_type="recent", lang='en')
for i in tweets:
    print(i.full_text)
    print("-" * 15)

[Screenshot: How to fetch Tweets based on keyword, hashtags and username]
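To reuse this pattern with other terms, the query can be assembled from lists. The build_query() function below is a hypothetical sketch that joins keywords, hashtags, and mentions with AND, mirroring the query format used in this post, and appends -filter:retweets by default.

```python
def build_query(keywords=(), hashtags=(), mentions=(), exclude_retweets=True):
    """Combine keywords, hashtags and mentions into one AND-ed search query."""
    terms = list(keywords)
    terms += ["#" + tag.lstrip("#") for tag in hashtags]    # accept "tag" or "#tag"
    terms += ["@" + user.lstrip("@") for user in mentions]  # accept "user" or "@user"
    if not terms:
        raise ValueError("need at least one keyword, hashtag or mention")
    query = " AND ".join(terms)
    if exclude_retweets:
        query += " -filter:retweets"
    return query


print(build_query(["tesla"], ["cryptocurrency"], ["elonmusk"]))
# tesla AND #cryptocurrency AND @elonmusk -filter:retweets
```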

How to Retrieve Specific Number of Tweets using Tweepy

Earlier, we saw how to pass the count argument to the search_tweets() method to get a specific number of tweets. However, according to the documentation, a single request returns at most 100 tweets. So, how do we fetch more?

To retrieve N tweets, we can use the Cursor class provided by Tweepy. Cursor handles pagination for us: it keeps requesting the next page of results until we have as many tweets as we asked for, which is exactly what we need.
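For search, this pagination is based on tweet IDs: each new request asks only for tweets older than the oldest one already received, via the max_id parameter of search_tweets(). The paginate_by_max_id() function below is a simplified, hypothetical illustration of that idea, not Tweepy's actual Cursor implementation, and it leaves out rate-limit handling. A fake search function stands in for the API so it runs offline.

```python
from types import SimpleNamespace


def paginate_by_max_id(search, query, limit, per_page=100):
    """Collect up to `limit` tweets by walking backwards through tweet IDs.

    `search` is any callable shaped like api.search_tweets(q, count=..., max_id=...)
    that returns a list of objects with an `id` attribute.
    """
    collected = []
    max_id = None
    while len(collected) < limit:
        kwargs = {"count": min(per_page, limit - len(collected))}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = search(query, **kwargs)
        if not page:
            break  # no older tweets left
        collected.extend(page)
        # Next request: only tweets strictly older than the oldest one seen so far
        max_id = min(tweet.id for tweet in page) - 1
    return collected[:limit]


# Fake search over tweet IDs 250..1 (newest first) so the sketch runs offline
FAKE_TWEETS = [SimpleNamespace(id=i) for i in range(250, 0, -1)]


def fake_search(q, count, max_id=None):
    older = [t for t in FAKE_TWEETS if max_id is None or t.id <= max_id]
    return older[:count]


print(len(paginate_by_max_id(fake_search, "#python", limit=230)))   # 230
print(len(paginate_by_max_id(fake_search, "#python", limit=1000)))  # 250
```

With a real API object you could pass api.search_tweets as the search argument, but in practice Cursor does this bookkeeping for you.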

So, how do we use it? 🤔

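To fetch 1000 tweets using the Cursor class, we mainly need to pass two arguments:

  1. The method we want to paginate (here, api.search_tweets)
  2. The query

To specify how many tweets we want in total, we pass that number to the items() method of the Cursor. The count argument still controls how many tweets are requested per page, so we keep it at the maximum of 100.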

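from tweepy import OAuth1UserHandler, API, Cursor

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
# wait_on_rate_limit=True makes Tweepy sleep instead of raising when the quota runs out
api = API(auth, wait_on_rate_limit=True)

all_tweets = []
# count = tweets per request (max 100); items(1000) stops after 1000 tweets in total
for tweet in Cursor(api.search_tweets, "#python", count=100).items(1000):
    all_tweets.append(tweet.text)

print(len(all_tweets))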

If you noticed, I passed one more argument to the API class. The Twitter API enforces rate limits (quotas). If we don't want to handle rate-limit errors manually, we can pass wait_on_rate_limit=True to API(), and Tweepy will sleep until the limit resets and then continue.

Because of this, whenever we hit a rate limit, we will see the following message on our console:

Rate limit reached. Sleeping for: 802

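The 802 seconds is not a fixed duration. Rate limits are counted in 15-minute windows, so Tweepy sleeps for whatever remains of the current window; in practice, that usually means waiting around 13 to 15 minutes.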

[Screenshot: How to Retrieve Specific Number of Tweets using Tweepy]
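If you would rather handle rate limits yourself instead of using wait_on_rate_limit=True, a small retry wrapper is enough. This is a minimal sketch: call_with_retry() is a hypothetical helper, the default wait is one full 15-minute window, and you pass in the exception to retry on (in Tweepy 4.x that is tweepy.TooManyRequests). The sleep parameter exists only so the function can be tested without actually waiting.

```python
import time


def call_with_retry(func, retry_on, wait_seconds=15 * 60, max_attempts=3, sleep=time.sleep):
    """Call func(); if it raises `retry_on`, wait and try again, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            # Limits reset per 15-minute window, so a full window is a safe default wait
            sleep(wait_seconds)
```

A call would then look like tweets = call_with_retry(lambda: api.search_tweets("#python", count=100), retry_on=tweepy.TooManyRequests).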

Conclusion

You might have noticed that the hardest part of searching for and fetching tweets is forming a proper query. To solve that problem, I have written a whole blog post explaining the Twitter Search API. Once you go through it, you will be able to write your own search queries.

Let me know if you need any help or want to discuss something. Reach out to me on Twitter or LinkedIn. Make sure to share any thoughts, questions, or concerns. I would love to see them.

Till the next time 👋
