DEV Community

Kevin Tewouda


Create a Twitter bot in python - part 1

Twitter is one of the most widely used social networks on the planet. As a result, it is full of resources but also of annoyances. To help you get the best experience from the social network, various applications have emerged, like threadreaderapp or blockpartyapp. These applications rely on the Twitter API, which has existed since 2006 (version 1.1 dates from 2012).
In this article, we will cover the basics needed to design such an application by creating two simple bots that will do the following:

  • Passive listening for messages paying tribute to the Marvel movie Wakanda Forever. And yes, I'm a big fan of Marvel movies. 😆
  • Real-time listening for the same messages, this time with an automatic retweet. This is a classic job for a bot.

In this first part of the article, we will see the important notions to know to start with the Twitter API and the endpoints that will interest us to perform our actions. Once this big part is over, the second part will be more fun with the implementation of bots. 😉

Prerequisites

1 - Have a Twitter account (yes, it's essential!). If you don't want to use your personal account, you can create an account just for your bot.

2 - Create a developer account associated with your Twitter account. To do this, go to https://developer.twitter.com and click on Sign up. Then follow the instructions and choose the Bot option for the use you will make of the API. Once your account is created, you will be assigned a Bearer Token. Keep it safe, as you will need it to make your requests later.

3 - Have a basic knowledge of Python, know how to install it, and know how to initialize a project. If you don't have any skills in this language, there are plenty of resources on the internet like this one.

Twitter API documentation

The documentation of the Twitter API is very well done and available here. It is best to start with the Getting Started and Fundamentals sections. We will work with version 2 of the API which is the most recent and easiest to learn. After that, there is the API endpoint reference that you have to keep somewhere under your pillow. 😉

A bit of terminology

Before manipulating the Twitter API, I want to make a quick point about the different objects we can manipulate and the format of the responses. The objects are the following: Tweet, User, Space, List, Media, Poll, and Place.

The different endpoints of the Twitter API return the first two types of objects, namely Tweet and User. For the rest, you will have to use the notion of expansion.

Let's take an example of the route that lists tweets based on their id. I will use the requests library which is very well known for making HTTP calls in the following examples. The API endpoint used is documented here.

import os
from pprint import pprint
import requests

params = {'ids': '1588915242490560512'}
# The bearer token is stored in an environment variable called BEARER_TOKEN
headers = {'Authorization': f"Bearer {os.getenv('BEARER_TOKEN')}"}
r = requests.get('https://api.twitter.com/2/tweets', params=params, headers=headers)
pprint(r.json())
# Sample response
{
    'data': [
        {
            'edit_history_tweet_ids': ['1588915242490560512'],
            'id': '1588915242490560512',
            'text': 'in what is becoming a tradition in odd point release...'
        }
    ]
}

Note: if you wonder how to get a tweet id, click on a tweet from a web browser, look at the url, and after the status part, the numerical value that follows is its id. 😁 ✌️
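If you would rather extract the id with code than by eye, here is a minimal sketch. The function name tweet_id_from_url and the sample URL are assumptions made for illustration; the only thing taken from the note above is that the id is the number that follows the status part of the URL.

```python
import re
from typing import Optional

# Matches the numeric id that follows "/status/" in a tweet URL.
STATUS_ID_PATTERN = re.compile(r"/status/(\d+)")


def tweet_id_from_url(url: str) -> Optional[str]:
    """Return the tweet id found in the URL, or None if there is none."""
    match = STATUS_ID_PATTERN.search(url)
    return match.group(1) if match else None


# URL assembled for illustration; the id is the one used in this article
print(tweet_id_from_url("https://twitter.com/htmx_org/status/1588915242490560512"))
```

The id is kept as a string on purpose, since the API also returns ids as strings.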

You will notice that the results are in the data key of the response and that each object represents a tweet. Here there is only one object because I passed a single id in the ids query parameter. If you have looked at the details of a Tweet object, you will notice that not all the fields are present in the response. This is intended, to keep responses small. To get more fields, you have to use an additional query parameter, tweet.fields. The same goes for other types of objects: if you want more fields from a user, for example, you have to set the user.fields parameter.
To come back to our example, let's say that we want to display the creation date of the tweet and the author's id.

import os
from pprint import pprint

import requests
params = {
    'ids': '1588915242490560512',
    # we add the extra fields we want
    'tweet.fields': 'created_at,author_id'
}
headers = {'Authorization': f"Bearer {os.getenv('BEARER_TOKEN')}"}
r = requests.get('https://api.twitter.com/2/tweets', params=params, headers=headers)
pprint(r.json(), indent=4)
# Sample response
{
    'data': [
        {
            'author_id': '1037022474762768384',
            'created_at': '2022-11-05T15:24:49.000Z',
            'edit_history_tweet_ids': ['1588915242490560512'],
            'id': '1588915242490560512',
            'text': 'in what is becoming a tradition in odd point release...',
        }
    ]
}

Bingo! We have our extra fields! But it would be nice to have more information about the author, not just their id. How do we do that? This is where expansions come in! Expansions allow you to add additional objects to the response. Here it's a User we want to include, but it could be a Media or even a Tweet related to the one we searched for (the parent tweet, for example, if we are in a conversation). The list of possible expansions is given in the documentation of each API endpoint; for the one we use, you will find it here.

Our example will look like this:

import os
from pprint import pprint

import requests
params = {
    'ids': '1588915242490560512',
    # we add the extra fields we want
    'tweet.fields': 'created_at,author_id',
    # we include user information in the response
    'expansions': 'author_id'
}
headers = {'Authorization': f"Bearer {os.getenv('BEARER_TOKEN')}"}
r = requests.get('https://api.twitter.com/2/tweets', params=params, headers=headers)
pprint(r.json(), indent=4)
# Sample response
{
    'data': [
        {
            'author_id': '1037022474762768384',
            'created_at': '2022-11-05T15:24:49.000Z',
            'edit_history_tweet_ids': ['1588915242490560512'],
            'id': '1588915242490560512',
            'text': 'in what is becoming a tradition in odd point release...',
        }
    ],
    'includes': {
        'users': [
            {
                'id': '1037022474762768384',
                'name': 'htmx.org',
                'username': 'htmx_org'
            }
        ]
    }
}

Et voilà! You will notice a new includes key in the response. It contains all the expansions we requested. In our case, it is the author's information. Again, only the default fields are returned. If you want more information about the user, you'll have to add a user.fields query parameter with the desired fields. I leave it as an exercise. 😁
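As a starting point for that exercise, here is a small sketch that assembles the query parameters, user.fields included. The helper build_lookup_params is hypothetical (it is not part of requests or of the Twitter API), and the three user fields are only examples of what you might ask for.

```python
from typing import Dict, Iterable, Optional


def build_lookup_params(
    ids: Iterable[str],
    tweet_fields: Optional[Iterable[str]] = None,
    expansions: Optional[Iterable[str]] = None,
    user_fields: Optional[Iterable[str]] = None,
) -> Dict[str, str]:
    """Build the query parameters for GET /2/tweets as comma-separated strings."""
    params = {"ids": ",".join(ids)}
    optional = {
        "tweet.fields": tweet_fields,
        "expansions": expansions,
        "user.fields": user_fields,
    }
    for name, values in optional.items():
        # parameters that were not requested are simply left out
        if values:
            params[name] = ",".join(values)
    return params


params = build_lookup_params(
    ids=["1588915242490560512"],
    tweet_fields=["created_at", "author_id"],
    expansions=["author_id"],
    # user fields only show up because of the author_id expansion
    user_fields=["created_at", "description", "verified"],
)
print(params)
```

The resulting dictionary can be passed as the params argument of requests.get, exactly like in the examples above.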

Note: Most of the responses returned by the Twitter API endpoints have this structure with four root keys:

  • data: the requested main object is returned in this key. It can be an object or a list of objects.
  • includes: which contains objects resulting from expansions.
  • meta: an optional key that will contain pagination information when there are many objects to return.
  • errors: an optional key if there is a problem with the request. It will contain the list of errors.
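To make this structure concrete, here is a minimal sketch that reads such a response and attaches each expanded author to its tweet. The function tweets_with_authors is an assumption made for illustration, and the sample payload is a shortened copy of the response shown earlier.

```python
from typing import Any, Dict, List


def tweets_with_authors(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Attach the expanded author (if any) to each tweet of a v2 response."""
    if "data" not in payload:
        if "errors" in payload:
            # no main object and an errors key: surface the problem
            raise ValueError(f"request failed: {payload['errors']}")
        # for example, a search without results only has a meta key
        return []

    data = payload["data"]
    # data can be a single object or a list of objects
    tweets = data if isinstance(data, list) else [data]

    # index the expanded users by id for quick lookup
    users = {user["id"]: user for user in payload.get("includes", {}).get("users", [])}

    return [{**tweet, "author": users.get(tweet.get("author_id"))} for tweet in tweets]


sample = {
    "data": [{"author_id": "1037022474762768384", "id": "1588915242490560512"}],
    "includes": {
        "users": [{"id": "1037022474762768384", "name": "htmx.org", "username": "htmx_org"}]
    },
}
for tweet in tweets_with_authors(sample):
    print(tweet["id"], tweet["author"]["username"])
```

When the author_id expansion was not requested, the author key is simply None.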

Note: Due to a large number of users of the Twitter API, it goes without saying that there are limitations on the number of calls you can make. More information is on this page. Note that the documentation for each endpoint also specifies the number of calls you can make on that route during a 15-minute window.
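When a limit is reached, the API answers with the HTTP status 429 and rate-limit headers. The sketch below computes how long to wait from the x-rate-limit-reset header, which holds the end of the current window as a Unix timestamp. The function name and the 15-minute fallback are assumptions of mine, not something the API prescribes.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the current rate-limit window resets."""
    now = time.time() if now is None else now
    reset_at = headers.get("x-rate-limit-reset")
    if reset_at is None:
        # header missing: fall back to a full 15-minute window (an assumption)
        return 15 * 60.0
    # never return a negative duration if the window is already over
    return max(0.0, float(reset_at) - now)


# The timestamps are made up for the example: the window ends 60 seconds from "now"
print(seconds_until_reset({"x-rate-limit-reset": "1000"}, now=940))
```

With requests, you could call time.sleep(seconds_until_reset(r.headers)) when r.status_code is 429 and then retry. If you use tweepy, tweepy.Client also accepts wait_on_rate_limit=True, which waits for you.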

Observe a theme using keywords

Now let's get to the heart of the matter: we'll see how to observe specific events that happen on Twitter and perform an appropriate action. Two endpoints will interest us.

Search recent tweets.

The documentation of this endpoint can be found here. It can be used in two ways:

  • Obtaining historical data: In this mode, we search for tweets from the last seven days.
  • Polling: In this mode, we search for recent tweets in relation to our last search. If it's not clear, don't be afraid, you'll understand with an example :)

Build a search filter

The first step will be to create a filter that will allow us to search for specific tweets. The documentation on how to create a filter can be found here and a more advanced tutorial here. I will however mention some information.

1 - Depending on the type of access you have (probably Essential if you are just starting out), your filter will be limited to 512 characters.

2 - You can use several operators to perform your search, some of the most important are:

  • AND which is naturally used by separating the search keywords with a space. For example, if I want a tweet that includes the words banana and potato (I'm not hungry!) I would build a banana potato filter.
  • OR: if I want a tweet that includes banana or potato, I would write (banana OR potato). Note that parentheses group the alternatives, which matters as soon as the filter contains other terms, and that double quotes are needed to match an exact phrase, as in ("Twitter API" OR #v2).
  • - the negation of a search, for example, if I search for tweets that have potato but not banana, I will have a filter potato -banana. So a sentence that contains both words will not pass the search.
  • from: to search for tweets from a specific account, for example from:marvel.
  • to: to search for tweets in response to a specific account, for example to:marvel.
  • conversation_id: In discussion threads, tweets descending from the root tweet will have a conversation_id field to try to reconstruct the whole discussion. This can be used to filter tweets. Example: conversation_id:1334987486343299072.
  • has: allows you to search for tweets with metadata, for example, a tweet search containing media or links: (has:media OR has:links).
  • is: allows you to search for different types of tweets such as retweets (is:retweet) or quotes, that is, retweets with a message (is:quote). It also covers other contextual information, such as whether the user is verified (you know, the little blue mark that may soon become a paid feature :p): is:verified.

3 - The complete list of filters is available here.

For our example, we will search for positive messages about the latest Marvel movie (at the time of writing), Wakanda Forever. The filter will be the following: ("black panther" OR #wakandaforever) (magnificent OR amazing OR excellent OR awesome OR great) -is:retweet. You should be able to read it by now, but let's unpack it. We look for tweets that:

  • have the term black panther (for your information, the search is case-insensitive) or the hashtag #wakandaforever: ("black panther" OR #wakandaforever)
  • and have one of the following words: magnificent, amazing, excellent, awesome, great: (magnificent OR amazing OR excellent OR awesome OR great)
  • and are not retweets: -is:retweet. This last filter is very important and even recommended in the official documentation: because many tweets are retweets, it avoids adding unnecessary noise to our results.
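Filters like this one get hard to read and easy to break, so it can help to assemble them from small pieces. Here is a minimal sketch; the helpers term, any_of and build_query are hypothetical names of mine, and the length check uses the 512-character limit mentioned above for Essential access.

```python
MAX_QUERY_LENGTH = 512  # limit mentioned above for Essential access


def term(value: str) -> str:
    """Wrap a value in double quotes when it contains a space (exact phrase)."""
    return f'"{value}"' if " " in value else value


def any_of(*values: str) -> str:
    """Group alternatives with OR; the parentheses keep the group together."""
    return "(" + " OR ".join(term(value) for value in values) + ")"


def build_query(*clauses: str) -> str:
    """Join clauses with a space (implicit AND) and check the length limit."""
    query = " ".join(clauses)
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query is {len(query)} characters, limit is {MAX_QUERY_LENGTH}")
    return query


query = build_query(
    any_of("black panther", "#wakandaforever"),
    any_of("magnificent", "amazing", "excellent", "awesome", "great"),
    "-is:retweet",
)
print(query)
```

This prints exactly the filter described above, and raises an error early if a filter grows past the limit.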

Get historical data

As a reminder, the route we are interested in allows us to search for tweets over the last 7 days. We could continue to work with the requests library, but it would be tedious to analyze the results, manage the pagination, etc. That's why we are going to use a specialized library for the Twitter API named tweepy. We will use its tweepy.Client class to manipulate the endpoints of the Twitter v2 API. To know the mapping between the methods of this class and the API endpoints, please refer to this page. In our case, the method we are interested in is search_recent_tweets.

import os

import tweepy

client = tweepy.Client(os.getenv('BEARER_TOKEN'))
response = client.search_recent_tweets(
    '("black panther" OR #wakandaforever) (magnificent OR amazing OR excellent OR awesome OR great) -is:retweet',
    max_results=100,
    tweet_fields=['created_at']
)
if response.data is not None:
    for tweet in response.data:
        print(tweet.id, tweet.created_at, tweet.text)
# pagination information is listed here
print(response.meta)

Notes:

  • The usage is simple: we have specified the filter, which is the only mandatory argument, then the number of results we want per request and the additional fields. In general, the method arguments are the same as the parameters the API endpoint accepts. To know all the possible arguments to pass to this method, refer to its documentation.
  • The returned response object is a namedtuple with four fields: data, includes, meta and errors. They are the same as the keys you get when you make the request yourself, with requests for example :)
  • In response.data, we have the list of requested objects. As a rule, if you know the fields of the object, you can use the field names as attributes, as done in the example above. They are also all documented in the tweepy documentation :)

Now imagine that you have more than 100 results. How do you iterate over all of them? This is where we use the information provided by response.meta. It will contain a next_token value that we have to pass back to our method. We could complete the previous code with this piece of code:

next_token = response.meta.get('next_token')

while next_token is not None:
    response = client.search_recent_tweets(
        '("black panther" OR #wakandaforever) (magnificent OR amazing OR excellent OR awesome OR great) -is:retweet',
        max_results=100,
        tweet_fields=['created_at'],
        next_token=next_token
    )
    if response.data is not None:
        for tweet in response.data:
            print(tweet.id, tweet.created_at, tweet.text)
    next_token = response.meta.get('next_token')

And there we go! But tweepy has a Paginator class making our lives easier. Here's an example:

import os

import tweepy
client = tweepy.Client(os.getenv('BEARER_TOKEN'))
for response in tweepy.Paginator(
        client.search_recent_tweets,
        '("black panther" OR #wakandaforever) (magnificent OR amazing OR excellent OR awesome OR great) -is:retweet',
        max_results=100
):
    if response.data is not None:
        for tweet in response.data:
            print(tweet.id, tweet.text)

Easy-peasy! You can limit the number of response iterations returned with the limit argument. This way of using Paginator is interesting if you want to read possible expansions contained in response.includes but if you are only interested in the data part, then the flatten method should be useful.

import os

import tweepy

client = tweepy.Client(os.getenv('BEARER_TOKEN'))

for tweet in tweepy.Paginator(
        client.search_recent_tweets,
        '("black panther" OR #wakandaforever) (magnificent OR amazing OR excellent OR awesome OR great) -is:retweet',
        max_results=100
).flatten():
    print(tweet.id, tweet.text)

Here we iterate directly on tweets. Also, we can pass a limit of tweets to retrieve with the limit argument to pass to the flatten method.
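To see what such a flattened iteration does behind the scenes, here is a small sketch of the idea with a fake data source. It is not tweepy's implementation: iter_items, fetch_page and the PAGES dictionary are all made up for the example, and only the next_token mechanism comes from the API.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A page is (items, meta); fetch_page receives the token of the page to read.
Page = Tuple[List[str], Dict[str, str]]


def iter_items(
    fetch_page: Callable[[Optional[str]], Page], limit: Optional[int] = None
) -> Iterator[str]:
    """Yield items page after page, following next_token, up to `limit` items."""
    if limit is not None and limit <= 0:
        return
    count = 0
    next_token = None
    while True:
        items, meta = fetch_page(next_token)
        for item in items:
            yield item
            count += 1
            # stop as soon as the limit is reached, without fetching more pages
            if limit is not None and count >= limit:
                return
        next_token = meta.get("next_token")
        if next_token is None:
            return


# Fake two-page result set standing in for the API (hypothetical data)
PAGES = {
    None: (["tweet 1", "tweet 2"], {"next_token": "page2"}),
    "page2": (["tweet 3"], {}),
}

print(list(iter_items(PAGES.get)))
print(list(iter_items(PAGES.get, limit=2)))
```

The caller only sees a stream of items; the tokens stay an internal detail, which is what makes the flattened form pleasant to use.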

Polling

The second way to use the recent search endpoint is polling, which is a kind of near real-time mode. Here the idea is to search for all the tweets that match the filter, not in the past, but starting from a specific tweet. This requires knowing a (recent) tweet that already meets our criteria and iterating from there. For example, if we have a tweet with the id 10000 that matches our search, and we want to know, almost in real time, all the later tweets that satisfy our filter, we will write code like the following:

import os

import tweepy

client = tweepy.Client(os.getenv('BEARER_TOKEN'))
response = client.search_recent_tweets(
    '("black panther" OR #wakandaforever) (magnificent OR amazing OR excellent OR awesome OR great) -is:retweet',
    max_results=100,
    tweet_fields=['created_at'],
    since_id=10000  # we can use a string or an integer
)
if response.data is not None:
    for tweet in response.data:
        print(tweet.id, tweet.created_at, tweet.text)

Note: This last example will not work as is because the tweet with id 10000 is far older than the 7-day window covered by this endpoint. Replace it with the id of a recent tweet.

If tweets more recent than id 10000 and matching our criteria exist, they will be displayed. To continue iterating on more recent tweets, we will use the data included in response.meta. Let's assume that the latter contains this information:

{
  "newest_id": "12000",
  "oldest_id": "10005",
  "result_count": 7
}

newest_id corresponds to the most recent tweet in the result list. We will use this value to continue iterating by replacing the value of since_id with 12000.
However, if there are more results than can be returned at once, you will have a next_token key in meta.

{
  "newest_id": "12000",
  "oldest_id": "10005",
  "next_token": "fnsih9chihsnkjbvkjbsc",
  "result_count": 10
}

In this case, if you use since_id=12000 right away, you will lose all the results that were not returned in the first response. The solution is to keep since_id at 10000 and add a next_token argument. Iterate over the results until there is no more next_token in meta, then resume with a since_id of 12000. This is what code that looks for the latest Wakanda Forever tributes could look like:

import os
import time

import tweepy

client = tweepy.Client(os.getenv('BEARER_TOKEN'))
search = '("black panther" OR #wakandaforever) (magnificent OR amazing OR excellent OR awesome OR great) -is:retweet'
# replace the value here with a relevant (recent) tweet id
since_id = 10000

while True:
    newest_id = None
    next_token = None
    # read every page of results newer than since_id
    while True:
        response = client.search_recent_tweets(
            search,
            max_results=100,
            tweet_fields=['created_at'],
            # since_id stays the same while we paginate
            since_id=since_id,
            next_token=next_token
        )
        # response.data is None when there is no result
        for tweet in response.data or []:
            print(tweet.id, tweet.created_at, tweet.text)
        # results go from newest to oldest, so the first page holds the newest id
        if newest_id is None:
            newest_id = response.meta.get('newest_id')
        next_token = response.meta.get('next_token')
        if next_token is None:
            break
    # if nothing new was found, we keep the previous since_id
    since_id = newest_id or since_id
    # pause between two polls to not exceed the number of requests you can make :)
    time.sleep(60)
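
The bookkeeping of since_id and next_token is the error-prone part, so here is a sketch that isolates it from the network and lets you check it with fake data. The names poll_once and fake_search are hypothetical, and the ids reuse the values of the meta examples above.

```python
from typing import Callable, Dict, List, Optional, Tuple

Page = Tuple[List[dict], Dict[str, str]]
# search(since_id, next_token) -> (tweets, meta); a stand-in for the API call
Search = Callable[[str, Optional[str]], Page]


def poll_once(search: Search, since_id: str) -> Tuple[List[dict], str]:
    """Read every page newer than since_id and return (tweets, next since_id)."""
    tweets: List[dict] = []
    newest_id = None
    next_token = None
    while True:
        # since_id stays the same for all the pages of this poll
        page, meta = search(since_id, next_token)
        tweets.extend(page)
        if newest_id is None:
            # pages go from newest to oldest, so the first page has the newest id
            newest_id = meta.get("newest_id")
        next_token = meta.get("next_token")
        if next_token is None:
            break
    # nothing new: keep the previous since_id for the next poll
    return tweets, newest_id or since_id


def fake_search(since_id: str, next_token: Optional[str]) -> Page:
    """Fake API: two pages of results after id 10000, nothing after that."""
    if since_id != "10000":
        return [], {"result_count": 0}
    if next_token is None:
        meta = {"newest_id": "12000", "oldest_id": "11000", "next_token": "abc"}
        return [{"id": "12000"}, {"id": "11000"}], meta
    return [{"id": "10005"}], {"newest_id": "10005", "oldest_id": "10005"}


tweets, since_id = poll_once(fake_search, "10000")
print([tweet["id"] for tweet in tweets], since_id)
tweets, since_id = poll_once(fake_search, since_id)
print(tweets, since_id)
```

The first poll collects the three tweets and moves since_id to 12000; the second one finds nothing and leaves since_id untouched.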

It's tedious, isn't it? Fortunately, there is another much simpler endpoint to track tweets in real-time and that's what we'll see right now!

Filtered stream

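The second method to search for tweets in real-time is the filtered stream. Its documentation can be found here. It includes 3 endpoints: one to add or delete rules, one to list the rules, and one to connect to the stream and receive the matching tweets.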

Unlike the previous endpoint, you can add several filtering rules. The number depends on the type of access you have:

  • For Essential access, we have 5 rules of 512 characters each. This is probably the case for you if you are new to the Twitter API.
  • For Elevated access, we have 25 rules of 512 characters each.
  • For Academic access, we have 1000 rules of 1024 characters each.

The idea here is simple: we add one or more filtering rules, and we receive the new tweets that match them. A tweet is returned as soon as at least one rule matches it. To manipulate these endpoints with tweepy, we will use the tweepy.StreamingClient class.
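Before sending rules to the API, you can check them locally against the limits listed above. This is only a sketch: the RULE_LIMITS table copies the numbers given in this article, and check_rules is a hypothetical helper, not a tweepy function.

```python
from typing import Iterable, List

# (maximum number of rules, maximum characters per rule) as listed above
RULE_LIMITS = {
    "essential": (5, 512),
    "elevated": (25, 512),
    "academic": (1000, 1024),
}


def check_rules(rules: Iterable[str], access: str = "essential") -> List[str]:
    """Return a list of problems; an empty list means the rules fit the limits."""
    max_rules, max_length = RULE_LIMITS[access]
    rules = list(rules)
    problems = []
    if len(rules) > max_rules:
        problems.append(f"{len(rules)} rules, but {access} access allows {max_rules}")
    for index, rule in enumerate(rules):
        if len(rule) > max_length:
            problems.append(f"rule {index} is {len(rule)} characters, limit is {max_length}")
    return problems


rules = ['("black panther" OR #wakandaforever) -is:retweet', "from:marvel"]
print(check_rules(rules))
```

An empty list means everything fits. The limits may change over time, so treat the table as a snapshot of what this article describes.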

To add, read and delete filters, we can write:

import os

import tweepy

client = tweepy.StreamingClient(os.getenv('BEARER_TOKEN'))
rules = [
    # we add our rules here
    tweepy.StreamRule(
        '("black panther" OR #wakandaforever) (magnificent OR amazing OR excellent OR awesome OR great) -is:retweet',
        tag='black panther tribute'
    )
]
client.add_rules(rules)
# we list our rules
response = client.get_rules()
for rule in response.data:
    print(rule)
# we delete one or more rules by passing their id
client.delete_rules(['158939726852798054'])

Notes:

  • You can pass the argument dry_run=True to the methods add_rules and delete_rules to test the syntax of the rule to make sure it is correct without actually executing it on the server side.
  • When defining several rules, it is recommended to associate a tag to remember what the filter does. Indeed, a filter can be quite complex to read :)
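Tags are also handy when you want to delete rules without copying their ids by hand. Here is a minimal sketch that selects rule ids by tag. The Rule namedtuple is a local stand-in with the same attributes as tweepy.StreamRule (value, tag, id), and the second rule and its id are invented for the example.

```python
from collections import namedtuple
from typing import Iterable, List

# Stand-in with the same attributes as tweepy.StreamRule (value, tag, id)
Rule = namedtuple("Rule", ("value", "tag", "id"))


def rule_ids_for_tag(rules: Iterable, tag: str) -> List[str]:
    """Return the ids of the rules that carry the given tag."""
    return [rule.id for rule in rules if rule.tag == tag]


rules = [
    Rule(
        '("black panther" OR #wakandaforever) -is:retweet',
        "black panther tribute",
        "158939726852798054",
    ),
    # hypothetical second rule, only here to show the filtering
    Rule("from:marvel", "marvel account", "158939726852798055"),
]
print(rule_ids_for_tag(rules, "black panther tribute"))
```

With a real client, you would presumably pass the rules returned by client.get_rules() (its data can be None when no rule exists) and give the resulting list to client.delete_rules.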

Once we have created our rules, all we have to do is call the endpoint that streams the tweets. The method to use is filter, but if we call it as is, we won't see anything because by default StreamingClient does nothing with the tweets it receives 😆. So you have to inherit from the class and override some of its methods.

import os

import tweepy

class IDPrinter(tweepy.StreamingClient):
    # we can get a response object
    def on_response(self, response):
        # It has the structure: StreamResponse(data, includes, errors, matching_rules)
        # where data is the tweet, so for each tweet we have all the matching rules
        print(response)
    # or we can just read the tweet
    def on_tweet(self, tweet):
        print(tweet.id, tweet.text)
    def on_errors(self, errors):
        print(errors)
    def on_connection_error(self):
        # what to do in case of network error
        self.disconnect()
    def on_request_error(self, status_code):
        # what to do when the HTTP response status code is >= 400
        pass

printer = IDPrinter(os.getenv('BEARER_TOKEN'))
printer.filter()

Notes:

  • By overriding on_response, we receive as a parameter a StreamResponse object that contains the tweet and the set of rules that matched this tweet.
  • You can pass the argument threaded=True to filter so that it does not block the rest of the Python script. It then returns the created thread, and you can stop the stream later with disconnect.
  • There is an asynchronous version of this streaming class, AsyncStreamingClient, that lets you work with coroutines instead of threads. I won't talk about it here as it is an advanced topic. For the brave ones, you can read this article.

There is another method of the StreamingClient class, namely sample, which is linked to the sampled stream endpoint. It ignores the rules you created and returns about 1% of the new tweets published on Twitter. You can't get more real-time than that 😅. This would allow you, for example, to detect Twitter trends with natural language processing algorithms, like the trends we see displayed on the right of the web interface. Be careful though when using this endpoint: it can quickly use up your monthly limit of tweets to read. You should have a clear objective before using it and only run it for a short period of time.
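To give an idea of what a very naive trend detection could look like, here is a sketch that counts hashtags in a batch of tweet texts. It is far from real natural language processing; the function top_hashtags and the sample texts are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

HASHTAG_PATTERN = re.compile(r"#\w+")


def top_hashtags(texts: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent hashtags with the number of tweets using them."""
    counts: Counter = Counter()
    for text in texts:
        # lowercase so that #WakandaForever and #wakandaforever count together;
        # the set makes each hashtag count once per tweet
        counts.update({tag.lower() for tag in HASHTAG_PATTERN.findall(text)})
    return counts.most_common(n)


# invented texts standing in for tweets received from the stream
texts = [
    "#WakandaForever was amazing",
    "Loved it #wakandaforever #marvel",
    "Writing #python bots",
]
print(top_hashtags(texts, n=1))
```

In a streaming client, you could collect tweet.text in on_tweet and run this kind of count on the batch from time to time.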

Ok, if you made it this far, congratulations! That was a big chunk, but a necessary one to understand how the Twitter API works. In the second part of the article (because this one is getting very long), we will see how to:

  • Send an email with tweets found in a recent search.
  • Retweet instantly all Wakanda Forever tribute tweets.
  • And a little bonus that I won't mention here 😉.

Take care of yourself and see you next time! 😁


This article was originally published on Medium.

If you like my article and want to continue learning with me, don't hesitate to follow me here and subscribe to my newsletter on substack 😉
