Analyzing images using the Twitter API v2 and AWS

#twitter #api #python #aws

The Twitter API v2 lets you retrieve Tweets with images programmatically. Researchers often want to study these images to better understand the conversation on a topic. One example is researchers studying memes on Twitter: they may want to identify objects in an image or extract the text from it. In this tutorial (meant for academic researchers), I will show how you can study images on Twitter using the Twitter API v2 and Amazon Rekognition.

Note: Twitter does not own or maintain Amazon Rekognition. Amazon Rekognition is developed and owned by Amazon Web Services, and I have used it here for educational purposes.

Prerequisites

In order to use the Twitter API v2, you need to apply for a Twitter developer account. Once you have an approved Twitter developer account, follow the instructions here to obtain the BEARER TOKEN that you will use to connect to the Twitter API v2 in your Python code. We will be using the Tweepy package in Python to get images from the Twitter API v2, so you will need to have Python as well as the Tweepy package installed on your machine. Instructions on installing the Tweepy package can be found in this tutorial.

You will also need an Amazon Web Services (AWS) account in order to use Amazon Rekognition. Instructions on setting up your AWS credentials locally can be found here. Please set them up in order to get the sample code working.

Getting a Tweet's image url with the Twitter API v2

In order to use the Twitter API v2 in Python, you need to first import Tweepy and initialize the Tweepy client, which makes the API calls for you. You can initialize it as:



import tweepy
client = tweepy.Client('BEARER_TOKEN')
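
To keep the token out of source control, one option is to read it from an environment variable. The sketch below assumes a variable named TWITTER_BEARER_TOKEN. That name is chosen for illustration only; neither the Twitter API nor Tweepy requires it.

```python
import os


def load_bearer_token(env=os.environ, name="TWITTER_BEARER_TOKEN"):
    """Return the bearer token stored in the given environment mapping.

    The variable name is an assumption made for this example.
    """
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable first")
    return token


# Usage (requires Tweepy and a real token):
#   import tweepy
#   client = tweepy.Client(load_bearer_token())
```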

Next, you need to specify the Tweets that you are looking for. If you want Tweets from a certain account that contain images, you can specify these conditions in the search query. The has:images operator tells the Twitter API that you want only Tweets that contain images, and -is:retweet excludes Retweets.



# Replace with your own search query
query = 'from:twitterdev -is:retweet has:images'

Next, we will call the search_recent_tweets method, which searches for Tweets from the last 7 days. Because we want the image information, we do two additional things:

we set the expansions to attachments.media_keys
we set the media_fields to url

This will give us the URL for images in the Tweets. In this example, I am only getting 10 Tweets from my search result. You can get up to 100 Tweets per request with the search_recent_tweets method by setting max_results to 100. If you want more than 100 Tweets, you will need to use pagination. Check out this example that shows how you can use pagination.
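
As a rough sketch of what pagination involves, the function below requests one page at a time and passes the next_token found in each response's meta into the following call. The name search_all_pages and the max_pages cap are assumptions made for illustration. Tweepy also ships a tweepy.Paginator helper that wraps the same pattern.

```python
def search_all_pages(client, query, max_pages=3):
    """Collect up to max_pages responses for a recent-search query.

    `client` is expected to behave like tweepy.Client; `max_pages` is a
    hypothetical safety cap so the loop cannot run unbounded.
    """
    responses = []
    next_token = None
    for _ in range(max_pages):
        kwargs = {
            "query": query,
            "media_fields": ["url"],
            "expansions": "attachments.media_keys",
            "max_results": 100,
        }
        # Only send next_token once the API has handed one back
        if next_token:
            kwargs["next_token"] = next_token
        response = client.search_recent_tweets(**kwargs)
        responses.append(response)
        # meta carries a next_token only while more results remain
        next_token = (response.meta or {}).get("next_token")
        if not next_token:
            break
    return responses
```

Each item in the returned list can be processed exactly like the single tweets response used in the rest of this tutorial.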



# This method gives us Tweets from the last 7 days with images
tweets = client.search_recent_tweets(query=query,
                                    media_fields=['url'],
                                    expansions='attachments.media_keys',
                                    max_results=10)

If your search query returns Tweets with images, the image information will be available in the tweets.includes under media. So, we check if this information is present. If it is, we create a dictionary with media key and the media information. In the snippet below, we print the image URL to the console.



# Check to see if this search query gives any Tweets with images
if 'media' in tweets.includes:

    # Get list of media from the includes object
    media = {m["media_key"]: m for m in tweets.includes['media']}

    for tweet in tweets.data:
        attachments = tweet.data['attachments']
        media_keys = attachments['media_keys']
        if media[media_keys[0]].url:
            image_url = media[media_keys[0]].url
            print(image_url)
else:
    print('No images found for this search query')
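
The snippet above reads only the first media key of each Tweet. As a sketch of collecting every image, the function below works on the raw JSON form of the response, assuming a client created with tweepy.Client('BEARER_TOKEN', return_type=dict). The name image_urls_by_tweet is illustrative. Media items without a url (such as videos) are skipped.

```python
def image_urls_by_tweet(payload):
    """Map each Tweet ID to the list of image URLs attached to it.

    `payload` is assumed to be the raw JSON dict of a search response
    requested with expansions=attachments.media_keys and media.fields=url.
    """
    # Index the expanded media objects by their media_key
    media = {m["media_key"]: m
             for m in payload.get("includes", {}).get("media", [])}

    result = {}
    for tweet in payload.get("data", []):
        keys = tweet.get("attachments", {}).get("media_keys", [])
        # Keep only media that were returned and actually carry a url
        urls = [media[k]["url"] for k in keys
                if k in media and media[k].get("url")]
        if urls:
            result[tweet["id"]] = urls
    return result
```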

Analyzing images with Amazon Rekognition

In order to analyze images obtained from the Twitter API v2, we need to import the boto3 library and initialize the Rekognition client. This will allow us to call the different methods available, such as detect_text, detect_moderation_labels, and so on.



import boto3
client = boto3.client('rekognition')

Next, we need to pass the image data to the Rekognition client. So we make a GET request to the Tweet image URL and pass the content of the response to our methods, as shown below:



import requests

def get_image(url):
    # Download the image and return its raw bytes for Rekognition
    response = requests.get(url)
    return response.content
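
Rekognition accepts JPEG and PNG images, and images passed as raw bytes are limited to 5 MB. A minimal sketch of a pre-check on the downloaded bytes could look like the following. The helper name is_supported_image is an assumption, and the check only inspects the file signature rather than fully validating the image.

```python
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # size limit for images sent as raw bytes
JPEG_MAGIC = b"\xff\xd8\xff"       # first bytes of a JPEG file
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # first bytes of a PNG file


def is_supported_image(data):
    """Return True if the bytes look like a JPEG or PNG within the size limit."""
    if not data or len(data) > MAX_IMAGE_BYTES:
        return False
    return data.startswith(JPEG_MAGIC) or data.startswith(PNG_MAGIC)
```

Calling this on the result of get_image before sending it to Rekognition lets you skip GIFs or oversized files instead of handling an API error.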

Text extraction

To extract text from images, we will call the detect_text method. If the response contains TextDetections, we will print those to the console.



def detect_text(photo):
    client = boto3.client('rekognition')
    response = client.detect_text(Image={'Bytes': photo})
    text_detections = response['TextDetections']
    for text in text_detections:
        print(text['DetectedText'])

For an example Tweet image, the response from this method will look like:



Using the Twitter API
tpRespor
(0)
decY0c
v2 with GitHub Copilot
pEntd
Knull
ye Lom
Join us on twitch.tv/twitterdev
14th July 2022 @ 2PM ET / 11AM PT
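
The output above mixes clean lines with noisy fragments. Each entry in TextDetections also carries a Type (LINE or WORD) and a Confidence score, so one way to reduce noise is to keep only confident lines. This is a sketch; the function name extract_lines and the threshold of 80 are assumptions you would tune for your own data.

```python
def extract_lines(response, min_confidence=80.0):
    """Return detected text for LINE entries at or above min_confidence.

    `response` is the dict returned by the Rekognition detect_text call.
    """
    return [
        detection["DetectedText"]
        for detection in response.get("TextDetections", [])
        if detection.get("Type") == "LINE"
        and detection.get("Confidence", 0.0) >= min_confidence
    ]
```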

Detecting objects and scenes

To detect labels from images, we will call the detect_labels method. If the response contains Labels, we will print those to the console.



def detect_labels(photo):
    client = boto3.client('rekognition')
    response = client.detect_labels(Image={'Bytes': photo})
    for label in response['Labels']:
        print("Label: " + label['Name'])

For each Tweet image, the output of this method will look something like this:



Label: Person
Label: Human
Label: Car
Label: Automobile
Label: Vehicle
Label: Transportation
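
The detect_labels call also accepts MaxLabels and MinConfidence parameters, and each returned label includes a Confidence score. The sketch below passes the Rekognition client in as an argument and returns the labels instead of printing them. The function name and the default values are assumptions for illustration.

```python
def detect_confident_labels(rekognition, photo, max_labels=10, min_confidence=90.0):
    """Return (name, confidence) pairs for the labels found in an image.

    `rekognition` is a boto3 Rekognition client and `photo` is the raw
    image bytes. The defaults are illustrative, not recommended values.
    """
    response = rekognition.detect_labels(
        Image={"Bytes": photo},
        MaxLabels=max_labels,          # cap on the number of labels returned
        MinConfidence=min_confidence,  # drop labels below this score
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   labels = detect_confident_labels(boto3.client('rekognition'), photo)
```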

Image moderation

You can do image moderation with Rekognition. To detect moderation labels for images, we will call the detect_moderation_labels method. If the response contains ModerationLabels, we will print those to the console.



def get_moderation_labels(photo):
    client = boto3.client('rekognition')
    response = client.detect_moderation_labels(Image={'Bytes': photo})
    for label in response['ModerationLabels']:
        print(label['Name'])

For each Tweet image, the output of this method will look something like this:



Suggestive
Revealing Clothes
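
Moderation labels are hierarchical: each entry has a ParentName, which is an empty string for top-level categories. In the output above, Revealing Clothes is a child of Suggestive. The sketch below groups a response by top-level category; the function name group_moderation_labels is an assumption.

```python
def group_moderation_labels(response):
    """Group moderation label names under their top-level category.

    `response` is the dict returned by detect_moderation_labels.
    """
    grouped = {}
    for label in response.get("ModerationLabels", []):
        parent = label.get("ParentName", "")
        if parent:
            # Child label: file it under its parent category
            grouped.setdefault(parent, []).append(label["Name"])
        else:
            # Top-level label: make sure the category exists
            grouped.setdefault(label["Name"], [])
    return grouped
```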

Celebrity recognition

You can also detect celebrities in images using Rekognition. To detect celebrities in images, we will call the recognize_celebrities method. If the response contains CelebrityFaces, we will print the names and IDs to the console.



def detect_celebrity(photo):
    client = boto3.client('rekognition')
    response = client.recognize_celebrities(Image={'Bytes': photo})
    for celebrity in response['CelebrityFaces']:
        print('Name: ' + celebrity['Name'])
        print('Id: ' + celebrity['Id'])

For each Tweet image, the output of this method will look something like this:



Name: Joe Biden
Id: 3Zw7fr
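
Each entry in CelebrityFaces also includes a MatchConfidence score, and the response lists faces that could not be matched under UnrecognizedFaces. A minimal sketch that summarizes both follows. The function name summarize_celebrities and the threshold of 90 are assumptions.

```python
def summarize_celebrities(response, min_confidence=90.0):
    """Summarize a recognize_celebrities response.

    Returns the names matched at or above min_confidence and the number
    of faces Rekognition detected but could not identify.
    """
    names = [
        celebrity["Name"]
        for celebrity in response.get("CelebrityFaces", [])
        if celebrity.get("MatchConfidence", 0.0) >= min_confidence
    ]
    return {
        "celebrities": names,
        "unrecognized_faces": len(response.get("UnrecognizedFaces", [])),
    }
```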

Putting it all together

Below is the complete code snippet that you can run in Python. The example below uses the detect_labels method but you can replace it with the other methods that we discussed above.



import boto3
import requests
import tweepy


def get_image(url):
    response = requests.get(url)
    return response.content

# Feel free to replace this method with any of the other methods
# shared above 
def detect_labels(photo):
    client = boto3.client('rekognition')
    response = client.detect_labels(Image={'Bytes': photo})
    for label in response['Labels']:
        print("Label: " + label['Name'])


def main():
    client = tweepy.Client('BEARER_TOKEN')

    # Replace with your own search query
    query = 'from:twitterdev -is:retweet has:images'

    tweets = client.search_recent_tweets(query=query,
                                         media_fields=['url'],
                                         expansions='attachments.media_keys',
                                         max_results=10)

    if 'media' in tweets.includes:
        # Get list of media from the includes object
        media = {m["media_key"]: m for m in tweets.includes['media']}

        for tweet in tweets.data:
            attachments = tweet.data['attachments']
            media_keys = attachments['media_keys']
            if media[media_keys[0]].url:
                image_url = media[media_keys[0]].url
                print(image_url)
                photo = get_image(image_url)
                detect_labels(photo)
    else:
        print('No images found for this search query')


if __name__ == "__main__":
    main()
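
For research use, you will often want totals across many images rather than per-image console output. As a sketch, assuming you collect the detect_labels response dicts in a list instead of printing them, the hypothetical helper below counts how many images each label appears in.

```python
from collections import Counter


def tally_labels(responses):
    """Count the number of images in which each label name appears.

    `responses` is an iterable of detect_labels response dicts, one per image.
    """
    counts = Counter()
    for response in responses:
        # Use a set so a label counts at most once per image
        names = {label["Name"] for label in response.get("Labels", [])}
        counts.update(names)
    return counts
```

Calling tally_labels(responses).most_common(10) then gives the ten labels that occur in the most images.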