
How to analyze the sentiment of your own Tweets

Jessica Garson ・ 11 min read

This tutorial was originally posted to the Twitter Developer Blog

Combining your Tweets with sentiment scores gives you a quantitative gauge of your Tweets. To put some data behind the question of how you are feeling, you can use Python, Twitter's Recent Search endpoint to explore your Tweets from the past 7 days, and Microsoft Azure's Text Analytics Cognitive Service to detect languages and determine sentiment scores. This tutorial will walk you through how to create code that pulls your Tweets from the past 7 days and gives you a score to let you know exactly how your week has been. You can reference the full version of the code.

Setting up

Before you can get started, you will need the following: a Twitter developer account with a bearer token for the Labs Recent Search endpoint, a Microsoft Azure account with a Text Analytics subscription key, and Python 3 installed.

You will need to create a directory for this project. In your terminal, type the following commands, which create a new directory and change from the directory you are currently in to the new one. You'll also create a new Python file and a YAML configuration file that will be used to store your tokens and secrets.

mkdir how-positive-was-your-week
cd how-positive-was-your-week
touch week.py
touch config.yaml

Using the text editor of your choice, you can now set up your configuration file, config.yaml. Replace the x's with your own bearer token and subscription key.

search_tweets_api:
  bearer_token: xxxxxxxxxxxxxxxxxxxxxxx
azure:
  subscription_key: xxxxxxxxxxxxxxxxxxxxxxx

You will also need to install the libraries Requests, PyYAML, and pandas. Requests will be used to make HTTP requests to the Twitter and Azure endpoints, PyYAML to parse the .yaml file where you will be storing your keys and tokens, and pandas to manipulate and shape the data.
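All three libraries can be installed with pip (the names below are the package names on PyPI):

```shell
pip install requests pyyaml pandas
```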

Open the file week.py and import all the libraries you'll use. In addition to requests, pandas, and yaml, you will import json and ast, which are part of Python's standard library, so you don't need to install them ahead of time. You'll import pandas under the alias pd so you don't have to type the full name each time you want to call the library.

import requests
import pandas as pd
import json
import ast
import yaml

Creating the URL

Before you can connect to the Twitter API, you'll need to set up the URL to ensure it has the right fields so you get the right data back. First, create a function called create_twitter_url. In this function, declare a variable for your handle; you can replace jessicagarson with your own handle. The max_results can be anywhere from 1 to 100. If you are using a handle that has more than 100 Tweets in a given week, you may want to build in some logic to handle pagination or use a library such as searchtweets-labs. The URL is formatted to contain the maximum number of results and a query specifying that you are looking for Tweets from a specific handle. You'll return the formatted URL in a variable called url, since you will need it to make a GET request later.

def create_twitter_url():
    handle = "jessicagarson"
    max_results = 100
    mrf = "max_results={}".format(max_results)
    q = "query=from:{}".format(handle)
    url = "https://api.twitter.com/labs/2/tweets/search?tweet.fields=lang&{}&{}".format(
        mrf, q
    )
    return url

The URL you are creating is:
https://api.twitter.com/labs/2/tweets/search?tweet.fields=lang&max_results=100&query=from:jessicagarson

You can adjust your query if you want to exclude Retweets or Tweets that contain media. You can make adjustments to the data returned by the Twitter API by adding additional fields and expansions to your query. Using a REST client such as Postman or Insomnia can be helpful for seeing what data you get back and making adjustments before you start writing code. There is a Postman collection for Labs endpoints as well.
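As an illustration, here is a hedged sketch of a variant that excludes Retweets and Tweets with media using the -is:retweet and -has:media search operators; the create_filtered_twitter_url name is made up for this example, and urllib.parse.urlencode is used so the query string is properly URL-encoded:

```python
from urllib.parse import urlencode

def create_filtered_twitter_url(handle="jessicagarson", max_results=100):
    # -is:retweet drops Retweets; -has:media drops Tweets containing media.
    params = {
        "tweet.fields": "lang",
        "max_results": max_results,
        "query": "from:{} -is:retweet -has:media".format(handle),
    }
    return "https://api.twitter.com/labs/2/tweets/search?" + urlencode(params)
```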

Setting up your main function

At the bottom of the file, you can start to set up the main function that you will use to call all of the functions that you create. You can add the function you just created and call the function using an if __name__ == "__main__" statement.

def main():
    url = create_twitter_url()


if __name__ == "__main__":
    main()

Authenticating and Connecting to the Twitter API

To access the configuration file you created while setting up config.yaml, you can define a function called process_yaml which will read in the YAML file and save the contents.

def process_yaml():
    with open("config.yaml") as file:
        return yaml.safe_load(file)

In your main function, you can save this to a variable named data. Your main function should now have two variables: one for url and one for data.

def main():
    url = create_twitter_url()
    data = process_yaml()

To access the bearer token from your config.yaml file you can use the following function.

def create_bearer_token(data):
    return data["search_tweets_api"]["bearer_token"]

Just as you did earlier, you can add a variable called bearer_token to your main function.

def main():
    url = create_twitter_url()
    data = process_yaml()
    bearer_token = create_bearer_token(data)

To connect to the Twitter API, you'll create a function called twitter_auth_and_connect where you'll format the headers to pass in your bearer_token, along with the url. This is the point where you connect to the Twitter API, using the requests package to make a GET request.

def twitter_auth_and_connect(bearer_token, url):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    response = requests.request("GET", url, headers=headers)
    return response.json()

The object returned by this function is a payload that looks like this:

{'data': [{'id': '1272881032308629506', 'text': '@nomadaisy @kndl I just want to do deals with you'}, {'id': '1272880943687258112', 'text': '@nomadaisy @kndl I live too far away to hang responsibly with y’all 😬😭'}, {'id': '1272711045606408192', 'text': '@Babycastles https://t.co/Yfj8SJAnpG'}, {'id': '1272390182231330816', 'text': '@replylord Haha, I broke a glass in your honor today and all so I think I do read your Tweets'}, {'id': '1271810907274915840', 'text': '@replylord I like that I’m the only like here.'}, {'id': '1271435152183476225', 'text': '@Arfness @ChicagoPython @codewithbri @WeCodeDreams @agfors The video seems to be available https://t.co/GojUGdulkP'}, {'id': '1271111488024064001', 'text': 'RT @TwitterDev: Tune in tonight and watch as @jessicagarson takes us through running your favorite Python package in R. 🍿\n\nLearn how to use…'}, {'id': '1270794941892046848', 'text': 'RT @ChicagoPython: Chicago Python will be live-streaming tmrw night!\n\nOur talks:\n- How to run your favorite Python package in R by @jessica…'}, {'id': '1270485552488427521', 'text': "Speaking virtually at @ChicagoPython's __main__ meeting on Thursday night. I'll be showing how to run your favorite Python package in R. https://t.co/TnqgO80I3t"}], 'meta': {'newest_id': '1272881032308629506', 'oldest_id': '1270485552488427521', 'result_count': 9}}
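The meta object above shows result_count: 9, so everything fit in one page. When a handle has more Tweets than max_results in the window, paginated search endpoints typically include a next_token in meta that you pass back as a query parameter. Here is a hedged sketch of such a loop; the fetch_all_tweets and add_next_token names and the max_pages cap are illustrative, and it assumes the response exposes pagination via meta.next_token:

```python
import requests

def add_next_token(url, next_token):
    # Append the pagination token as an extra query parameter.
    return url if next_token is None else "{}&next_token={}".format(url, next_token)

def fetch_all_tweets(bearer_token, url, max_pages=10):
    """Follow meta.next_token until the results are exhausted."""
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    tweets, next_token = [], None
    for _ in range(max_pages):
        payload = requests.request(
            "GET", add_next_token(url, next_token), headers=headers
        ).json()
        tweets.extend(payload.get("data", []))
        next_token = payload.get("meta", {}).get("next_token")
        if next_token is None:
            break
    return tweets
```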

You can now update your main function so it looks as follows:

def main():
    url = create_twitter_url()
    data = process_yaml()
    bearer_token = create_bearer_token(data)
    res_json = twitter_auth_and_connect(bearer_token, url)

Generating languages

While it is possible to get languages directly from the Recent Search payload (there is a version of this code that uses that method), Azure also offers an endpoint that will detect the language for you. Before you can use it, you will need to ensure your data is in the right shape for the detect-languages endpoint, so you'll format the data to match the format outlined in Azure's quick start guide. To do so, you will separate the object inside the key called data, which contains the Tweets and IDs, into a variable called data_only. You will need some string formatting to get the Tweet data into the right shape, and further formatting to convert the string into a dictionary. You can use the json and ast libraries to assist in this conversion.

def lang_data_shape(res_json):
    data_only = res_json["data"]
    doc_start = '"documents": {}'.format(data_only)
    str_json = "{" + doc_start + "}"
    dump_doc = json.dumps(str_json)
    doc = json.loads(dump_doc)
    return ast.literal_eval(doc)
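As an aside, the string round-trip above ends up building an ordinary dictionary; assuming the Tweet text doesn't trip up the literal parsing, the same {"documents": [...]} shape can be produced directly:

```python
def lang_data_shape(res_json):
    # Azure's languages endpoint expects {"documents": [{"id": ..., "text": ...}, ...]},
    # which is exactly the shape of the "data" array in the Twitter response.
    return {"documents": res_json["data"]}
```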

To connect to Azure, you will need to set up the URLs in a similar way to how you did for the Twitter API. You can set up URLs for retrieving data from both the languages and sentiment endpoints. Your credentials are parsed from config.yaml and passed in to authenticate to the Azure endpoints. Note that the base URL reflects the name of the Azure resource (here, week); use the endpoint for your own Text Analytics resource.

def connect_to_azure(data):
    azure_url = "https://week.cognitiveservices.azure.com/"
    language_api_url = "{}text/analytics/v2.1/languages".format(azure_url)
    sentiment_url = "{}text/analytics/v2.1/sentiment".format(azure_url)
    subscription_key = data["azure"]["subscription_key"]
    return language_api_url, sentiment_url, subscription_key

Additionally, you will create a function to build the header for connecting to Azure, passing your subscription key into the format needed to make your request.

def azure_header(subscription_key):
    return {"Ocp-Apim-Subscription-Key": subscription_key}

At this point, you are now ready to make a POST request to the Azure API to generate languages for your Tweets.

def generate_languages(headers, language_api_url, documents):
    response = requests.post(language_api_url, headers=headers, json=documents)
    return response.json()

You should get back a JSON response that looks similar to the response below.

{'documents': [{'id': '1272881032308629506', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1272880943687258112', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1272711045606408192', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1272390182231330816', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1271810907274915840', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1271435152183476225', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1271111488024064001', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1270794941892046848', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1270485552488427521', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}], 'errors': []}

You will also want to update your main function to include the new functions you created. It should now look similar to this.

def main():
    url = create_twitter_url()
    data = process_yaml()
    bearer_token = create_bearer_token(data)
    res_json = twitter_auth_and_connect(bearer_token, url)
    documents = lang_data_shape(res_json)
    language_api_url, sentiment_url, subscription_key = connect_to_azure(data)
    headers = azure_header(subscription_key)
    with_languages = generate_languages(headers, language_api_url, documents)

Obtaining sentiment scores

Before you can use Azure's endpoint for generating sentiment scores, you will need to combine the Tweet data with the detected-language data. You can use pandas to assist in this conversion. First, convert the JSON object with detected languages into a data frame. Since you only want the language abbreviations, use a list comprehension to pull out the iso6391Name values; each one is contained inside a dictionary, which is inside a list, which is inside the data frame of language data. Then turn the Tweet data into a data frame, attach the language abbreviations to that same data frame, and convert the result to JSON.

def combine_lang_data(documents, with_languages):
    langs = pd.DataFrame(with_languages["documents"])
    lang_iso = [x.get("iso6391Name")
                for d in langs.detectedLanguages if d for x in d]
    data_only = documents["documents"]
    tweet_data = pd.DataFrame(data_only)
    tweet_data.insert(2, "language", lang_iso, True)
    json_lines = tweet_data.to_json(orient="records")
    return json_lines

Just as you did earlier, you'll wrap the data in a dictionary with documents as the key, so the payload is in the format needed to obtain the sentiment scores.

def add_document_format(json_lines):
    docu_format = '"' + "documents" + '"'
    json_docu_format = "{}:{}".format(docu_format, json_lines)
    docu_align = "{" + json_docu_format + "}"
    jd_align = json.dumps(docu_align)
    jl_align = json.loads(jd_align)
    return ast.literal_eval(jl_align)

Now, your data should be in the right format to call Azure’s sentiment endpoint. You can make a POST request to the sentiment endpoint you defined in the connect_to_azure function.

def sentiment_scores(headers, sentiment_url, document_format):
    response = requests.post(
        sentiment_url, headers=headers, json=document_format)
    return response.json()

The JSON response you get back should look similar to the payload below.

{'documents': [{'id': '1272881032308629506', 'score': 0.18426942825317383}, {'id': '1272880943687258112', 'score': 0.0031259357929229736}, {'id': '1272711045606408192', 'score': 0.7015109062194824}, {'id': '1272390182231330816', 'score': 0.8754926323890686}, {'id': '1271810907274915840', 'score': 0.19140595197677612}, {'id': '1271435152183476225', 'score': 0.7853382229804993}, {'id': '1271111488024064001', 'score': 0.7884223461151123}, {'id': '1270794941892046848', 'score': 0.8826596736907959}, {'id': '1270485552488427521', 'score': 0.8784275054931641}], 'errors': []}

Your main function should now look similar to the following.

def main():
    url = create_twitter_url()
    data = process_yaml()
    bearer_token = create_bearer_token(data)
    res_json = twitter_auth_and_connect(bearer_token, url)
    documents = lang_data_shape(res_json)
    language_api_url, sentiment_url, subscription_key = connect_to_azure(data)
    headers = azure_header(subscription_key)
    with_languages = generate_languages(headers, language_api_url, documents)
    json_lines = combine_lang_data(documents, with_languages)
    document_format = add_document_format(json_lines)
    sentiments = sentiment_scores(headers, sentiment_url, document_format)

Getting the average sentiment score

To get the average sentiment score, you can turn the JSON response from the Azure sentiment endpoint into a data frame and calculate the mean of the column entitled score.

def mean_score(sentiments):
    sentiment_df = pd.DataFrame(sentiments["documents"])
    return sentiment_df["score"].mean()
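As a quick sanity check on mean_score, a hypothetical two-document payload (made-up IDs and scores) averages as expected:

```python
import pandas as pd

def mean_score(sentiments):
    sentiment_df = pd.DataFrame(sentiments["documents"])
    return sentiment_df["score"].mean()

sample = {"documents": [{"id": "1", "score": 0.2}, {"id": "2", "score": 0.8}],
          "errors": []}
print(mean_score(sample))  # 0.5
```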

After you have the average score, you can create a logic statement to let you know exactly how positive your week was.

def week_logic(week_score):
    if week_score >= 0.75:
        print("You had a positive week")
    elif week_score >= 0.45:
        print("You had a neutral week")
    else:
        print("You had a negative week, I hope it gets better")

The final version of the main statement for your file should look like this:

def main():
    url = create_twitter_url()
    data = process_yaml()
    bearer_token = create_bearer_token(data)
    res_json = twitter_auth_and_connect(bearer_token, url)
    documents = lang_data_shape(res_json)
    language_api_url, sentiment_url, subscription_key = connect_to_azure(data)
    headers = azure_header(subscription_key)
    with_languages = generate_languages(headers, language_api_url, documents)
    json_lines = combine_lang_data(documents, with_languages)
    document_format = add_document_format(json_lines)
    sentiments = sentiment_scores(headers, sentiment_url, document_format)
    week_score = mean_score(sentiments)
    print(week_score)
    week_logic(week_score)

You should now be able to run your code by typing the following into your terminal:

python3 week.py

Depending on your sentiment score, you should see something similar to this in your terminal output:

0.6470708809792995
You had a neutral week

Next steps

This code sample could be easily extended to let you know which Tweets were the most positive, or the most negative, or to track changes week by week with a visualization.
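For instance, here is a minimal sketch of the "most positive / most negative" idea, merging the Azure scores back onto the Tweet text with pandas; the extreme_tweets name and the sample data are illustrative:

```python
import pandas as pd

def extreme_tweets(sentiments, tweet_data):
    # sentiments: Azure payload as shown earlier; tweet_data: the "data" list
    # from the Twitter response. Join on the shared Tweet id, then sort.
    scores = pd.DataFrame(sentiments["documents"])
    texts = pd.DataFrame(tweet_data)
    merged = scores.merge(texts, on="id").sort_values("score")
    return merged.iloc[-1]["text"], merged.iloc[0]["text"]

sentiments = {"documents": [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.1}]}
tweets = [{"id": "a", "text": "great day"}, {"id": "b", "text": "rough day"}]
print(extreme_tweets(sentiments, tweets))  # ('great day', 'rough day')
```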

Let us know on the forums if you run into any troubles along the way or Tweet us at @TwitterDev if this inspires you to create anything. I used several libraries and tools beyond the Twitter API to make this tutorial, but you may have different needs and requirements and should evaluate whether those tools are right for you. Twitter does not operate or manage the third party services mentioned above and those services may have separate terms regarding use of any tools and other features.
