Ken Hudak

Using Python & Alteryx to Find Social Network Influencers in Your Neighborhood (Part I)

After arriving in Canada (as a permanent resident, woot!) in 2018, I found that many of the Twitter accounts I followed for current events in my old neighborhood had kinda lost their relevance. So I set out to make a new list of influential users in my new neighborhood.

I bet there are easier ways to do this, but knowing I'd be biased and would crazily start following all the local techorati and anyone mentioning Formula 1, I really needed a neutral tech solution. A combo of Python and the Tweepy library would be a great impartial arbiter to nominate some influential local users by:

  • Wrangling a batch of tweets from the past couple of weeks from the Windsor area with Python, and
  • Using Alteryx to automate a workflow that scores and ranks local users based on the number of favorites and retweets.

First I needed to look at the tools available for locating tweeters by geographic location. Twitter has an amazing method to add latitude and longitude to a post, so it would be super easy to just draw a geo-bounded box and grab all the tweets that came from within the box/city I was interested in. BUT precise geotagging turned out to be creepy and trackable, and Twitter ultimately disabled it by default in 2014.

The feature is still there, but now users need to opt in and enable location. Only a few do.


Finding users via hashtag (such as a nickname like #RoseCity or the airport code #YQG) is effective, but not very inclusive. So that leaves the Twitter API place object and its ID field as a decent method for narrowing down a user's location. Note that the Twitter docs refer to place_id as "a place associated with the user account." Close enough.
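
For comparison, here's roughly what the two query styles look like as standard search operators (the place: string is the Windsor place ID used later in this post):

# Hashtag search: catches anyone using the tag, wherever they happen to be
q_hashtag = "#YQG OR #RoseCity"

# Place search: catches tweets Twitter has associated with the city itself
q_place = "place:15b0a643e7bd22c7"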

I've been getting the place_id by using Twitter advanced search to find someone who has geotagging enabled, then clicking on their City/Province to get this style of string: https://twitter.com/places/15b0a643e7bd22c7.

Look for this:
[Screenshot of a tweet showing the city and province listed]

There's probably a better way to do this; if you know of one, please share it in the comments! (Or if you know an alternative to Yahoo!'s now-defunct Where On Earth ID (WOEID) API.)
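
One possible answer to my own question: Tweepy 3.x wraps Twitter's geo/search endpoint as geo_search, so you may be able to look up place IDs programmatically instead. A minimal sketch, assuming the authenticated api object created further down in this post:

# Look up candidate place IDs for a city by name via the geo/search endpoint
places = api.geo_search(query="Windsor", granularity="city")
for place in places:
    print(place.full_name, place.id)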

Ok. I've got the place_id alphanumeric string that I want to search with, so now it's time for the Python.

But first! Just in case you are reading this from the far distant future, or using a virtual environment with who-knows-what versions of installed Python libraries, I'll share what I'm using right now in the year 2020, when this all worked.

Spyder IDE 3.3.6
Python 3.7.5
Tweepy 3.8.0
Twitter API 1.1 - watch for the new (now delayed) version in 2020!
Pandas 0.25.3

First, you'll need Twitter developer keys if you don't already have them. This can take a couple of days.

Next, let's load the two simple libraries we'll need: tweepy for the API and pandas for the dataframes and exporting.

# Import Tweepy and Pandas Libraries.
# Install first if necessary.
import tweepy as tw
import pandas as pd

Next, authenticate with OAuth using your new keys.

# Use your own Twitter.com Developer Keys inside the double-quotes
consumer_key = "Tk9...."
consumer_secret = "AOH..."
access_token = "550..."
access_token_secret = "y0l..."

Now we'll let Tweepy pass the variables above into the statements below, and then form the string for the search API.

# Authenticate with Twitter - nothing to change here since it uses the variables above
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)


# Declare what you want the search to look for. Here I'm using Twitter's place ID for Windsor
search_variable = "place:15b0a643e7bd22c7 since:2019-06-01"

The free standard search API only goes back 7 days, so the 2019-06-01 date I requested didn't mean anything in the end. I don't really know cursor methods, so I just ran this script at 7-day intervals as a workaround. The important parts are inserting the place_id you found for the target location, setting the language, and choosing how many items to return.

# Uncomment " -filter:retweets" to exclude retweets. I'm interested in RTs, so it stays commented out
twitter_search = search_variable #+ " -filter:retweets"

# Collect tweets in the tweets variable. Will attempt to collect no more than 5000 rows.
tweets = tw.Cursor(api.search,
                   q=twitter_search,
                   lang="en").items(5000)
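
As an aside, since I re-ran this every week, here's a sketch of one way to avoid re-collecting the same tweets across runs: save the highest tweet id from the previous run and pass it as since_id (a standard search parameter) on the next one. The id below is just a placeholder.

# Assumption: last_seen_id was saved from the newest tweet of the previous run.
# since_id tells the search API to return only tweets newer than that id.
last_seen_id = 1146570000000000000  # placeholder - substitute your own saved id
tweets = tw.Cursor(api.search,
                   q=twitter_search,
                   lang="en",
                   since_id=last_seen_id).items(5000)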

This next step goes through the tweets that meet the place_id criteria, builds a list of lists with the specified fields (user id, date the tweet was sent, whether the user is verified, etc.), and stores all that info in the variable tweet_data.

# Collect the fields you are interested in from each tweet into a list of lists.
# The Cursor can only be consumed once, so this single comprehension does all the work.
tweet_data = [[tweet.place, tweet.retweet_count, tweet.favorite_count, tweet.coordinates, tweet.user.created_at, tweet.user.screen_name, tweet.user.location, tweet.user.statuses_count, tweet.user.followers_count, tweet.user.verified, tweet.created_at] for tweet in tweets]

We loaded pandas for a reason earlier, and now I'm going to load the data into a DataFrame and give the columns nicer names.

# Put the collected tweet_data into a pandas DataFrame for writing to a table
tweet_text = pd.DataFrame(data=tweet_data,
                          columns=['Place', 'RetweetCount', '# Faves Count', 'Coordinates', 'Join Date', 'user', 'location', 'Tweets', 'followers', 'verified', 'tweeteddate'])
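
Before exporting, a quick sanity check (my own habit, not part of the original script) confirms the pull actually returned something:

# Quick sanity check: how many tweets came back, and do the columns look right?
print(len(tweet_text), "tweets collected")
print(tweet_text.head())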

And lastly, use pandas to export the tweet_text DataFrame straight to Excel. Here you can name the Excel file AND the worksheet. Note: if you run this several times to get around the 7-day limit, be sure to change the file name so you don't overwrite the historical data (learn from my mistakes!).

# This is a nifty Pandas tool to write out to an Excel file with a custom sheet name
tweet_text.to_excel("WindsorTwitter1_2020-07-29.xlsx", sheet_name="WindsorTwitter") 
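
One small guard against that overwrite mistake (my own tweak, not part of the original script): stamp the filename with today's date so each weekly run writes a fresh file.

# Stamp the output filename with the run date to avoid clobbering older pulls
from datetime import date

filename = f"WindsorTwitter1_{date.today().isoformat()}.xlsx"
tweet_text.to_excel(filename, sheet_name="WindsorTwitter")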

The first column in the dataframe, Place, returns this geographic info:

Place(_api=, id='15b0a643e7bd22c7', url='https://api.twitter.com/1.1/geo/id/15b0a643e7bd22c7.json', place_type='city', name='Windsor', full_name='Windsor, Ontario', country_code='CA', country='Canada', contained_within=[], bounding_box=BoundingBox(_api=, type='Polygon', coordinates=[[[-83.113623, 42.2339053], [-82.890548, 42.2339053], [-82.890548, 42.356225], [-83.113623, 42.356225]]]), attributes={})
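
That raw object repr gets noisy once it lands in Excel. If you only want the readable bits, one optional tweak (my suggestion, not something the original script does) is to flatten the column before exporting:

# Optional: replace each Place object with just its human-readable name
tweet_text['Place'] = tweet_text['Place'].apply(
    lambda p: p.full_name if p is not None else None)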

And that's it. The complete code is on GitHub.
In the second part of this series I'll go through what I found in my Excel file and how it was cleaned up with Alteryx.
