Fascinating. I wouldn't be surprised if this kind of research goes mainstream in the future in journalism.
Exactly, I think the same. Or for having a good impact with your products (on social media) if you're trying to sell stuff. There are many possibilities.
A project I have in mind is to apply this kind of analysis to suicide prevention.
Nice, very interesting! He seems to tweet a surprisingly high share of positive tweets (51%). But how many of these tweets are fake news and lies is another question... nytimes.com/interactive/2017/06/23...
Yeah, that was surprising to me too! I've heard that he's not the only one tweeting from his account; he has a team for this. That might be a possible reason. That's why it's interesting to analyze the polarity of tweets that come from different sources.
Well, technically these sentiment calculations should be taken with a grain of salt. You could use the VaderSentiment library as well and compare both sentiment values to get better insight.
Awesome tutorial!!
Thank you so much!
I've been trying to run the script, but I have several problems you might be able to help me with.
I'm new to Python, but I'd love to adapt this example to other users if I can get it working.
I have Python 3.6.3 and work with Spyder. I copied your example, but the script stops at line 37:
We create an extractor object:
extractor = twitter_setup()
when this error appears:
NameError: name 'twitter_setup' is not defined
What is this due to?
Thanks for your guidance!
It's because you haven't defined your twitter_setup() function. Make sure that in Spyder (specifically in your code) you have the following defined:
Another recommendation is to try using Jupyter notebooks. :)
Thank you so much, Rodolfo, you really are impressive!!!
I already solved this issue... let's see if it doesn't give me any more trouble and I can figure out the rest on my own!
Hugs
How did you write code in your comment with syntax highlighting?
Hi Rodolfo!
I have a question for you: is it possible to modify the tweet-cleaning function so that it doesn't remove the accents from Spanish words?
Sure, in fact you would only need to modify your cleaning rule in re.sub(). :)
Hi Rodolfo, great article!
New to Python, wondering how to retrieve more than the default 15 tweets from this code? I looked up a few solutions elsewhere but couldn't figure out how to integrate. Suggestions?
Thanks again! - Rich
We're actually retrieving the first 200 tweets; this is specified in the count parameter. The API allows us to retrieve at most 200 tweets per call.
Just went through the post again and found:
We create a tweet list as follows:
tweets = extractor.user_timeline(screen_name="realDonaldTrump", count=200)
print("Number of tweets extracted: {}.\n".format(len(tweets)))
We print the most recent 5 tweets:
print("5 recent tweets:\n")
for tweet in tweets[:5]:
print(tweet.text)
print()
Which I've replaced with:
tweets = extractor.search(input("Topic you want to analyze: "))
Perhaps I need to play with this, if I can't figure it out, I'll re-ask. Lol my apologies!
Fixed w/ very simple:
tweets = extractor.search(input("Topic you want to analyze: "), count=200)
Thanks!
Although, even with count=200, this retrieves 100 tweets only.
Is there a way to refresh and retrieve more?
Thanks again!
@deuxperspective You can use a tool I built, hosted on RapidAPI
rapidapi.com/microworlds/api/twitt...
Hi there, I was having some trouble with the "visualizing the statistics" section as detailed in sections 2.1 and 2.2; if you take a look at my GitHub repo, you'll notice I had to comment out %matplotlib inline and replace it with plt.ion() within the script-running file (trumpet.py) in order to run the scripts without failure (e.g. python3 trumpet.py). Can you please explain how to generate the visualizations as detailed in those sections? For some reason, I'm unable to render those visuals within my Jupyter Notebook env/config. I'm only 10 days new to Python, so I'd appreciate any guidance. Great tutorial, thanks!
Sure! It's quite easy actually. :)
Instead of adding plt.ion() at the beginning, you can add the following call each time you're generating a plot, in order to visualize it: plt.show(). This will open an external window and display the most recently generated plot. You can see this in the official Pyplot tutorial I shared at the end (References).
Please let me know if you have any other problems. :)
Got it, Rodolfo! Thank you for the guidance- tremendous fun! ;)
Would it be possible to check / detect how many likes come from the staff of a VIP? It is said that many politicians manage likes and retweets by asking their supporters to like and retweet their messages (not sure I'm being clear). Across 200 tweets, it would be possible to look at the Twitter accounts that like systematically and quickly (as soon as a tweet is published, like bots do) and then subtract (or down-weight) them in the final evaluation.
This is an interesting question.
If you want to count something like this in real time, you would need to modify the way you're consuming the API (REST) and create a listener (you can still do that with Tweepy). That's what I would do: I'd create a specific listener for Trump's tweets and use threads to count, for a certain time, the likes and retweets of a new tweet.
Does this answer help? I can try to be more explicit. :)
Yes, I understand the idea. This would be a very useful tool to track falsely popular accounts.
This might help: github.com/RodolfoFerro/TwitterBot...
You can find more info in the documentation: tweepy.readthedocs.io/en/v3.5.0/st...
Hope this complements my previous answer!
Thank you for your tutorial! It was easy to follow and everything worked on my first attempt!
I don't want to reload all the tweets from the web while I'm developing, so I altered the first few lines to cache the tweets locally.
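For reference, here is the commenter's caching snippet, lightly restructured into a function with the fetch call injected, so the sketch runs without live credentials (the cache filename saved.pickle is from the original comment):

```python
import os
import pickle

SAVE = "saved.pickle"


def load_tweets(fetch, cache_path=SAVE):
    """Return tweets from the local cache if present; otherwise
    fetch them once and pickle the result for the next run."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    tweets = fetch()
    with open(cache_path, "wb") as f:
        pickle.dump(tweets, f)
    return tweets


# In the tutorial this would be called as:
# tweets = load_tweets(lambda: twitter_setup().user_timeline(
#     screen_name="realDonaldTrump", count=200))
```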
Excellent idea!
What I did at the end (in my personal case) was to save the tweet list as a CSV file (data.to_csv(...)), taking advantage of the fact that I already had all the info in a pandas dataframe. :)
Thanks for your great comment!
I was looking for a tutorial to recommend to an acquaintance who is moving into digital journalism, and I came across your post. It is very well-written. Thanks for sharing!
This is just a short remark, since you seem to be using Pandas, but not to its fullest potential.
When you observe a possible relationship between RTs and Likes in subsection 2.1, you can quantify this by computing the (Pearson) correlation
data['RTs'].corr(data['Likes'])
(It is close to 0.7.)
When finding the sources of tweets in subsection 2.3, instead of using loops, you can use
sources = data['Source'].unique()
and then, when computing percentages,
data['Source'].value_counts()
You can put the latter in a data frame... In any case, thanks again!
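The two suggestions above can be sketched on a toy dataframe (the column names RTs, Likes, and Source match the tutorial's; the numbers here are made up):

```python
import pandas as pd

# Toy data standing in for the tutorial's dataframe:
data = pd.DataFrame({
    "RTs":    [10, 40, 35, 90, 5],
    "Likes":  [20, 80, 60, 150, 15],
    "Source": ["iPhone", "Android", "iPhone", "iPhone", "Android"],
})

# Pearson correlation between retweets and likes, no loop needed:
corr = data["RTs"].corr(data["Likes"])

# Unique sources and their shares, again without loops:
sources = data["Source"].unique()
percentages = data["Source"].value_counts(normalize=True) * 100

# The percentages can go straight into a dataframe:
print(percentages.to_frame(name="percent"))
```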
I must say that it was for an introductory workshop, and I finished all the material at dawn three days before, or something. :P
It might be that most of the last part is not optimized in code. :(
Thanks for your observations! :D
They simplify the data handling by using the potential of Pandas. :)
for source in data['Source']:
    for index in range(len(sources)):
        if source == sources[index]:
            percent[index] += 1
            pass

Why did the author write 'pass' on the last line?
Sorry, my bad.
When I was writing the code I created an empty conditional, so at the beginning I put the pass reserved word, and after that I forgot to take it out.
Hey mate, thanks for this tutorial. It seems to be working fine with any hashtag except #LetsTaxThis. Do you mind taking a look and updating? It would be very helpful.
Basically I want to extract data from Twitter using #LetsTaxThis Hashtag.
Thanks in advance :) :)
Nicely done. I had installed Anaconda before but didn't really get past Hello World in the Jupyter notebook. This was an excellent idea to get people like me off their proverbial rear-end and use it for a very fun idea! I was able to follow it right through and get everything to work after dusting off the cobwebs of my Anaconda environment.
Thanks for sharing!
Thank you so much! I really appreciate it.
I'll try to keep posting stuff like this, I enjoy doing applied things with Python. :)
Hi Rodolfo, thanks a lot for a very comprehensive tutorial. However, I still could not get rid of the credentials import problem:
ModuleNotFoundError: No module named 'credentials'
I saw in the discussion that you have mentioned a solution, but I am very new to Python, so I still could not figure it out. Can you please describe how the file credentials.py should look (of course leaving blank spaces where I can put my own credentials)? Thanks a lot.
Hi Rodolfo, I figured out the solution and your code worked like a charm. It's awesome.
Hello,
I tried extracting historical tweets. Could you please provide any suggestion on this question:
stackoverflow.com/questions/491197...
@anubhav0fnu You can use this API (link below) to extract historical tweets
rapidapi.com/microworlds/api/twitt...
How do I take this to the cloud? Flask + Heroku? Thanks in advance!
Sorry for taking so long.
There are mainly two approaches:
1. Consume it as a REST API. In that case, the deployment on Heroku (or any other deployment service) would have to process the new tweets and add the new data to the previous data.
2. Create a stream listener to continuously detect a new tweet and process it.
In 1., the simplest way would be to schedule a task (a simple script) to be executed at a certain time (pythonanywhere also works for this; I have a Twitter bot that runs every 24 hours). Anyway, one can create a service using Tweepy; in fact, there's a Flask-Tweepy integration: flask-tweepy.readthedocs.io/en/lat...
Thank you.
I am running this script in PyCharm. Everything is working fine, but it is not showing the sentiment analysis part: no error, no output. Below is the last part of the output, which does not show the sentiment analysis. Can anybody help me?
Number of retweets: 63927
139 characters.
Creation of content sources:
Process finished with exit code 0
When I run the code, I see this:
ImportError Traceback (most recent call last)
in ()
1 # We import our access keys:
----> 2 from credentials import * # This will allow us to use the keys as variables
3
4 # API's setup:
5 def twitter_setup():
ImportError: No module named credentials
Can you help me figure out what the problem is?
You need to create a credentials.py file with your tokens. The full description is in the post.
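A sketch of what that credentials.py might look like: it is just four string constants, with the values below as placeholders for the keys of your Twitter app:

```python
# credentials.py
# Paste here the keys generated for your Twitter app:
CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"
ACCESS_TOKEN = "your-access-token"
ACCESS_SECRET = "your-access-secret"
```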
How can I calculate the F1 score, precision, recall, and accuracy using TextBlob? Any help? Thanks.
TextBlob has its own method to compute accuracy:
textblob.readthedocs.io/en/dev/cla...
But you can also use scikit-learn's metrics:
scikit-learn.org/stable/modules/cl...
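A minimal sketch of the scikit-learn route, with made-up labels (in practice y_true would come from a hand-labeled test set and y_pred from your classifier):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy ground-truth vs. predicted sentiment labels:
y_true = ["pos", "neg", "pos", "neu", "neg", "pos"]
y_pred = ["pos", "neg", "neu", "neu", "pos", "pos"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```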
thank you
Hi! Thanks for the tutorial.
I noticed that tweets containing RTs are not printed in full. How do I get the full RT text?
I am able to un-truncate a tweet using this:
if tweet['truncated']:
    tweet_text = tweet['extended_tweet']['full_text']
else:
    tweet_text = tweet['text']
but it won't work for tweets containing RTs.
Anyone know how I can get the full RTs?
I would need it to get an accurate sentiment analysis.
Many thanks for the help!
Hi!
One possible approach would be adding the tweet_mode parameter to the call, e.g. tweets = extractor.user_timeline(screen_name="realDonaldTrump", count=200, tweet_mode="extended"); in extended mode, the untruncated text is in full_text instead of text.
Let me know if that does the trick. :)
How do we search by hashtag and not by username:
tweets = extractor.user_timeline(screen_name="realDonaldTrump", count=200)
I want to extract data which contains say #FIFA what are the changes I need to make?
This will help:
docs.tweepy.org/en/v3.5.0/api.html...
And the code might look something like tweets = extractor.search(q="#FIFA", count=100).
Thanks for the awesome tutorial! I'm new to python and had a quick question though. You mentioned that textblob provides a trained analyzer, and you use that in your tutorial to assess the polarity of Trump's tweets. Can you tell me where I can access the list of words that's associated with positive/negative/neutral? I've been looking on textblob documentation but haven't found it yet.
I think they may not be in the library, since it only ships a pre-trained model.
Anyway, you can train your own. These can be useful resources to do so:
The whole article is great, but it is not really precise, and here is the reason: these tweets are made by two personalities, one from Android, one from iPhone. Here are the details: varianceexplained.org/r/trump-foll...
But anyway, really great work.
Thanks for sharing!
One of my goals with this post is to give tools to implement solutions in different areas. As you say, this could help in healthcare analysis. For that you might need a specific classifier (not textblob's default, which I used), and you can learn how to build one in the last reference I provide in the post.
If you begin working on that, please let us know if there's anything we can help with.
Best!
Please, how can I extract Arabic data? I tried this code but got no results:
import tweepy
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

class Listener(StreamListener):
    def on_data(self, data):
        file = open("twee.txt", "a")
        file.write(data + "\n")
        file.close()
        print("Record saved")

auth = OAuthHandler("", "")
auth.set_access_token("", "")
tweets = tweepy.Cursor(tweepy.api.search, lang="Ar").items()
Stream = Stream(auth, Listener())
Stream.filter(track=["informatique"])
Why am I getting this error:
ModuleNotFoundError Traceback (most recent call last)
in ()
1 # We import our access keys:
----> 2 from credentials import * # This will allow us to use the keys as variables
3
4 # API's setup:
5 def twitter_setup():
ModuleNotFoundError: No module named 'credentials'
I think that this will solve your error: a credentials.py file that has to contain your Twitter App credentials.
Please let me know if not. :+1:
Thank you very much. I solved that.
Is all this code placed in one document, or is everything separate? Sorry, I'm new to this, but I need to do it as part of my project.
Thanks.
You can use one Jupyter Notebook, this might be useful: github.com/RodolfoFerro/pandas_twi...
Sentiment analysis is not working well.
"@nick_carter Aww Nick!! I like your hair longer, why did you cut it off? Break. My. Heart."
is giving me a positive polarity...
As I mentioned, this is a pre-trained model, but you can train your own model on your own feed so it can be more accurate. :)
These can be useful resources:
Excellent work on NLP, Rodolfo. Greetings and a hug!
Thank you very much. As I mentioned in the post, the notebook with the content in Spanish can be found on my GitHub (just in case).
Greetings!
Where is the tutorial?
First of all, thank you very much for such an interesting post.
I am getting very few tweets with the given API. How can I get more tweets?
By asking trump to tweet more
I really liked your example.
I hope to try it in more detail to see if it's viable as a basis for the analysis of the tweets I have from Spain.
Excellent, superb work! I executed your code and got the results exactly as described.
Thanks!
I'm glad you enjoyed it. :)
Awesome tutorial. Can you please tell me how I can retrieve today's tweets?
This is a fantastic and comprehensive tutorial! Is it possible to get information about a large list of users in one go? (For marketing e.g.)
Sorry for the delay!
Sure, this can be an automated process: you can create your own methods wrapping the whole pipeline and then create a list of users, computing the polarity for each one (using a loop).
nice
Thank you!
Where is the tutorial?
Wow, I've just realized that it has been deleted.
Anyway, you can find it in here: rodolfoferro.xyz/sentiment-analysi...
I haven't had access to the API for 3 weeks. They didn't give me authorization, so what can I do? I need to start my research on COVID-19 using a sentiment analysis approach.
Well dear Rodolfo.... What to say! You are my hero of the month! :)
Thank you so much for your efforts and your knowledge to share with us!
Thank you so much!
It is very nice to read this. I'll try to give myself more time in order to post more things. :)
Thank you, Rodolfo Ferro.
I am a beginner with Python and with Twitter analysis.
I have a question: what changes are needed in your code to deal with other languages like Arabic?
You're welcome!
You might need to translate the language of the text; TextBlob can also deal with that: textblob.readthedocs.io/en/dev/qui...
Awesome, really really awesome
Thank you!
Hi... I get the following error on printing Max favorites: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 115: ordinal not in range(128)
Please help me resolve it.
Where is the code for this located? Am I missing something?
How can I stream Arabic tweets (رياضي)? What do I need to change?
Where is the tutorial???
Great article. You can also use this tool (link below) to search for live or historical tweets
rapidapi.com/microworlds/api/twitt...
Ahhh nice, thanks a lot
How do I search for a particular user?