Top comments (97)
Fascinating. I wouldn't be surprised if this kind of research goes mainstream in the future in journalism.
Exactly, I was thinking the same. Or for improving the impact of your products (on social media) if you're trying to sell stuff. There are many possibilities.
A project I have in mind is to apply this kind of analysis to suicide prevention.
Nice, very interesting! He tweets a surprisingly high proportion of positive tweets (51%). But how many of these tweets are fake news and lies is another question... nytimes.com/interactive/2017/06/23...
Yeah, that was surprising to me! I've heard that he's not the only one tweeting from his account; he has a team for this. That might be a possible reason. That's why it's interesting to analyze the polarity of tweets that come from different sources.
Well, technically these sentiment calculations should be taken with a grain of salt. You could use the VaderSentiment library as well and compare both sentiment values to get better insight.
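To make the comparison concrete, here's a minimal sketch of running both analyzers on the same text (assuming both libraries are installed via pip; the helper name is mine, not from the post):

```python
# Sketch: compare TextBlob and VADER polarity for the same tweet.
# Assumes `pip install textblob vaderSentiment`; the function name
# is illustrative, not from the original tutorial.

def compare_sentiments(text):
    """Return (textblob_polarity, vader_compound) for a piece of text."""
    from textblob import TextBlob                      # lazy import
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    tb_polarity = TextBlob(text).sentiment.polarity           # range [-1, 1]
    vader_compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
    return tb_polarity, vader_compound

# Usage idea: flag tweets where the two scores disagree strongly
# (e.g. opposite signs) for manual review.
```

Tweets where the two models disagree strongly are exactly the ones worth a closer manual look.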
Awesome tutorial!!
Thank you so much!
I've been trying to run the script, but I have several problems you might be able to help me with.
I'm new to Python, but I'd love to adapt this example for other users if I could get it working.
I have Python 3.6.3 and I work with Spyder. I copied your example, but the script stops at line 37:
We create an extractor object:
extractor = twitter_setup()
when this error appears:
NameError: name 'twitter_setup' is not defined
What causes this?
Thanks for your guidance!
That's because you haven't defined your function
twitter_setup()
. Make sure that in Spyder (specifically in your code) you have the following defined:
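The exact snippet isn't reproduced in this comment, but a typical Tweepy setup function for this tutorial looks like the sketch below (the credential strings are placeholders for your own keys):

```python
# Sketch of a typical twitter_setup() for Tweepy; the exact snippet from
# the post isn't shown here. CONSUMER_KEY etc. are placeholders you must
# replace with your own Twitter API credentials.

def twitter_setup():
    """Authenticate with the Twitter API and return a Tweepy API object."""
    import tweepy  # lazy import; install with `pip install tweepy`

    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    return tweepy.API(auth)
```

Define this (with real credentials) before the line `extractor = twitter_setup()` and the NameError goes away.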
Another recommendation is to try using Jupyter notebooks. :)
Thank you so much, Rodolfo; you really are impressive!!!
I've already solved this issue... let's see if it doesn't give me any more trouble and I can figure out the output on my own!
Hugs
How did you write code in your comment with syntax highlighting?
Hi Rodolfo!
I have a question for you: is it possible to modify the tweet-cleaning function so that it doesn't remove the accents from Spanish words?
Sure, actually you would only need to modify your cleaning rule in
re.sub()
. :)
Hi Rodolfo, great article!
New to Python here; wondering how to retrieve more than the default 15 tweets with this code? I looked up a few solutions elsewhere but couldn't figure out how to integrate them. Suggestions?
Thanks again! - Rich
We're actually retrieving the first 200 tweets; this is specified in the
count
parameter: The API allows us to retrieve at most 200 tweets per call.
Just went through the post again and found:
We create a tweet list as follows:
tweets = extractor.user_timeline(screen_name="realDonaldTrump", count=200)
print("Number of tweets extracted: {}.\n".format(len(tweets)))
We print the most recent 5 tweets:
print("5 recent tweets:\n")
for tweet in tweets[:5]:
print(tweet.text)
print()
Which I've replaced with:
tweets = extractor.search(input("Topic you want to analyze: "))
Perhaps I need to play with this; if I can't figure it out, I'll re-ask. Lol, my apologies!
Fixed with a very simple change:
tweets = extractor.search(input("Topic you want to analyze: "), count=200)
Thanks!
Although, even with count=200, this retrieves 100 tweets only.
Is there a way to refresh and retrieve more?
Thanks again!
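For going past the 100-tweet cap, one option is Tweepy's Cursor, which pages through results for you. A minimal sketch, assuming Tweepy 3.x method names (newer versions renamed some of these):

```python
# Sketch: page through search results with tweepy.Cursor to collect more
# than one call's worth of tweets. Method names follow Tweepy 3.x
# (api.search); adjust for newer Tweepy versions.

def search_many(api, query, limit=1000):
    """Collect up to `limit` tweets for a search query, 100 per page."""
    import tweepy  # lazy import; install with `pip install tweepy`

    tweets = []
    for tweet in tweepy.Cursor(api.search, q=query, count=100).items(limit):
        tweets.append(tweet)
    return tweets
```

Cursor handles the pagination tokens internally, so you only choose the total you want via `.items(limit)`.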
@deuxperspective You can use a tool I built, hosted on RapidAPI
rapidapi.com/microworlds/api/twitt...
Hi there, I was having some trouble with the "visualizing the statistics" section as detailed in sections 2.1 and 2.2; if you take a look at my GitHub repo, you'll notice I had to comment out
%matplotlib inline
and replace that requirement with plt.ion() within the script-running file (trumpet.py) in order to run the scripts without failure (e.g. python3 trumpet.py). Can you please explain how to generate the visualizations as detailed in those sections? For some reason, I'm unable to render those visuals within my Jupyter Notebook env/config. I'm only 10 days new to Python, so I'd appreciate any guidance. Great tutorial, thanks!
Sure! It's quite easy actually. :)
Instead of adding
plt.ion()
at the beginning, you can add the following code each time you're generating a plot, in order to visualize it: plt.show(). This will open an external window and display the most recently generated plot. You can see this in the official Pyplot tutorial I shared at the end (References).
Please let me know if you have any other problem. :)
Got it, Rodolfo! Thank you for the guidance- tremendous fun! ;)
Would it be possible to check / detect how many likes come from the staff of a VIP? It is said that many politicians manage likes and retweets by asking their supporters to like and retweet their messages (not sure I'm being clear). Across 200 tweets, it would be possible to look at the Twitter accounts that like systematically and quickly (as soon as a tweet is published, like bots do), then subtract (or down-weight) them from the final evaluation.
This is an interesting question.
If you want to count something like this in real time, you would need to modify the way you're consuming the API (REST) and create a listener (you can still do that with Tweepy). That's what I would do: I'd create a specific listener for Trump's tweets and use threads to count likes and retweets for each new tweet over a certain time window.
Does this answer help? I can try to be more explicit. :)
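A rough sketch of such a listener, using Tweepy 3.x class and handler names (the counting strategy is only indicated in comments, not implemented):

```python
# Sketch of a Tweepy 3.x stream listener that reacts the moment a tracked
# account tweets. Class/handler names follow Tweepy 3.x; the engagement-
# sampling idea is described in comments only.

def make_listener():
    """Build a stream listener that fires on each new tweet."""
    import tweepy  # lazy import; install with `pip install tweepy`

    class VIPListener(tweepy.StreamListener):
        def on_status(self, status):
            # A new tweet just arrived. From here you could start a
            # timer/thread that re-fetches this tweet periodically to
            # sample how quickly likes and retweets accumulate, and which
            # accounts engage within seconds (bot-like behavior).
            print(status.id, status.text[:80])

    return VIPListener()

# Usage sketch (Tweepy 3.x), given an authenticated `api` object:
#   stream = tweepy.Stream(auth=api.auth, listener=make_listener())
#   stream.filter(follow=["<numeric_user_id>"])
```

Accounts that consistently engage within seconds of publication could then be down-weighted in the final evaluation, as suggested above.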
Yes, I understand the idea. This would be a very useful tool to track falsely popular accounts.
This might help: github.com/RodolfoFerro/TwitterBot...
You can find more info in the documentation: tweepy.readthedocs.io/en/v3.5.0/st...
Hope this complements my previous answer!
Thank you for your tutorial! It was easy to follow and everything worked on my first attempt!
I don't want to reload all the tweets from the web while I'm developing, so I altered the first few lines to cache the tweets locally.
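For anyone curious, here is a minimal local-cache sketch using only the standard library (the file name and structure are my own choice, not the commenter's actual code):

```python
# Sketch: cache fetched tweet data locally so repeated runs during
# development don't hit the API. File name and JSON structure are
# arbitrary choices for illustration.

import json
import os

CACHE_FILE = "tweets_cache.json"

def load_or_fetch(fetch):
    """Return cached tweets if present; otherwise call `fetch()` and cache."""
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, encoding="utf-8") as f:
            return json.load(f)
    tweets = fetch()  # e.g. lambda pulling tweet texts via the API
    with open(CACHE_FILE, "w", encoding="utf-8") as f:
        json.dump(tweets, f)
    return tweets
```

On the first run the fetch function is called and its result saved; every later run reads from disk, so you only pay the API cost once per cache file.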
Excellent idea!
What I did at the end (in my personal case) was to save the tweet list as a CSV file (
data.to_csv(...)
), taking advantage of the fact that I already had all the info in a pandas DataFrame. :)
Thanks for your great comment!
I was looking for a tutorial to recommend to an acquaintance who is moving into digital journalism, and I came across your post. It is very well-written. Thanks for sharing!
This is just a short remark, since you seem to be using Pandas, but not to its fullest potential.
When you observe a possible relationship between RTs and Likes in subsection 2.1, you can quantify this by computing the (Pearson) correlation
data['RTs'].corr(data['Likes'])
(It is close to 0.7.)
When finding the sources of tweets in subsection 2.3, instead of using loops, you can use
sources = data['Source'].unique()
and then, when computing percentages,
data['Source'].value_counts()
You can put the latter in a data frame... In any case, thanks again!
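A toy illustration of both one-liners on a small frame shaped like the tutorial's data (the numbers here are made up; column names follow the post):

```python
# Sketch of the two pandas one-liners suggested above, on a toy DataFrame
# with the tutorial's column names ('RTs', 'Likes', 'Source'). Values are
# invented for illustration.

import pandas as pd

data = pd.DataFrame({
    "RTs":    [10, 20, 30, 40],
    "Likes":  [12, 22, 29, 41],
    "Source": ["iPhone", "Android", "iPhone", "iPhone"],
})

# Pearson correlation between retweets and likes:
corr = data["RTs"].corr(data["Likes"])

# Unique sources and per-source percentages, no explicit loops needed:
sources = data["Source"].unique()
percent = data["Source"].value_counts(normalize=True) * 100
```

`value_counts(normalize=True)` returns fractions directly, so multiplying by 100 gives the percentage breakdown the tutorial computed with nested loops.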
I must say that it was for an introductory workshop, and I finished all the material at dawn three days before, or something. :P
It's quite possible that most of the code in the last part is not optimized. :(
Thanks for your observations! :D
They simplify the data handling using the potential of Pandas. :)
for source in data['Source']:
    for index in range(len(sources)):
        if source == sources[index]:
            percent[index] += 1
            pass
Why did the author write 'pass' on the last line?
Sorry, my bad.
When I was writing the code I created an empty conditional, so at the beginning I put the
pass
reserved word, and after that I forgot to take it out.
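For completeness: since `pass` is a no-op, the loop behaves identically without it. A quick check on stand-in sample data (the lists below are mine, substituting for the tutorial's `data['Source']`):

```python
# The same counting loop without the leftover `pass`, run on invented
# stand-in data; behavior is unchanged.

data_sources = ["iPhone", "Android", "iPhone"]  # stand-in for data['Source']
sources = ["iPhone", "Android"]
percent = [0, 0]

for source in data_sources:
    for index in range(len(sources)):
        if source == sources[index]:
            percent[index] += 1
```

After the loop, `percent` holds the per-source counts, exactly as before.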