Okay, full disclosure: I thought that building a Twitter bot was going to be a LOT of work. Turns out it's really easy! First I'll break down how to use the Chatterbot Ruby gem, then move on to how I used Nokogiri.
Step 1: Install and Configure Chatterbot
Here is the chatterbot guide webpage
Firstly, create a new Twitter account! Easy enough.
Secondly, using that Twitter account, apply for a developer account here.
While you're waiting for approval (should take an hour or two), install the chatterbot Ruby gem in your project folder by running gem install chatterbot in your terminal. Then maybe start on your bot logic (more on this later).
To get the setup script to run, simply make a new Ruby file named after the username of the Twitter account you'll be using for your bot. Fill it with the following content to start:
require 'rubygems'
require 'chatterbot/dsl'
tweet "Hello World!"
Run your new Ruby file from the terminal and the setup script will start. It will prompt you for the API keys and secrets that your approved Twitter developer account provides. Be sure to copy the confirmation PIN from your approval and paste it into the proper step of the setup script.
That's it! Replace "Hello World!" with whatever logic you like for how you want your bot to tweet! There are other options for searching, retweeting, and replying, so check the chatterbot guide for a detailed rundown.
For my bot, I wanted to scrape a set of webpages for quotes: here's how I did it.
Step 2: Install Nokogiri
Run gem install nokogiri to get Nokogiri installed in your project folder.
Include the following in the header of your Ruby file:
require 'nokogiri'
require 'open-uri'
I assigned an array of URL strings to the URLS constant. Here's how I used Nokogiri to scrape a randomly selected page from that array:
doc = Nokogiri::HTML(URI.open(URLS[rand(0..URLS.length - 1)]))
Nokogiri::HTML parses the page's HTML, URI.open uses open-uri to fetch the webpage (on Ruby 3.0 and later, a bare open no longer accepts URLs), URLS is my URL array, and [rand(0..URLS.length - 1)] is a random index somewhere in that array.
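As an aside, Ruby's Array#sample does the same random pick in one call. A minimal sketch, assuming a made-up list of URLs (the example.com addresses are placeholders, not the pages from this post):

```ruby
# Placeholder URLs for illustration only.
urls = [
  "https://example.com/page-one",
  "https://example.com/page-two",
  "https://example.com/page-three"
]

# Random index, as in the line above.
by_index = urls[rand(0..urls.length - 1)]

# Array#sample returns one random element directly.
by_sample = urls.sample

puts by_index
puts by_sample
```

Both lines return one element of the array; sample just saves you the index arithmetic.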
For my purposes, I wanted all of the p tags on the page except those with certain classes, so I passed Nokogiri's .css method a very lengthy CSS selector:
content = []
doc.css('p:not(.title):not(.toc):not(.index):not(.indentb):not(.quoteb):not(.information):not(.fst):not(.footer):not(.pagenote):not(.quote)').each do |node|
content << node.text
end
I shoveled the text of each p node that wasn't a title, index, quote, etc. into the content array.
Next I use a regular expression to split each paragraph into sentences. Wrapping the pattern in parentheses (a capture group) makes split keep each punctuation mark as its own element instead of throwing it away.
new_content = []
content.each do |c|
d = c.split(/([\?\!\;\.])/)
d.each do |e|
new_content << e
end
end
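To see what the capture group changes, here is a small stand-alone sketch. The sample sentence is made up for illustration and is not from the scraped pages:

```ruby
# Hypothetical sample text.
sample = "Hello there. How are you? Fine!"

# Without a capture group, split discards the punctuation.
without_group = sample.split(/[?!;.]/)

# With a capture group, each punctuation mark is kept as its own element.
with_group = sample.split(/([?!;.])/)

p without_group # => ["Hello there", " How are you", " Fine"]
p with_group    # => ["Hello there", ".", " How are you", "?", " Fine", "!"]
```

This is why the array ends up alternating between sentence text and punctuation, which matters for the index handling that follows.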
Next I pick a random index into this giant array to choose where the quote starts. Subtracting 6 from the length leaves room for the six elements I grab afterwards.
#set random index
random = rand(0..new_content.length - 6)
#ensure first element chosen is not punctuation
while new_content[random].length < 4
random = rand(0..new_content.length - 6)
end
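One thing to watch: the while loop above keeps re-rolling, so it never finishes if no element in range is long enough. A possible alternative, sketched here with a hypothetical helper name (pick_start_index) and hypothetical keyword arguments, is to filter the valid indexes first and sample from those:

```ruby
# Hypothetical helper, not part of the original script.
# Returns a random start index whose element has at least `min_length`
# characters and has room for `span` elements, or nil if none qualifies.
def pick_start_index(pieces, span: 6, min_length: 4)
  last_start = pieces.length - span
  return nil if last_start < 0

  candidates = (0..last_start).select { |i| pieces[i].length >= min_length }
  candidates.sample
end

# Made-up sample data in the same text/punctuation layout as new_content.
pieces = ["Hi there", ".", " How are you", "?", " Fine", "!", " Okay then", "."]
start = pick_start_index(pieces)
puts start.inspect
```

With a nil return you can skip the page and try another URL instead of looping forever.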
On to instantiating the tweet content string and doing some formatting:
#concatenate elements after random chosen index
tweet_content = new_content[random, 6].join
#formatting
tweet_content = tweet_content.gsub("\r\n", " ")
tweet_content = tweet_content.gsub("\n", " ")
tweet_content = tweet_content.gsub("  ", " ")
tweet_content = tweet_content.gsub("i.e.", "that is")
Next we want to make sure the sentences chosen fit under the tweet character limit:
#split again to begin checking length of sentences
tweet_content = tweet_content.split(/([\?\!\;\.])/)
#decide on complete sentences that are under the character limit
if tweet_content[3]
if (tweet_content[0].length + tweet_content[1].length + tweet_content[2].length + tweet_content[3].length) < 240
tweet_content = tweet_content[0..3].join
else
tweet_content = tweet_content[0..1].join
end
else
tweet_content = tweet_content[0..1].join
end
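The check above handles exactly one or two sentences. A more general version, sketched here with a hypothetical helper (fit_sentences) and the same 240-character threshold, keeps adding whole sentences until the next one would not fit. Note one difference in behavior: unlike the code above, it returns an empty string when even the first sentence is too long.

```ruby
# Hypothetical helper, not part of the original script.
# Keeps whole sentences while the running total stays under `limit`.
def fit_sentences(text, limit: 240)
  # Split with a capture group, then glue each sentence back to its punctuation.
  sentences = text.split(/([?!;.])/).each_slice(2).map(&:join)

  result = ""
  sentences.each do |sentence|
    break if result.length + sentence.length >= limit
    result += sentence
  end
  result
end

puts fit_sentences("One. Two. Three.", limit: 10) # prints "One. Two."
```

You could then tweet the result only when it is non-empty.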
Step 3: Celebrate!
tweet_content is now ready to be passed to our tweet function! Instead of tweet "Hello World!" we can write tweet tweet_content
To automate our bot, we can wrap the contents of our script in a loop do block and call sleep inside the loop to set the interval between tweets.
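Here is a rough sketch of that idea. The run_every helper and its max_runs argument are assumptions added so the example can stop on its own; a real bot would probably leave max_runs as nil and run until you kill it.

```ruby
# Hypothetical wrapper, not part of chatterbot.
# Calls the given block, waits `interval` seconds, and repeats.
# `max_runs` exists only so this sketch can terminate; nil means run forever.
def run_every(interval, max_runs: nil)
  runs = 0
  loop do
    yield
    runs += 1
    break if max_runs && runs >= max_runs
    sleep interval
  end
  runs
end

# Demo with a zero-second interval and a counter standing in for tweeting.
count = 0
run_every(0, max_runs: 3) { count += 1 }
puts count # prints 3
```

In the bot itself, the block would hold the scraping and formatting steps followed by tweet tweet_content, with an interval such as 60 * 60 for one tweet an hour.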
And that's it! A quick setup of a bot and whatever logic you choose for the bot to tweet! If you decide to make a bot and have some interesting logic to share, put it in the comments!