My experience did not prepare me for this: Web Scraping! My latest endeavors at my coding bootcamp brought me to something completely different, in a language I am still learning...and it was a challenge.
Honestly, web scraping was not the difficult part. Nor was finding the right CSS or figuring out what to do with the data. It was setting up my data & methods and then breaking my code so badly that I could not get it working again before I had to submit it. self.ouch
Instead of wallowing in painful memories, I would like to discuss some of the tools I used to build my first CLI application. Flatiron School introduced my cohort to Nokogiri. I know what you are thinking. No, it is not a tasty snack. The word actually translates to 'saw', as in hacksaw, handsaw, table saw, but not "I saw (past tense) dead people". It is a solid web scraper that works with both XML and HTML. It was easy to install and set up, and since it is widely used, there is a lot of great documentation about it on the web.
Setup: start in your project directory
in your terminal
`gem install nokogiri`
back in your editor
(in your Gemfile)
`gem "nokogiri"`
(in your scraper file)
`require 'nokogiri'`
`require 'open-uri'`
```ruby
def name_of_your_get_page_method
  # URI.open comes from open-uri; note the URL must be a quoted string
  Nokogiri::HTML(URI.open("http://somepage.com"))
end
```
Here are some of my favorite links:
- https://nokogiri.org/
- https://readysteadycode.com/howto-parse-html-tables-with-nokogiri
- https://www.freecodecamp.org/news/how-to-scrape-with-ruby-and-nokogiri-and-map-the-data-bd9febb5e18a/
While building my CLI, I switched sites often because I felt I was not getting the "right" data I wanted to use. Fortunately, Nokogiri was able to handle any site I threw at it as long as I correctly parsed the CSS. I was able to use your average everyday CSS selectors or even table selectors. There was a bit of plug and play as I figured it out. Thank goodness for 'binding.pry'! I was tempted to say forget it and try getting data from an API instead; however, I was already halfway through.
My biggest challenge, and the one that hurt me the most, was gemifying my project. On the day the project was due (the soft due date), with moments to spare, I decided to refactor the code a bit to see if I could complete the extra challenge of turning my little thing into a Ruby Gem. Well.....there is a reason we are always told to commit early and commit often. I did not complete the Gem challenge, but stay tuned.....it is coming. For now, don't be scared to scrape a site for your own data needs. It honestly is not that bad.
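For anyone who wants a head start on that gem challenge, the heart of it is a gemspec file at the root of the project. Here is a minimal sketch; every name in it is a placeholder for your own project:

```ruby
# my_scraper.gemspec -- all names below are placeholders
Gem::Specification.new do |spec|
  spec.name        = "my_scraper"
  spec.version     = "0.1.0"
  spec.summary     = "A small CLI that scrapes a site with Nokogiri"
  spec.authors     = ["Your Name"]

  # Package the library code and the CLI entry point
  spec.files       = Dir["lib/**/*.rb"] + Dir["bin/*"]
  spec.executables = ["my_scraper"]

  # Declare Nokogiri so `gem install` pulls it in automatically
  spec.add_dependency "nokogiri"
end
```

With that in place, `gem build my_scraper.gemspec` produces a `.gem` file you can install locally.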
Making a Gem isn't that bad either