Scraping with Ruby p.1

#webdev #ruby #beginners #tutorial

Hi,

This time I will be walking through how to scrape a website using Ruby. There are plenty of guides out there for this, and I used a lot of them to refresh my memory and try new things.

Goal: To get a better understanding of scraping, how to get the correct data, and how to use Nokogiri + Watir together.

I will be splitting this up into parts as I go. Future parts will include fetching multiple pages, writing the data to CSV, and more.

Why use Nokogiri and Watir?

Nokogiri is a gem that makes it easy to parse and search HTML and XML documents.
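Watir is a gem that automates a real web browser (it's built on top of Selenium WebDriver), which lets us load pages and interact with them the way a user would, including pages that depend on JavaScript.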

Together, they are a good match: Watir loads the page, Nokogiri pulls out the data, and the whole process stays very straightforward.

Getting started

First, you'll need to install the Nokogiri and Watir gems. You can do this by running the following command in your terminal:

gem install nokogiri watir

Adding them to your project.

require 'nokogiri'
require 'watir'

# Watir drives a real browser, so Chrome (and its driver) must be installed.
browser = Watir::Browser.new :chrome
browser.goto('https://www.example.com')
doc = Nokogiri::HTML(browser.html)
browser.close

In the above example, we use Watir to open a Chrome browser and navigate to "https://www.example.com". We then take the page's HTML from "browser.html", parse it with Nokogiri and store the result in the "doc" variable. Finally, we close the browser.

Searching and Extracting Data

Now that we have the HTML from the website in the "doc" variable, we can use Nokogiri to search and extract information from it. Here's an example of how to search for all the h1 tags on the website:

h1_tags = doc.search('h1')

h1_tags.each do |h1|
  puts h1.text
end

Above, we use Nokogiri's search method to find all the h1 tags, and then iterate over each h1 tag and print its text. Easy peasy, right?

But search is only one of many methods you can use. Another really convenient one is css, which takes a CSS selector:

content_p = doc.css('.content p')

puts content_p.text

This searches the HTML in doc for all paragraphs inside elements with the class "content", and then prints their combined text.

And that's pretty much it! ✌🏼

Now, this is just the tip of the iceberg of what you can do with this.

In the next part, I will go more into things such as:

Fetching all elements and adding logic
Writing the fetched data to a file such as a CSV
Error handling
Adding headers to your request
and more.

DEV Community