Siddhesh Mangela

Deno Web Scraper

You might have created a web scraper with a Node.js + request + cheerio setup, or maybe a Python one using Beautiful Soup. This tutorial brings the same idea to the world of Deno.

In this example, we scrape the list of books from

http://books.toscrape.com/

Let's get started, without further ado.

Step 01: app.ts

To start, we create an app.ts file and wrap the whole code in a try-catch block. Deno supports top-level await, so later steps can use await directly inside this block without an enclosing async function.

const url = 'http://books.toscrape.com/';

try {
  console.log(url)
} catch(error) {
  console.log(error);
}

Check that the code logs the URL by running the following command in the terminal:

deno run app.ts

Step 02: Fetch URL

Deno supports many native web APIs. The Fetch API is one of them, and it makes request handling easy and dependency-free. The response body is read as text and saved in a variable named html.

const url = 'http://books.toscrape.com/';

try {
  const res = await fetch(url);
  const html = await res.text();

  console.log(html)
} catch(error) {
  console.log(error);
}

Deno is secure by default, which means that to let the script access the network we need to run it with the --allow-net flag.

Check that the code logs the HTML by running the following command in the terminal:

deno run --allow-net app.ts
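
One thing the snippet above does not handle: fetch resolves successfully even when the server answers with an error status such as 404, so the catch block never sees it. A minimal sketch of a stricter helper follows. The names fetchHtml and TextResponse are hypothetical and not part of the original script, and the fetcher parameter is only there so the function can be tried without network access.

```typescript
// Minimal response shape used by this sketch. The standard Response object
// returned by fetch provides all three members.
interface TextResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}

// Hypothetical helper: downloads a page and rejects on non-2xx statuses.
// `fetcher` is injectable; in app.ts you would pass the global fetch.
async function fetchHtml(
  url: string,
  fetcher: (url: string) => Promise<TextResponse>,
): Promise<string> {
  const res = await fetcher(url);
  if (!res.ok) {
    // fetch does not throw for HTTP error statuses, so check explicitly.
    throw new Error(`Request to ${url} failed with status ${res.status}`);
  }
  return res.text();
}

// Usage in app.ts (still needs --allow-net):
//   const html = await fetchHtml('http://books.toscrape.com/', fetch);
```

With this in place, a failed request ends up in the existing catch block instead of being parsed as if it were the book list.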

Step 03: Deno Dom

Deno DOM makes it easy to traverse HTML using the familiar JavaScript DOM methods.

The HTML (in text format) that we get from fetch is parsed with DOMParser, and the resulting document is stored in the variable doc. We then traverse doc to extract the page heading from the target site.

import { DOMParser } from 'https://deno.land/x/deno_dom/deno-dom-wasm.ts';

const url = 'http://books.toscrape.com/';

try {
  const res = await fetch(url);
  const html = await res.text();
  const doc: any = new DOMParser().parseFromString(html, 'text/html');

  const pageHeader = doc.querySelector('.header').querySelector('.h1').textContent;

  console.log(pageHeader)
} catch(error) {
  console.log(error);
}

Check that the code logs “Books to Scrape We love being scraped!” by running the following command in the terminal:

deno run --allow-net app.ts
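
Because doc is typed as any, a selector that matches nothing makes querySelector return null, and the chained call then throws a TypeError. The sketch below shows one way to guard against that. QueryNode and textAt are hypothetical names introduced for illustration; the interface is an assumed minimal shape that Deno DOM documents and elements should satisfy, not a type exported by the library.

```typescript
// Assumed minimal shape of a DOM node for this sketch.
interface QueryNode {
  querySelector(selector: string): QueryNode | null;
  textContent: string | null;
}

// Follows the selectors one after another and returns the trimmed text of
// the final match, or null as soon as any selector matches nothing.
function textAt(root: QueryNode, ...selectors: string[]): string | null {
  let node: QueryNode = root;
  for (const selector of selectors) {
    const next = node.querySelector(selector);
    if (next === null) return null;
    node = next;
  }
  return node.textContent?.trim() ?? null;
}

// Usage with the document from the step above:
//   const pageHeader = textAt(doc, '.header', '.h1') ?? 'header not found';
```

The trade-off is a little extra code in exchange for a clear "not found" result when the site's markup changes.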

Bringing it all together

The script collects the book info by looping over each .product_pod container on the first page and pushing one entry per book into the books array.

import { DOMParser } from 'https://deno.land/x/deno_dom/deno-dom-wasm.ts';

const url = 'http://books.toscrape.com/';

try {
  // Download the first page and parse it into a DOM document.
  const res = await fetch(url);
  const html = await res.text();
  const doc: any = new DOMParser().parseFromString(html, 'text/html');
  const books: any = [];

  // Each book on the page sits in its own .product_pod container.
  const productsPods = doc.querySelectorAll('.product_pod');

  productsPods.forEach((product: any) => {
    // Read the title from the link's title attribute, the rest from text nodes.
    const title = product.querySelector('h3').querySelector('a').getAttribute('title');
    const price = product.querySelector('.price_color').textContent;
    const availability = product.querySelector('.availability').textContent.trim();

    books.push({
      title,
      price,
      availability,
    });
  });

  console.log(books);
} catch (error) {
  console.log(error);
}

Run the script with the same command as before:

deno run --allow-net app.ts

It prints an array of books, each with a title, price, and availability.
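
The scraped values are all strings, and books is typed as any. If you want to sort or filter the results, a small conversion step can help. The sketch below is one possible shape, not part of the original script: Book, parsePrice, and toBook are hypothetical names, and it assumes prices look like "£51.77" and that available books contain the text "In stock".

```typescript
// Hypothetical typed record for one scraped book.
interface Book {
  title: string;
  price: number; // numeric amount without the currency symbol
  inStock: boolean;
}

// Extracts the first decimal number from a price string.
// Assumes a format such as "£51.77".
function parsePrice(text: string): number {
  const match = text.match(/\d+(?:\.\d+)?/);
  if (match === null) {
    throw new Error(`No numeric price found in "${text}"`);
  }
  return Number(match[0]);
}

// Converts the raw strings collected by the scraper into a Book.
function toBook(raw: { title: string; price: string; availability: string }): Book {
  return {
    title: raw.title,
    price: parsePrice(raw.price),
    // Assumption: the availability text contains "In stock" when available.
    inStock: raw.availability.toLowerCase().includes('in stock'),
  };
}

// Usage inside the loop, instead of pushing the raw strings:
//   books.push(toBook({ title, price, availability }));
```

Typing books as Book[] instead of any then lets the compiler catch misspelled field names.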


GitHub: siddacool / deno-web-scraper

An example of a web scraper created with deno


Top comments (5)

Montegasppα Cacilhας

Random question: Is it pronounced “dĕnŏ”, “dĕnō”, or “dēnō”?

Siddhesh Mangela

According to Ryan Dahl, the creator of Deno, it is pronounced "day-no".

In the following YouTube video, he pronounces the word "deno" at timestamp 1:26.

Montegasppα Cacilhας

Going to my favourites. Thanks! 😁

Siddhesh Mangela

You're welcome 👍