Pratik Parmar

Posted on Feb 21, 2023

No Code Scraping: Using No Code tools to Scrape an eCommerce site and send an alert text using Zyte API, n8n, and Telegram

#tutorial #api #scraping #nocode

As an audiophile, I am always on the lookout for deals on headphones, speakers and quality AV. Headphone Zone is my go-to place for such purchases, as they regularly host clearance sales and I love a deal too.🤑
However, I often miss out on these sales because the emails announcing them get lost in the flood of promotional emails and spam I receive, and I don’t want to visit these sites every day. So I’d much rather have an alert sent to my Telegram account when there is a sale.

So, being the geek I am, I decided to see if I could use some simple no-code tools (because why not) and a web scraping API (Zyte API) to create a system that sends me a notification whenever there is a clearance sale on Headphone Zone.

This way, I can be sure not to miss out on any great deals on audio devices.
The good news is it was surprisingly easy, so I’m going to show you how to do it too. Sounds fun, right? Let's get started. 🚀

How it's going to work:

What you’re going to do is:

Set up a no-code tool (n8n) to manage the various tools and tasks needed
Set up a web scraping API and scrape the site on a schedule to monitor it
Detect if there is a sale on
Send an alert to your phone via Telegram

1. Introduction to n8n and Zyte API

n8n is a no-code workflow automation tool. It allows you to automate data-driven processes and connect your apps into a single workflow. n8n provides a drag-and-drop user interface that enables you to build workflows without writing any code.

Zyte API is an API that aims to solve all web data extraction needs. It comes with a built-in, transparent anti-ban solution with IP rotation, browser emulation, and website-specific fine-tuning. In a nutshell, Zyte API will ensure you don't get ban-hammered and delivers your data without any hiccup.

2. Prerequisites: Setting up n8n and acquiring a Zyte API key

2.1 n8n:

The easiest way to get started with n8n is the desktop app. Check out the quickstart guide for more information.

2.2 Zyte API Key:

For Zyte API, you just need to sign up at https://app.zyte.com/account/signup/zyteapi and fetch the Zyte API key. For a step-by-step walkthrough, you can refer to this guide. Keep this API key handy; we will be using it later in the workflow.

2.3 Telegram Bot Credentials (optional):

On Telegram, chat with BotFather and create a new bot. BotFather will provide a bot token, which can be used to integrate the bot into any platform.

One last thing we’ll need is a Telegram Chat ID.
Check out this guide to learn how to get a chat ID.

You can skip this step, if you don’t want to use the Telegram node.

3. How web scraping workflow works in n8n

Pass the URL of the website you want to scrape in the cURL request. Here it’s https://www.headphonezone.in/collections/clearance.
Using the HTTP request node, make a Zyte API call to fetch the HTML content
Use the HTML Extract node to extract data from the HTML content
Clean the data and send it over using the Telegram node.

4. Configuring the workflow

To create a new n8n workflow, head over to the workflows page and click on new.

The n8n workflow is made up of small executable blocks known as nodes.

You can add a node by clicking on the "+" button in the top right corner of the n8n dashboard and selecting it from the list of available nodes.

4.1 Getting the website data

In order to download the HTML data from the website we will need an HTTP Request node. After adding it to your workflow, you will need to specify the website URL you want to scrape. You can also specify any other request parameters, such as the method, headers, and body, as needed.

Alternatively, you can use cURL to configure this node. Click on Import cURL. Paste the following cURL request and import it. Make sure that you’ve updated your Zyte API key here.

curl \
   --user YOUR_ZYTE_API_KEY_HERE: \
   --header 'Content-Type: application/json' \
   --data '{"url": "https://www.headphonezone.in/collections/clearance", "browserHtml": true}' \
   https://api.zyte.com/v1/extract

In the node configuration, go to options and add a Response option. Set the response format as Text and set Put Output in Field as response.

Click on execute to ensure everything is working properly.
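If you're curious what the HTTP Request node is doing under the hood, here is a minimal Python sketch of the same call as the cURL command above, using only the standard library. The function names (build_zyte_request, fetch_browser_html) are made up for this illustration; the endpoint, the Basic auth scheme (API key as username, empty password) and the JSON body come straight from the cURL command.

```python
import base64
import json
import urllib.request

ZYTE_API_URL = "https://api.zyte.com/v1/extract"
TARGET_URL = "https://www.headphonezone.in/collections/clearance"


def build_zyte_request(api_key: str, target_url: str) -> urllib.request.Request:
    """Build the same POST request as the cURL command, without sending it."""
    # Basic auth: the API key is the username and the password is empty,
    # which is what `--user YOUR_ZYTE_API_KEY_HERE:` does in cURL.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    body = json.dumps({"url": target_url, "browserHtml": True}).encode()
    return urllib.request.Request(
        ZYTE_API_URL,
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_browser_html(api_key: str, target_url: str) -> str:
    """Send the request and return the browserHtml field of the JSON response."""
    request = build_zyte_request(api_key, target_url)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)["browserHtml"]
```

Calling fetch_browser_html("YOUR_ZYTE_API_KEY_HERE", TARGET_URL) needs a real key and network access, so treat it as a sketch rather than a drop-in replacement for the node.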

4.2 Fetch HTML Content using a Set Node

As you may have noticed, the response from the HTTP Request node is in JSON format. For our workflow, we only need the browserHtml field, which holds the HTML content of the webpage. We can use the Set node to create another field, ‘data’, and assign browserHtml to it.

Keep Only Set: Enable
Values to Set:
String
Name: data
Value: {{$json["response"]["browserHtml"]}}

This node will fetch the browserHtml field from the response field which is a JSON field, and set it to the data field of string data type.
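For readers who think in code, the Set node step is roughly equivalent to the Python sketch below. The helper name keep_only_data and the sample item are hypothetical; the field names response, browserHtml and data are the ones used in this workflow. Parsing a string response is an assumption added here to cover the case where the response was stored as raw text.

```python
import json


def keep_only_data(item: dict) -> dict:
    """Mimic the Set node with "Keep Only Set" enabled: return only `data`."""
    response = item["response"]
    # Assumption: if the response was stored as raw text, parse it first.
    if isinstance(response, str):
        response = json.loads(response)
    return {"data": response["browserHtml"]}


# Hypothetical item, shaped like the output of the HTTP Request node.
sample_item = {
    "response": json.dumps(
        {"url": "https://example.com", "browserHtml": "<html></html>"}
    )
}
print(keep_only_data(sample_item))  # {'data': '<html></html>'}
```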

4.3 Extracting Product Data from the HTML Data

In the HTML Extract node, you can use CSS selectors to extract the data you want from the HTML response received by the HTTP Request node. You can specify the element or attribute you want to extract by entering a CSS selector in the Selector field.

Node: HTML Extract
Source Data: JSON
JSON Property: data

We’ve stored the browserHtml in the data variable, in the previous Set node.

Extraction Values

Key: products
CSS Selectors: .product-item

.product-item is the main element under which all product details are stored.

Return Value: HTML
Return Array: Enable

Store the response as HTML in the form of an array.
Post execution, the output should look like this.

Now we can see that we’ve received the HTML data of all products. But what we want is human-readable information, one product at a time.

So let’s separate it out first using the Item Lists node.

Node: Item Lists
Operations: Split Out Items
Field To Split Out: products
Include: No Other Fields
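Conceptually, Split Out Items turns one item holding an array into one item per array element. Here is a tiny Python sketch of that idea; the function name split_out_items and the sample HTML fragments are hypothetical and only illustrate the shape of the data.

```python
def split_out_items(item: dict, field: str) -> list:
    """Return one new item per element of item[field], dropping other fields."""
    return [{field: value} for value in item[field]]


# Hypothetical output of the previous HTML Extract node.
extracted = {
    "products": [
        '<div class="product-item">A</div>',
        '<div class="product-item">B</div>',
    ]
}
for single in split_out_items(extracted, "products"):
    print(single)
```

Each resulting item carries a single product's HTML under the same products key, which is what the next node reads.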

4.4 Extracting Individual Products

Alrighty, now that we have a products field containing the data of each product in HTML format, all we need to do is extract the product information using the HTML Extract node.

Node: HTML Extract
Source Data: JSON
JSON Property: products

Extraction Values:

Extract Product Name:
Key: name
CSS Selector: .product-item-meta__title
Return Value: Text

Extract Product URL:
Key: url
CSS Selector: .product-item-meta__title
Return Value: Attribute
Attribute: href

Extract Product Price:
Key: price
CSS Selector: .price--highlight
Return Value: Text

After executing the node, it should display the individual product details.
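To make the three extraction values concrete, here is a rough Python sketch that pulls the same fields out of a single product fragment using only the standard library's html.parser. It is not how n8n does it internally. The class ClassTextExtractor, the helpers select and extract_product, and the SAMPLE markup are all made up for illustration; only the class names and the href attribute come from the configuration above, and the real page markup may differ.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect text and attributes of the first element with a given class."""

    def __init__(self, class_name: str):
        super().__init__()
        self.class_name = class_name
        self.depth = 0        # > 0 while inside the matched element
        self.done = False     # True once the first match has been closed
        self.attrs = {}
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.done:
            attr_map = dict(attrs)
            if self.class_name in (attr_map.get("class") or "").split():
                self.depth = 1
                self.attrs = attr_map

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.text_parts.append(data)


def select(html: str, class_name: str):
    """Return (text, attributes) of the first element carrying class_name."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.text_parts).split())
    return text, parser.attrs


def extract_product(product_html: str) -> dict:
    name, title_attrs = select(product_html, "product-item-meta__title")
    price, _ = select(product_html, "price--highlight")
    return {"name": name, "url": title_attrs.get("href"), "price": price}


# Hypothetical fragment; the real markup on the site may differ.
SAMPLE = """
<div class="product-item">
  <a class="product-item-meta__title" href="/products/tin-hifi-t5">TIN HiFi - T5</a>
  <span class="price price--highlight"><span class="visually-hidden">Sale price</span>₹ 9,999</span>
</div>
"""
print(extract_product(SAMPLE))
```

With this sample, the price comes out as "Sale price₹ 9,999" because the visually hidden label and the amount sit in the same element, which matches what the Telegram message looks like later on.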

4.5 Send Message on Telegram

Phew, that was fun! But hey, there’s more! Time to add a Telegram node to send messages to the Telegram bot we created earlier.

First of all, add credentials for Telegram API and provide a bot token.

Node: Telegram
Resource: Message
Operation: Send Message
Chat ID: Provide the Telegram chat ID we received earlier.
Text:

{{ $json["name"] }}
https://www.headphonezone.in{{ $json["url"] }}
{{ $json["price"] }}

This text will display a message in this format on Telegram:

TIN HiFi - T5
https://www.headphonezone.in/products/tin-hifi-t5
Sale price₹ 9,999

And voila! If everything worked well, you should be able to receive messages on the Telegram bot.
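If you ever want to send the same message without n8n, here is a minimal Python sketch against the Telegram Bot API's sendMessage method. The helper names format_message and build_send_message_request are hypothetical, and the token and chat ID are placeholders. Since the scraped href is a relative path that already starts with a slash, the sketch joins it to the site's base URL with urljoin, so there is exactly one slash between the two.

```python
import json
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://www.headphonezone.in"


def format_message(product: dict) -> str:
    """Build the three-line alert: name, absolute URL, price."""
    return "\n".join([
        product["name"],
        urljoin(BASE_URL, product["url"]),  # joins base and relative path
        product["price"],
    ])


def build_send_message_request(bot_token: str, chat_id: str, text: str):
    """Prepare (but do not send) a sendMessage call to the Telegram Bot API."""
    body = json.dumps({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(
        f"https://api.telegram.org/bot{bot_token}/sendMessage",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


product = {
    "name": "TIN HiFi - T5",
    "url": "/products/tin-hifi-t5",
    "price": "Sale price₹ 9,999",
}
print(format_message(product))
```

To actually deliver the message, pass the prepared request to urllib.request.urlopen with your own bot token and chat ID.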

4.6 Automate the workflow

Currently, we still need to execute the workflow manually.

What’s the point of having an automation workflow if it needs a manual trigger? Fortunately, n8n also has a Cron node. Let’s add that node to the workflow and remove the Start node.

Here I want to execute this workflow on the first day of every month at midnight, so this is what the configuration looks like:

Node: Cron

Trigger Times:

Mode: Every Month
Hour: 0
Minute: 0
Day of the Month: 1
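In standard five-field cron syntax, this schedule would be written as "0 0 1 * *". The Python sketch below spells out what that means; the function names is_trigger_time and next_trigger are hypothetical, and the code only illustrates the schedule, not how n8n evaluates it.

```python
from datetime import datetime

# minute hour day-of-month month day-of-week
CRON_EXPRESSION = "0 0 1 * *"


def is_trigger_time(moment: datetime) -> bool:
    """True at midnight on the first day of any month."""
    return moment.day == 1 and moment.hour == 0 and moment.minute == 0


def next_trigger(after: datetime) -> datetime:
    """First midnight on day 1 of a month that is strictly after `after`."""
    if after.month == 12:
        return datetime(after.year + 1, 1, 1)
    return datetime(after.year, after.month + 1, 1)


print(next_trigger(datetime(2023, 2, 21, 10, 30)))  # 2023-03-01 00:00:00
```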

4.7 Activate the workflow

Finally, we’re ready to deploy our workflow. Before that, make sure you have followed all the steps of the tutorial; your workflow should look like this. You can activate the workflow from the top right corner of the n8n app, and you’re done! Your workflow is now active and will notify you when there’s any offer available.

5. What’s Next?

Our workflow is still hosted on our desktop / local machine. Hence, if your computer is off during the trigger time, then the workflow won’t work and you will miss out on the amazing deals.

You can check out the n8n cloud to deploy your workflow on the cloud.

Also, our workflow only scrapes data from the first page of the clearance section. It also needs pagination logic.

What’s the solution then?

You can try the Zyte Auto Extract API, which takes care of the scraping logic, pagination, and every painful aspect of the scraping.
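If you would rather stick with the current workflow and roll pagination by hand, the loop could look roughly like the Python sketch below. Big assumption: that the collection accepts a ?page=N query parameter, which is common on storefronts of this kind but has not been verified for this site. The names page_url and scrape_all_pages, the fetch_products callback and the max_pages cap are all hypothetical.

```python
from typing import Callable, List
from urllib.parse import urlencode

COLLECTION_URL = "https://www.headphonezone.in/collections/clearance"


def page_url(page: int) -> str:
    # Assumption: pages are addressed with a ?page=N query parameter.
    return f"{COLLECTION_URL}?{urlencode({'page': page})}"


def scrape_all_pages(
    fetch_products: Callable[[str], List], max_pages: int = 20
) -> List:
    """Call fetch_products page by page until a page comes back empty."""
    all_products = []
    for page in range(1, max_pages + 1):
        products = fetch_products(page_url(page))
        if not products:
            break  # an empty page is treated as the end of the collection
        all_products.extend(products)
    return all_products
```

Here fetch_products stands in for "fetch the page through Zyte API and extract the product items", and max_pages is a safety cap so a misbehaving site cannot keep the loop running forever.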

Let us know in the comments if you want us to create another tutorial on that.

Till then, happy scraping! This is me, Pratik Parmar signing off.
Over and out!
