
Antonello Zanini for Writech

Posted on • Originally published at writech.run

How To Scrape Amazon Product Data Without Coding

Web scraping lets you retrieve publicly available data from the web. However, each web page has its own layout and stores different data, so programmatically extracting data from a page requires custom logic.
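Here is a minimal Python sketch of that custom logic. It pulls a product title out of a page using only the standard library's html.parser module. The inline HTML snippet and the productTitle id are assumptions for illustration. Amazon's real markup may differ and can change at any time, which is exactly why this logic has to be written and maintained per site.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text inside the element whose id equals target_id.

    The id is page-specific: another site, or a redesigned page, needs a
    different id. Void elements such as <br> are not handled, to keep the
    sketch short.
    """

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while the parser is inside the target element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # nested tag inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1  # entering the target element

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    @property
    def text(self) -> str:
        # Collapse the whitespace that HTML indentation leaves around the text.
        return " ".join(" ".join(self.parts).split())


# Hypothetical markup, loosely modeled on a product page.
SAMPLE_HTML = """
<html><body>
  <h1><span id="productTitle">
    2020 Apple MacBook Air Laptop
  </span></h1>
</body></html>
"""

parser = TitleExtractor("productTitle")
parser.feed(SAMPLE_HTML)
print(parser.text)  # 2020 Apple MacBook Air Laptop
```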

As you can imagine, building such scripts costs you time and money. Fortunately, several scraping services have been developed recently that let you scrape the web without writing a single line of code. This also means that anyone can use them, even non-technical users!

Here, you will learn how to extract Amazon product data with Octoparse, a no-code, easy-to-use, fully featured scraping service. Let's learn everything you need to know about Amazon product scraping!

What Data To Scrape From Amazon

An Amazon product page contains many pieces of data, but the most important ones are:

  • Product name

  • Price

  • Discount (if present)

  • Product description

  • List of features associated with the product (if present)

  • Rating

  • Product images

These are the fields to focus on when scraping an Amazon product. That said, depending on your goals, you may want to retrieve different data.
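If you later process the scraped results in code, it helps to give each product a fixed shape. A minimal sketch, assuming Python and a hypothetical AmazonProduct dataclass whose fields simply mirror the list above. Optional fields default to None or an empty list, because not every product has a discount or a feature list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AmazonProduct:
    """Hypothetical record for one scraped Amazon product page."""

    name: str
    price: str                      # kept as scraped text, e.g. "$949.99"
    description: str
    rating: str                     # e.g. "4.8 out of 5"
    discount: Optional[str] = None  # only present on discounted products
    features: list[str] = field(default_factory=list)    # may be empty
    image_urls: list[str] = field(default_factory=list)


product = AmazonProduct(
    name="2020 Apple MacBook Air Laptop",
    price="$949.99",
    description="All-Day Battery Life ...",
    rating="4.8 out of 5",
)
print(product.discount is None, product.features)  # True []
```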

Now, let's understand why you need an advanced tool to scrape data from Amazon.

Why You Need an Advanced Tool to Scrape Amazon

Amazon has implemented several measures to prevent web scraping. The most relevant challenges associated with scraping Amazon are:

  • Your IP could, and probably will, be banned

  • Each Amazon product page can have a custom layout

  • Each product can have different data

  • Amazon product pages change frequently

So, writing a scraping script that handles all these challenges will cost you a lot of time, money, and effort. This is why you should consider adopting an advanced scraping tool that deals with all these issues natively.
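To see where that cost comes from, consider how a hand-written scraper might cope with varying layouts. One common approach is to try several candidate patterns in order and keep the first match. The sketch below does this for the price with regular expressions to stay short. The two HTML variants and the class names in the patterns are hypothetical. A real scraper would need to keep such a list up to date every time the pages change.

```python
import re
from typing import Optional

# Hypothetical price patterns for two different page layouts, tried in order.
PRICE_PATTERNS = [
    re.compile(r'<span class="price-whole">\s*([\d,]+)\s*</span>\s*'
               r'<span class="price-fraction">\s*(\d+)\s*</span>'),
    re.compile(r'<span class="deal-price">\s*\$([\d,]+)\.(\d+)\s*</span>'),
]


def extract_price(html: str) -> Optional[str]:
    """Return the price as "$dollars.cents", or None if no pattern matches."""
    for pattern in PRICE_PATTERNS:
        match = pattern.search(html)
        if match:
            dollars, cents = match.groups()
            return f"${dollars}.{cents}"
    return None  # unknown layout: the pattern list needs updating


layout_a = '<span class="price-whole">949</span><span class="price-fraction">99</span>'
layout_b = '<span class="deal-price">$949.99</span>'
print(extract_price(layout_a), extract_price(layout_b), extract_price("<p>n/a</p>"))
# $949.99 $949.99 None
```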

Now, let's delve into Octoparse, the tool chosen to scrape Amazon product data.

What Is Octoparse?

"Octoparse is an extremely powerful data extraction tool that has optimized and pushed our data scraping efforts to the next level" Octoparse's official website

https://www.youtube.com/embed/Y2ArkGbigUE

Octoparse is an advanced website crawler that empowers you to extract any kind of data you need from the web. It comes with several features, including auto-detection, task templates, advanced modes, pagination and infinite scrolling handling, data format changer, and more.

Keep in mind that Octoparse is built around an intuitive point-and-click interface that guides you through the data extraction process, so no code is involved. If you do want to run scraping tasks programmatically, Octoparse also provides an API.

Octoparse also offers a scheduled cloud extraction feature that runs your tasks in the cloud, so you can extract dynamic data in real time. Furthermore, the tool mimics human browsing behavior to avoid being detected while scraping, and in case that is not enough, it offers IP proxy servers and user-agent rotation.

So, it gives you everything you need to deal with Amazon's anti-scraping measures!

Scraping Amazon Product Data with Octoparse

Follow this step-by-step tutorial to learn how to scrape Amazon product data with Octoparse.

1. Getting started with Octoparse

First, you need an Octoparse account and a local installation of Octoparse.

Download Octoparse 8.x from here:

https://www.octoparse.com/download

Then, follow these steps:

  1. Run the "Octoparse Setup X.Y.Z" file (e.g., "Octoparse Setup 8.5.2")

  2. Follow the installation instructions

  3. Log in with your Octoparse account, or sign up here if you do not have an account yet.

Signing up is free, but full access to all Octoparse features requires a Standard Plan. Learn more about the plans offered by Octoparse here.

If you are considering adopting Octoparse for your business, the Octoparse Summer Sale 2022 is waiting for you. Starting from June 15, you will have the opportunity to subscribe to Octoparse with large discounts! Take advantage of this!

Now you have everything you need to start using Octoparse.

2. Identify the Amazon product to scrape

It is time to choose the Amazon product you want to scrape. In this tutorial, you will see how to scrape the product data of a 2020 Apple MacBook Air Laptop.

This is what the Amazon product link looks like:

https://www.amazon.com/Apple-MacBook-13-inch-256GB-Storage/dp/B08N5KWB9H/ref=sr_1_3?keywords=macbook%2Bair&qid=1652428198&sr=8-3&th=1

Keep it at hand because you will need it in the next step.
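The link above carries extra parts picked up from the search results page, such as the /ref=… segment and the query string. On Amazon, the 10-character code after /dp/ (the ASIN, here B08N5KWB9H) is what identifies the product. If you keep a list of products to scrape, a small helper like the hypothetical canonical_product_url below can reduce each link to a shorter, stable form. The https://www.amazon.com/dp/<ASIN> form is an assumption based on Amazon's usual URL pattern, not something Octoparse requires.

```python
import re
from urllib.parse import urlsplit

# ASINs are 10 uppercase letters/digits following "/dp/" in product URLs.
ASIN_RE = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def canonical_product_url(url: str) -> str:
    """Hypothetical helper: reduce an Amazon product link to /dp/<ASIN>."""
    parts = urlsplit(url)
    match = ASIN_RE.search(parts.path)
    if match is None:
        raise ValueError(f"no ASIN found in {url!r}")
    return f"{parts.scheme}://{parts.netloc}/dp/{match.group(1)}"


url = ("https://www.amazon.com/Apple-MacBook-13-inch-256GB-Storage/dp/B08N5KWB9H/"
       "ref=sr_1_3?keywords=macbook%2Bair&qid=1652428198&sr=8-3&th=1")
print(canonical_product_url(url))  # https://www.amazon.com/dp/B08N5KWB9H
```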

3. Scraping Amazon product data with Octoparse in just a few clicks

Launch Octoparse, paste the Amazon product link you copied earlier into the URL bar, and press "Start".

This is what you should be seeing now:

Now, click on the page elements you want to retrieve and then select the "Extract the text of the element" option from the Tips panel. This way, you can start collecting data from the Amazon product page.

If you also want to retrieve the product images, click on the page element containing the image, click on ">" in the Tips panel, and select "IMG".

This tells Octoparse to select the img HTML element. Now, click "Extract the URL of the selected image" to retrieve the image URL.

You can follow this approach to retrieve each product image.

Keep selecting elements until you have all the Amazon product data you need. Also, remember to rename the data fields to make them easier to understand, as shown below:

Now, keep in mind that scraped data could contain unwanted characters or might not be in the desired format. Luckily, Octoparse allows you to clean the data and transform it into the format you want. Let's see how.

Consider the price data. This is what it originally looks like:

$949
.
99

The two line breaks around the "." character should be removed. To achieve this, click the "Price" data field, then "…", and select "Clean data".

Now, click on "+ Add Step", and select the "Replace with Regular Expression" option.

Define a regular expression as follows:

Click "Confirm", then "Apply", and your "Price" data should now look like this:

$949.99
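The exact regular expression used in Octoparse appears in a screenshot that is not reproduced here. If you ever need the same cleanup outside Octoparse, and assuming the goal is simply to delete the line breaks and any surrounding whitespace from the scraped price, a Python equivalent looks like this:

```python
import re

raw_price = "$949\n.\n99"  # the price as scraped, split over three lines

# Replace every run of whitespace (newlines included) with nothing.
clean_price = re.sub(r"\s+", "", raw_price)
print(clean_price)  # $949.99
```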

Clean all your data, and when you are ready, save your Octoparse task. Then, click the "Run" button in the upper-right corner. Octoparse will ask you whether you want to run the task locally or in the cloud.

In this case, a local run is enough.

Wait for the task execution to finish:

Then, click on "Export Data" and select the data export format in the popup below:

This is an example of an output JSON you can get from Octoparse:

[
   {
      "Name":"2020 Apple MacBook Air Laptop: Apple M1 Chip, 13'' Retina Display, 8GB RAM, 256GB SSD Storage, Backlit Keyboard, FaceTime HD Camera, Touch ID. Works with iPhone/iPad; Gold",
      "Price":"$949.99",
      "Discount":"5%",
      "Description":"All-Day Battery Life – Go longer than ever with up to 18 hours of battery life.\nPowerful Performance – Take on everything from professional-quality editing to action-packed gaming with ease. The Apple M1 chip with an 8-core CPU delivers up to 3.5x faster performance than the previous generation while using way less power.\nSuperfast Memory – 8GB of unified memory makes your entire system speedy and responsive. That way it can support tasks like memory-hogging multitab browsing and opening a huge graphic file quickly and easily.\nStunning Display – With a 13.3'' Retina display, images come alive with new levels of realism. Text is sharp and clear, and colors are more vibrant.\nWhy Mac – Easy to learn. Easy to set up. Astoundingly powerful. Intuitive. Packed with apps to use right out of the box. Mac is designed to let you work, play, and create like never before.\nSimply Compatible – All your existing apps work, including Adobe Creative Cloud, Microsoft 365, and Google Drive. Plus you can use your favorite iPhone and iPad apps directly on macOS. Altogether you'll have access to the biggest collection of apps ever for Mac. All available on the App Store.\nEasy to Learn – If you already have an iPhone, MacBook Air feels familiar from the moment you turn it on. And it works perfectly with all your Apple devices. Use your iPad to extend the workspace of your Mac, answer texts and phone calls directly on your Mac, and more.",
      "Features":"Display: 13.3-inch (diagonal) LED-backlit display with IPS technology; 2560-by-1600 native resolution at 227 pixels per inch with support for millions of colors\nProcessor: System on Chip (SoC) Apple M1 chip; 8-core CPU with 4 performance cores and 4 efficiency cores; 16-core Neural Engine\nGraphics and Video Support: Up to Apple 8-core GPU\nCharging and Expansion:Two Thunderbolt / USB 4 ports with support for: Charging, DisplayPort, Thunderbolt 3 (up to 40 Gbps), USB 3.1 Gen 2 (up to 10 Gbps)\nWireless: 802.11ax Wi-Fi 6 wireless networking; IEEE 802.11a/b/g/n/ac compatible. Bluetooth 5.0 wireless technology\nIn the Box: 13-inch MacBook Air, 30W USB-C Power Adapter, USB-C Charge Cable (2 m)\nHeight: 0.16–0.63 inch (0.41–1.61 cm)\nWidth: 11.97 inches (30.41 cm)\nDepth: 8.36 inches (21.24 cm)\nWeight: 2.8 pounds (1.25 kg)\nRelease Date: 11/10/2020",
      "Reviews":"4.8 out of 5",
      "Ratings":"14,575 global ratings",
      "Image_1":"https://m.media-amazon.com/images/I/71vFKBpKakL._AC_SX385_.jpg",
      "Image_2":"https://m.media-amazon.com/images/I/81HZAfCGZ5L._AC_SX466_.jpg"
   }
]

As you can see, it contains all the Amazon product data selected in Octoparse, in a human-readable format!

Et voilà, you just scraped the data from an Amazon product without a single line of code.
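Note that every exported value is a string. If you want to analyze the data, a short Python script can turn those strings into numbers and lists. This is a minimal sketch, assuming the export is a JSON array with the field names shown above. The normalize helper is hypothetical, and the sample is a trimmed copy of the output above.

```python
import json
import re
from decimal import Decimal


def normalize(record: dict) -> dict:
    """Hypothetical helper: turn one exported Octoparse record into typed values."""
    discount = record.get("Discount")  # only present for discounted products
    image_keys = sorted(
        (key for key in record if key.startswith("Image_")),
        key=lambda key: int(key.split("_")[1]),  # Image_2 before Image_10
    )
    return {
        "name": record["Name"],
        "description": record.get("Description", ""),
        # "$949.99" -> Decimal("949.99"); commas stripped for prices like "$1,299.00"
        "price": Decimal(record["Price"].lstrip("$").replace(",", "")),
        # "5%" -> 5, or None when there is no discount
        "discount_pct": int(discount.rstrip("%")) if discount else None,
        # "4.8 out of 5" -> 4.8
        "stars": float(record["Reviews"].split()[0]),
        # "14,575 global ratings" -> 14575 (keep digits only)
        "rating_count": int(re.sub(r"\D", "", record["Ratings"])),
        # one feature per line
        "features": record["Features"].split("\n"),
        "images": [record[key] for key in image_keys],
    }


# A trimmed copy of the sample export shown above.
sample = r'''[{"Name": "2020 Apple MacBook Air Laptop", "Price": "$949.99",
  "Discount": "5%", "Reviews": "4.8 out of 5",
  "Ratings": "14,575 global ratings",
  "Features": "Weight: 2.8 pounds (1.25 kg)\nRelease Date: 11/10/2020",
  "Image_1": "https://m.media-amazon.com/images/I/71vFKBpKakL._AC_SX385_.jpg"}]'''

products = [normalize(record) for record in json.loads(sample)]
print(products[0]["price"], products[0]["stars"], products[0]["rating_count"])
# 949.99 4.8 14575
```

To process a real export, open the file you saved from Octoparse, read it with json.load, and map normalize over the records.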

Conclusion

In this article, you learned which data to scrape from an Amazon product page, what challenges scraping Amazon involves, and how to do it with Octoparse. Octoparse is a powerful data extraction tool that lets you create a scraping task through its point-and-click interface, without writing a single line of code. It also comes with built-in features to deal with Amazon's anti-scraping measures.

Thanks for reading! I hope that you found this article helpful.


The post "How To Scrape Amazon Product Data Without Coding" appeared first on Writech.
