
Mikhail Zub for SerpApi


Web scraping Apple App Store Product Info And Reviews with Nodejs

What will be scraped


Full code

If you don't need an explanation, have a look at the full code example in the online IDE.

import dotenv from "dotenv";
dotenv.config();
import { getJson } from "serpapi";

const getSearchParams = (searchType) => {
  const isProduct = searchType === "product";
  const reviewsLimit = 10; // hardcoded limit for demonstration purpose
  const engine = isProduct ? "apple_product" : "apple_reviews"; // search engine
  const params = {
    api_key: process.env.API_KEY, //your API key from serpapi.com
    product_id: "1507782672", // Parameter defines the ID of a product you want to get the reviews for
    country: "us", // Parameter defines the country to use for the search
    type: isProduct ? "app" : undefined, // Parameter defines the type of Apple Product to get the product page of
    page: isProduct ? undefined : 1, // Parameter is used to get the items on a specific page
    sort: isProduct ? undefined : "mostrecent", // Parameter is used for sorting reviews
  };
  return { engine, params, reviewsLimit };
};

const getProductInfo = async () => {
  const { engine, params } = getSearchParams("product");
  const json = await getJson(engine, params);
  delete json.search_metadata;
  delete json.search_parameters;
  delete json.search_information;
  return json;
};

const getReviews = async () => {
  const reviews = [];
  const { engine, params, reviewsLimit } = getSearchParams();
  while (true) {
    const json = await getJson(engine, params);
    if (json.reviews) {
      reviews.push(...json.reviews);
      params.page += 1;
    } else break;
    if (reviews.length >= reviewsLimit) break;
  }
  return reviews;
};

const getResults = async () => {
  return { productInfo: await getProductInfo(), reviews: await getReviews() };
};

getResults().then((result) => console.dir(result, { depth: null }));

Why use Apple Product Page Scraper and Apple App Store Reviews Scraper APIs from SerpApi?

Using an API generally solves all or most of the problems you might encounter while creating your own parser or crawler. From a web scraping perspective, our API can help solve the most painful ones:

  • Bypass blocks from supported search engines by solving CAPTCHAs or getting around IP blocks.
  • No need to create a parser from scratch and maintain it.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation when you need to extract large amounts of data faster.

Head to the Apple Product Page playground and Apple App Store Reviews playground for a live and interactive demo.

Preparation

First, we need to create a Node.js* project and add the npm packages serpapi and dotenv.

To do this, open the command line in the directory with our project and enter:

$ npm init -y

And then:

$ npm i serpapi dotenv

*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.

  • The serpapi package is used to scrape and parse search engine results using SerpApi. It can get search results from Google, Bing, Baidu, Yandex, Yahoo, Home Depot, eBay, and more.

  • The dotenv package is a zero-dependency module that loads environment variables from a .env file into process.env.

Next, we need to add a top-level "type" field with a value of "module" to our package.json file to allow using ES6 modules in Node.js.
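As a rough sketch, the relevant part of package.json could look like this after the change. The name and version values are placeholders of the kind npm init -y generates, not values taken from this project; only the "type" line matters here:

```json
{
  "name": "apple-app-store-scraper",
  "version": "1.0.0",
  "type": "module"
}
```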

That completes the Node.js environment setup for our project, so we can move on to the step-by-step code explanation.

Code explanation

First, we import dotenv from the dotenv library and call its config() method, then import getJson from the serpapi library:

import dotenv from "dotenv";
dotenv.config();
import { getJson } from "serpapi";
  • config() will read your .env file, parse the contents, assign it to process.env, and return an Object with a parsed key containing the loaded content or an error key if it failed.
  • getJson() allows you to get a JSON response based on search parameters.
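Since config() reports a failure through an error key rather than by throwing, a missing .env file can go unnoticed until the API call fails. A minimal sketch of a guard follows. requireEnv is a hypothetical helper, not part of dotenv, and it assumes the .env file holds a line such as API_KEY=your_key:

```javascript
// Hypothetical helper (not part of dotenv): read a required variable
// from an environment object and fail early with a clear message.
const requireEnv = (name, env = process.env) => {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing environment variable ${name}; check your .env file`);
  }
  return value;
};

// Example with an explicit object standing in for process.env.
console.log(requireEnv("API_KEY", { API_KEY: "demo-key" })); // prints demo-key
```

In the script from this post, it could be used as api_key: requireEnv("API_KEY") once dotenv.config() has run.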

Next, we write the getSearchParams function, which builds the search parameters for the two different APIs. In this function, we define the isProduct constant based on the searchType argument.

Then we define and return the values that differ between the Product Page API and the Reviews API: the search engine, how many reviews we want to receive (the reviewsLimit constant), and the search parameters for the request:

const getSearchParams = (searchType) => {
  const isProduct = searchType === "product";
  const reviewsLimit = 10; // hardcoded limit for demonstration purpose
  const engine = isProduct ? "apple_product" : "apple_reviews"; // search engine
  const params = {
    api_key: process.env.API_KEY, //your API key from serpapi.com
    product_id: "1507782672", // Parameter defines the ID of a product you want to get the reviews for
    country: "us", // Parameter defines the country to use for the search
    type: isProduct ? "app" : undefined, // Parameter defines the type of Apple Product to get the product page of
    page: isProduct ? undefined : 1, // Parameter is used to get the items on a specific page
    sort: isProduct ? undefined : "mostrecent", // Parameter is used for sorting reviews
  };
  return { engine, params, reviewsLimit };
};

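When we run this function, we receive different search parameters for each API:

  • Product Page API: the apple_product engine, with type set and page and sort left undefined.
  • Reviews API: the apple_reviews engine, with page and sort set and type left undefined.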
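To make the difference concrete, here is a small sketch that builds both parameter sets and prints only the keys that carry a value. buildSearchParams mirrors getSearchParams but takes the API key as an argument so it runs without a .env file, and dropUndefined is a hypothetical helper. How the serpapi package itself treats undefined values is not covered in this post, so the filtering is only for display:

```javascript
// Mirrors getSearchParams, with the API key passed in (an assumption
// made so the sketch runs without a .env file).
const buildSearchParams = (searchType, apiKey = "YOUR_API_KEY") => {
  const isProduct = searchType === "product";
  return {
    engine: isProduct ? "apple_product" : "apple_reviews",
    params: {
      api_key: apiKey,
      product_id: "1507782672",
      country: "us",
      type: isProduct ? "app" : undefined,
      page: isProduct ? undefined : 1,
      sort: isProduct ? undefined : "mostrecent",
    },
  };
};

// Hypothetical helper: keep only the keys whose value is not undefined.
const dropUndefined = (obj) =>
  Object.fromEntries(Object.entries(obj).filter(([, value]) => value !== undefined));

// Product Page API: api_key, product_id, country, type
console.log(dropUndefined(buildSearchParams("product").params));
// Reviews API: api_key, product_id, country, page, sort
console.log(dropUndefined(buildSearchParams().params));
```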

You can use the following search parameters:

Common params:

  • api_key parameter defines the SerpApi private key to use.
  • product_id parameter defines the ID of a product you want to get the reviews for. You can get the ID of a product from our Web scraping Apple App Store Search with Nodejs blog post. You can also get it from the URL of the app: the product_id is the long numerical value that comes after "id". For example, for "https://apps.apple.com/us/app/the-great-coffee-app/id534220544" it is 534220544.
  • country parameter defines the country to use for the search. It's a two-letter country code. (e.g., us (default) for the United States, uk for United Kingdom, or fr for France). Head to the Apple Regions for a full list of supported Apple Regions.
  • no_cache parameter will force SerpApi to fetch the App Store Search results even if a cached version is already present. A cache is served only if the query and all parameters are exactly the same. Cache expires after 1h. Cached searches are free, and are not counted towards your searches per month. It can be set to false (default) to allow results from the cache, or true to disallow results from the cache. no_cache and async parameters should not be used together.
  • async parameter defines the way you want to submit your search to SerpApi. It can be set to false (default) to open an HTTP connection and keep it open until you get your search results, or true to just submit your search to SerpApi and retrieve the results later. In the latter case, you'll need to use our Searches Archive API to retrieve your results. async and no_cache parameters should not be used together. async should not be used on accounts with Ludicrous Speed enabled.
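The product_id rule described above (the digits that follow "id" in the app URL) can be sketched as a small function. extractProductId is a hypothetical helper, and it assumes the ID always appears as "/id" followed by digits in the URL path:

```javascript
// Hypothetical helper: pull the numeric product ID out of an App Store URL.
// Assumes the ID is the run of digits right after "/id" in the URL path.
// new URL() throws on a malformed URL; a URL without an ID yields null.
const extractProductId = (appUrl) => {
  const match = new URL(appUrl).pathname.match(/\/id(\d+)/);
  return match ? match[1] : null;
};

console.log(extractProductId("https://apps.apple.com/us/app/the-great-coffee-app/id534220544"));
// prints 534220544
```

The function returns the ID as a string, which matches how product_id is passed in the params object in this post.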

Product Page params:

  • type parameter defines the type of Apple Product to get the product page of. It defaults to app.

Reviews params:

  • page parameter is used to get the items on a specific page. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.).
  • sort parameter is used for sorting reviews. It can be set to: mostrecent (Most recent (default)) or mosthelpful (Most helpful).

Next, we declare the getProductInfo function, which gets all product information from the page and returns it. In this function, we call getSearchParams with the "product" argument and destructure engine and params from the result. Then we get the JSON with results, delete the unnecessary keys, and return it:

const getProductInfo = async () => {
  const { engine, params } = getSearchParams("product");
  const json = await getJson(engine, params);
  delete json.search_metadata;
  delete json.search_parameters;
  delete json.search_information;
  return json;
};
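The delete statements modify the response object in place. If you would rather leave the original response untouched, one alternative is object rest destructuring. This is a sketch of that variation, not the approach used in this post, and stripSearchMeta is a hypothetical name:

```javascript
// Hypothetical helper: return a copy of the response without the
// three metadata keys, leaving the original object unchanged.
const stripSearchMeta = (json) => {
  const { search_metadata, search_parameters, search_information, ...productInfo } = json;
  return productInfo;
};

// Made-up response fragment, for illustration only.
const response = { search_metadata: { status: "Success" }, title: "Pixea", id: "1507782672" };
console.log(stripSearchMeta(response)); // { title: 'Pixea', id: '1507782672' }
```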

Next, we declare the getReviews function, which gets review results page by page (using pagination) and returns them:

const getReviews = async () => {
  ...
};

In this function, we declare an empty reviews array, then call getSearchParams without arguments and destructure engine, params, and reviewsLimit from the result. Then, using a while loop, we get the JSON with results, add the reviews from each page, and set the next page index (by incrementing params.page).

If there are no more results on the page, or if the number of collected reviews reaches or exceeds reviewsLimit, we stop the loop (using break) and return the array with results. Note that whole pages are appended, so the array can hold more than reviewsLimit reviews:

const reviews = [];
const { engine, params, reviewsLimit } = getSearchParams();
while (true) {
  const json = await getJson(engine, params);
  if (json.reviews) {
    reviews.push(...json.reviews);
    params.page += 1;
  } else break;
  if (reviews.length >= reviewsLimit) break;
}
return reviews;
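If you need exactly reviewsLimit reviews at most, the same loop can be written with a final slice. The sketch below takes the page-fetching function as an argument so it can run against a stand-in. collectReviews and fakeFetchPage are hypothetical names, and the fake data is made up:

```javascript
// Sketch: paginate until the limit is reached or a page has no reviews,
// then trim the result to the limit. fetchPage(page) is expected to
// resolve to an object that may contain a "reviews" array.
const collectReviews = async (fetchPage, limit) => {
  const reviews = [];
  let page = 1;
  while (reviews.length < limit) {
    const json = await fetchPage(page);
    if (!json.reviews || json.reviews.length === 0) break;
    reviews.push(...json.reviews);
    page += 1;
  }
  return reviews.slice(0, limit);
};

// Stand-in for the API: three pages with two made-up reviews each.
const fakeFetchPage = async (page) =>
  page <= 3 ? { reviews: [{ id: `${page}-a` }, { id: `${page}-b` }] } : {};

collectReviews(fakeFetchPage, 5).then((reviews) => console.log(reviews.length)); // prints 5
```

With the code from this post, the call could look like collectReviews((page) => getJson(engine, { ...params, page }), reviewsLimit).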

And finally, we declare and run the getResults function, in which we build an object with the results from the getProductInfo and getReviews functions. Then we print all the received information to the console with the console.dir method, whose options object lets us change the default output options (here, depth: null prints nested objects in full):

const getResults = async () => {
  return { productInfo: await getProductInfo(), reviews: await getReviews() };
};

getResults().then((result) => console.dir(result, { depth: null }));
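getResults awaits the two requests one after the other. Since neither depends on the other's result, they could also be started together. Below is a minimal sketch using Promise.all, with a catch handler for failed requests. The two functions are passed in so the example runs with stand-ins instead of real API calls, and getResultsConcurrently is a hypothetical name. Whether concurrent requests suit your SerpApi plan is not covered in this post:

```javascript
// Sketch: start both async tasks at once and wait for both to finish.
const getResultsConcurrently = async (getProductInfo, getReviews) => {
  const [productInfo, reviews] = await Promise.all([getProductInfo(), getReviews()]);
  return { productInfo, reviews };
};

// Stand-in functions with made-up data, for illustration only.
const fakeProductInfo = async () => ({ title: "Pixea" });
const fakeReviews = async () => [{ position: 1, rating: 3 }];

getResultsConcurrently(fakeProductInfo, fakeReviews)
  .then((result) => console.dir(result, { depth: null }))
  .catch((error) => console.error("Request failed:", error.message));
```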

Output

{
   "productInfo":{
      "title":"Pixea",
      "snippet":"The invisible image viewer",
      "id":"1507782672",
      "age_rating":"4+",
      "developer":{
         "name":"ImageTasks Inc",
         "link":"https://apps.apple.com/us/developer/imagetasks-inc/id450316587"
      },
      "rating":4.6,
      "rating_count":"594 Ratings",
      "price":"Free",
      "logo":"https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
      "mac_screenshots":[
         "https://is3-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/b1/8c/fb/b18cfb80-cb5c-d67d-2edc-ee1f6666e012/35b8d5a7-b493-4a80-bdbd-3e9d564601dd_Pixea-1.jpg/643x0w.webp",
         "https://is1-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/96/08/83/9608834d-3d2b-5c0b-570c-f022407ff5cc/1836573e-1b6a-421c-b654-6ae2f915d755_Pixea-2.jpg/643x0w.webp",
         "https://is1-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/58/fd/db/58fddb5d-9480-2536-8679-92d6b067d285/98e22b63-1575-4ee6-b08d-343b9e0474ea_Pixea-3.jpg/643x0w.webp",
         "https://is2-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/c3/f3/f3/c3f3f3b5-deb0-4b58-4afc-79073373b7b9/28f51f38-bc59-4a61-a5a1-bff553838267_Pixea-4.jpg/643x0w.webp"
      ],
      "description":"Pixea is an image viewer for macOS with a nice minimal modern user interface. Pixea works great with JPEG, HEIC, PSD, RAW, WEBP, PNG, GIF, and many other formats. Provides basic image processing, including flip and rotate, shows a color histogram, EXIF, and other information. Supports keyboard shortcuts and trackpad gestures. Shows images inside archives, without extracting them.Supported formats:JPEG, HEIC, GIF, PNG, TIFF, Photoshop (PSD), BMP, Fax images, macOS and Windows icons, Radiance images, Google's WebP. RAW formats: Leica DNG and RAW, Sony ARW, Olympus ORF, Minolta MRW, Nikon NEF, Fuji RAF, Canon CR2 and CRW, Hasselblad 3FR. Sketch files (preview only). ZIP-archives.Export formats:JPEG, JPEG-2000, PNG, TIFF, BMP.Found a bug? Have a suggestion? Please, send it to support@imagetasks.comFollow us on Twitter @imagetasks!",
      "version_history":[
         {
            "release_version":"1.4",
            "release_notes":"- New icon- macOS Big Sur support- Universal Binary- Bug fixes and improvements",
            "release_date":"2020-11-09"
         },
        ... and other versions
      ],
      "ratings_and_reviews":{
         "rating_percentage":{
            "5_star":"76%",
            "4_star":"14%",
            "3_star":"4%",
            "2_star":"2%",
            "1_star":"3%"
         },
         "review_examples":[
            {
               "rating":"5 out of 5",
               "username":"MyrtleBlink182",
               "review_date":"01/18/2022",
               "review_title":"Full-Screen Perfection",
               "review_text":"This photo-viewer is by far the best in the biz. I thoroughly enjoy viewing photos with it. I tried a couple of others out, but this one is exactly what I was looking for. There is no dead space or any extra design baggage when viewing photos. Pixea knocks it out of the park keeping the design minimalistic while ensuring the functionality is through the roof"
            },
            ... and other reviews examples
         ]
      },
      "privacy":{
         "description":"The developer, ImageTasks Inc, indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy.",
         "privacy_policy_link":"https://www.imagetasks.com/Pixea-policy.txt",
         "cards":[
            {
               "title":"Data Not Collected",
               "description":"The developer does not collect any data from this app."
            }
         ],
         "sidenote":"Privacy practices may vary, for example, based on the features you use or your age. Learn More",
         "learn_more_link":"https://apps.apple.com/story/id1538632801"
      },
      "information":{
         "seller":"ImageTasks Inc",
         "price":"Free",
         "size":"5.8 MB",
         "categories":[
            "Photo & Video"
         ],
         "compatibility":[
            {
               "device":"Mac",
               "requirement":"Requires macOS 10.12 or later."
            }
         ],
         "supported_languages":[
            "English"
         ],
         "age_rating":{
            "rating":"4+"
         },
         "copyright":"Copyright © 2020 Andrey Tsarkov. All rights reserved.",
         "developer_website":"https://www.imagetasks.com",
         "app_support_link":"https://www.imagetasks.com/pixea",
         "privacy_policy_link":"https://www.imagetasks.com/Pixea-policy.txt"
      },
      "more_by_this_developer":{
         "apps":[
            {
               "logo":"https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
               "link":"https://apps.apple.com/us/app/istatistica/id1126874522",
               "serpapi_link":"https://serpapi.com/search.json?country=us&engine=apple_product&product_id=1507782672&type=app",
               "name":"iStatistica",
               "category":"Utilities"
            },
            ... and other apps
         ],
         "result_type":"Full",
         "see_all_link":"https://apps.apple.com/us/app/id1507782672#see-all/developer-other-apps"
      }
   },
   "reviews":[
      {
         "position":1,
         "id":"9332275235",
         "title":"Doesn't respect aspect ratios",
         "text":"Seemingly no way to maintain the aspect ratio of an image. It always wants to fill the photo to the window size, no matter what sizing options you pick. How useless is that?",
         "rating":3,
         "review_date":"2022-11-26 13:29:43 UTC",
         "author":{
            "name":"soren121",
            "link":"https://itunes.apple.com/us/reviews/id33706024"
         }
      },
      ... and other reviews
   ]
}
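As a small example of working with this output, the sketch below averages the numeric rating field of the reviews and reads the number out of a rating_count string such as "594 Ratings". Both helpers are hypothetical, the sample object is made up to match the shape above, and other rating_count formats (for example abbreviated counts) are deliberately not handled:

```javascript
// Sketch: average the numeric "rating" field of review objects.
const averageRating = (reviews) => {
  if (reviews.length === 0) return null;
  const total = reviews.reduce((sum, review) => sum + review.rating, 0);
  return total / reviews.length;
};

// Sketch: read the number from a string shaped like "594 Ratings".
// Returns null for any other shape instead of guessing.
const parseRatingCount = (ratingCount) => {
  const match = /^(\d+)\s+Ratings?$/.exec(ratingCount);
  return match ? Number(match[1]) : null;
};

// Made-up sample shaped like the output above.
const sample = {
  productInfo: { rating_count: "594 Ratings" },
  reviews: [{ rating: 3 }, { rating: 5 }],
};
console.log(parseRatingCount(sample.productInfo.rating_count)); // prints 594
console.log(averageRating(sample.reviews)); // prints 4
```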

If you want other functionality added to this blog post or if you want to see some projects made with SerpApi, write me a message.


Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞
