
Artur Chukhrai for SerpApi

Posted on • Updated on • Originally published at serpapi.com

Scraping Apple App Store Product Info And Reviews with Python

What will be scraped

[Image: what will be scraped from the Apple App Store product page and reviews]

Why use an API?

There are a few reasons to use an API, and ours in particular:

  • No need to create a parser from scratch and maintain it.
  • No need to bypass blocks yourself, such as solving CAPTCHAs or working around IP blocks.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation.

SerpApi handles everything on the backend, with response times under ~2.5 seconds per request (~1.2 seconds with Ludicrous Speed). It does so without browser automation, which makes it much faster. Response times and success rates are shown on the SerpApi Status page:

  • Apple Product:

[Image: SerpApi status for the Apple Product API]

  • Apple Reviews:

[Image: SerpApi status for the Apple Reviews API]

Head to the Apple Product Page playground and Apple App Store Reviews playground for a live and interactive demo.

Full Code

If you don't need an explanation, have a look at the full code example in the online IDE.

from serpapi import GoogleSearch
import json


def get_product_info(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_product',      # SerpApi search engine 
        'product_id': product_id,       # ID of a product
        'type': 'app',                  # type of Apple Product
        'country': 'us',                # country for the search
    }

    search = GoogleSearch(params)       # data extraction on the SerpApi backend
    product_info = search.get_dict()    # JSON -> Python dict

    del product_info['search_metadata']
    del product_info['search_parameters']
    del product_info['search_information']

    return product_info


def get_product_reviews(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_reviews',      # SerpApi search engine 
        'product_id': product_id,       # ID of a product
        'country': 'us',                # country for the search
        'sort': 'mostrecent',           # sorting reviews
        'page': 1,                      # pagination
    }

    product_reviews = []

    while True:
        search = GoogleSearch(params)
        new_page_results = search.get_dict()

        product_reviews.extend(new_page_results['reviews'])

        if 'next' in new_page_results.get('serpapi_pagination', {}):
            params['page'] += 1
        else:
            break

    return product_reviews


def main():
    product_id = 1507782672

    app_store_results = {
        'product_info': get_product_info(product_id),
        'product_reviews': get_product_reviews(product_id)
    }

    print(json.dumps(app_store_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()

Preparation

Install library:

pip install google-search-results

google-search-results is SerpApi's Python package for its API.

Code Explanation

Import libraries:

from serpapi import GoogleSearch
import json

Libraries and their purpose:

  • GoogleSearch: to scrape and parse results using the SerpApi web scraping library.
  • json: to convert the extracted data to a JSON-formatted string.

Top-level code environment

At the beginning of the main function, the product_id variable is created to store the ID of the desired product:

product_id = 1507782672

Next, the app_store_results dictionary is created. It holds the data returned by the get_product_info(product_id) and get_product_reviews(product_id) functions, which are explained under the corresponding headings below.

app_store_results = {
    'product_info': get_product_info(product_id),
    'product_reviews': get_product_reviews(product_id)
}

After all the data is retrieved, it is printed in JSON format:

print(json.dumps(app_store_results, indent=2, ensure_ascii=False))
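
To see what the ensure_ascii argument changes, here is a small standalone sketch. The sample dictionary is made up for illustration; only the copyright string mirrors the kind of non-ASCII text found in the output at the end of this post:

```python
import json

# Hypothetical sample data with a non-ASCII character (©).
sample = {'copyright': 'Copyright © 2020-2023 ImageTasks Inc.'}

# ensure_ascii=False keeps non-ASCII characters such as © readable.
readable = json.dumps(sample, indent=2, ensure_ascii=False)

# The default (ensure_ascii=True) escapes them as \uXXXX sequences.
escaped = json.dumps(sample, indent=2)

print(readable)
print(escaped)
```

Both strings decode back to the same dictionary, so the choice only affects how the printed text looks.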

This code uses the generally accepted rule of using the __name__ == "__main__" construct:

def main():
    product_id = 1507782672

    app_store_results = {
        'product_info': get_product_info(product_id),
        'product_reviews': get_product_reviews(product_id)
    }

    print(json.dumps(app_store_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()

The condition is true only when the file is run directly, so main() is called. If the file is imported into another module, the condition is false and main() is not called.

You can watch the video Python Tutorial: if __name__ == '__main__' for more details.
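
As a rough illustration of that check, the sketch below passes the module name in explicitly so both cases can be tried side by side. The run_if_script helper and the my_scraper module name are hypothetical and exist only for this example:

```python
def main():
    return 'main() was called'


def run_if_script(module_name):
    # Mirrors the `if __name__ == '__main__':` check, with the module
    # name passed in instead of read from the real __name__ variable.
    if module_name == '__main__':
        return main()
    return None


# When a file is run directly, Python sets __name__ to '__main__'.
print(run_if_script('__main__'))

# When the file is imported, __name__ is the module's own name instead.
print(run_if_script('my_scraper'))
```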

Get product information

The function takes a specific product_id and returns a dictionary with all the information about that product.

At the beginning of the function, the params dictionary is defined for generating the URL:

params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'engine': 'apple_product',      # SerpApi search engine 
    'product_id': product_id,       # ID of a product
    'type': 'app',                  # type of Apple Product
    'country': 'us',                # country for the search
}

Parameters and their explanations:

  • api_key: defines the SerpApi private key to use. You can find it under your account -> API key.
  • engine: set this parameter to apple_product to use the Apple Product engine.
  • product_id: defines the ID of the product whose product page you want to get.
  • type: defines the type of Apple Product to get the product page of. It defaults to app.
  • country: defines the country to use for the search. It's a two-letter country code. See Apple Regions for the full list of supported regions.

📌Note: You can also add other API Parameters.
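
To make "generating the URL" more concrete, here is a minimal sketch of how such a dictionary can be turned into a query string with the standard library. The endpoint is the one visible in the serpapi_link field of the output at the end of this post; build_search_url is a hypothetical helper, and the way the SerpApi library builds its requests internally is an assumption here, not something this post confirms:

```python
from urllib.parse import urlencode

params = {
    'api_key': '...',               # placeholder, as in the post
    'engine': 'apple_product',
    'product_id': 1507782672,
    'type': 'app',
    'country': 'us',
}


def build_search_url(params):
    # Hypothetical helper: encodes each key/value pair as key=value
    # and joins the pairs with '&' after the endpoint.
    return 'https://serpapi.com/search.json?' + urlencode(params)


url = build_search_url(params)
print(url)
```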

Then, we create a search object, and the data is retrieved from the SerpApi backend. The get_dict() method converts the JSON response into the product_info dictionary:

search = GoogleSearch(params)       # data extraction on the SerpApi backend
product_info = search.get_dict()    # JSON -> Python dict

The product_info dictionary contains information not only about the product, but also about the request. Request information is not needed, so we remove the corresponding keys using the del statement:

del product_info['search_metadata']
del product_info['search_parameters']
del product_info['search_information']
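
Note that del raises a KeyError when a key is absent. If you would rather tolerate a response that lacks one of these entries, a minimal sketch with dict.pop() and a default could look like this. The strip_request_info name and the sample dictionary are hypothetical and made up for this illustration:

```python
REQUEST_KEYS = ('search_metadata', 'search_parameters', 'search_information')


def strip_request_info(results):
    # Work on a shallow copy so the caller's dictionary is left untouched.
    cleaned = dict(results)
    for key in REQUEST_KEYS:
        # pop() with a default does nothing when the key is missing.
        cleaned.pop(key, None)
    return cleaned


# Made-up sample: only one of the three request keys is present.
sample = {'search_metadata': {'id': 'example'}, 'title': 'Pixea'}
print(strip_request_info(sample))
```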

At the end of the function, the product_info dictionary with the extracted data is returned:

return product_info

The complete function to get product information would look like this:

def get_product_info(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_product',      # SerpApi search engine 
        'product_id': product_id,       # ID of a product
        'type': 'app',                  # type of Apple Product
        'country': 'us',                # country for the search
    }

    search = GoogleSearch(params)       # data extraction on the SerpApi backend
    product_info = search.get_dict()    # JSON -> Python dict

    del product_info['search_metadata']
    del product_info['search_parameters']
    del product_info['search_information']

    return product_info

Get product reviews

The function takes a specific product_id and returns a list with all the reviews about that product.

At the beginning of the function, the params dictionary is defined for generating the URL:

params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'engine': 'apple_reviews',      # SerpApi search engine 
    'product_id': product_id,       # ID of a product
    'country': 'us',                # country for the search
    'sort': 'mostrecent',           # sorting reviews
    'page': 1,                      # pagination
}

Parameters and their explanations:

  • api_key: defines the SerpApi private key to use. You can find it under your account -> API key.
  • engine: set this parameter to apple_reviews to use the Apple Reviews engine.
  • product_id: defines the ID of the product you want to get the reviews for.
  • country: defines the country to use for the search. It's a two-letter country code. See Apple Regions for the full list of supported regions.
  • sort: used for sorting reviews. It can be set to mostrecent or mosthelpful.
  • page: used to get the items on a specific page (e.g., 1 (default) is the first page of results, 2 is the second page, 3 is the third page, etc.).

📌Note: You can also add other API Parameters.
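
Since api_key is private and sort accepts only two values, one option is to build the dictionary in a small helper. The sketch below is an assumption-laden illustration rather than part of the original code: load_api_key, build_review_params, and the SERPAPI_API_KEY variable name are all hypothetical, and the demo passes a fake environment instead of a real key:

```python
import os

ALLOWED_SORTS = ('mostrecent', 'mosthelpful')


def load_api_key(env=os.environ, name='SERPAPI_API_KEY'):
    # Hypothetical variable name; reading the key from the environment
    # keeps it out of the source file.
    key = env.get(name)
    if not key:
        raise RuntimeError(f'Set the {name} environment variable first.')
    return key


def build_review_params(product_id, api_key, sort='mostrecent', page=1):
    # Reject sort values other than the two listed above.
    if sort not in ALLOWED_SORTS:
        raise ValueError(f'sort must be one of {ALLOWED_SORTS}')
    return {
        'api_key': api_key,
        'engine': 'apple_reviews',
        'product_id': product_id,
        'country': 'us',
        'sort': sort,
        'page': page,
    }


# Demo with a fake environment so no real key is needed.
fake_env = {'SERPAPI_API_KEY': 'demo-key'}
params = build_review_params(1507782672, load_api_key(fake_env))
print(params)
```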

Define the product_reviews list to which the retrieved reviews will be added:

product_reviews = []

A while loop is then created to extract reviews from all pages:

while True:
    # data extraction will be here

Then, we create a search object, and the data is retrieved from the SerpApi backend. The get_dict() method converts the JSON response into the new_page_results dictionary:

search = GoogleSearch(params)
new_page_results = search.get_dict()

The new data from this page is added to the product_reviews list:

product_reviews.extend(new_page_results['reviews'])

# first_review = new_page_results['reviews'][0]
# title = first_review['title']
# text = first_review['text']
# rating = first_review['rating']
# review_date = first_review['review_date']
# author_name = first_review['author']['name']
# author_link = first_review['author']['link']

📌Note: In the comments above, I showed how to extract specific fields. You may have noticed new_page_results['reviews'][0]. The [0] is the index of a review, which means that we are extracting data from the first review. new_page_results['reviews'][1] is the second review, and so on.
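
The same field access can be applied to every review instead of a single index. Here is a minimal sketch, assuming each review has the shape shown in the output at the end of this post; flatten_review is a hypothetical helper name, and the sample review is copied from that output:

```python
def flatten_review(review):
    # .get() returns None instead of raising if a field is missing.
    author = review.get('author', {})
    return {
        'title': review.get('title'),
        'text': review.get('text'),
        'rating': review.get('rating'),
        'review_date': review.get('review_date'),
        'author_name': author.get('name'),
        'author_link': author.get('link'),
    }


# Sample review taken from the output shown at the end of the post.
sample_reviews = [
    {
        'position': 1,
        'id': '9446406432',
        'title': 'Stop begging for reviews',
        'text': 'Stop begging for reviews',
        'rating': 1,
        'review_date': '2022-12-28 21:42:28 UTC',
        'author': {
            'name': 'stalfos_knight',
            'link': 'https://itunes.apple.com/us/reviews/id41752602',
        },
    },
]

rows = [flatten_review(review) for review in sample_reviews]
for row in rows:
    print(row['rating'], row['title'], row['author_name'])
```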

After the data is retrieved from the current page, a check is made to see if the next page exists. If there is a next key in the serpapi_pagination dictionary, the page parameter is incremented by 1. Otherwise, the loop stops:

if 'next' in new_page_results.get('serpapi_pagination', {}):
    params['page'] += 1
else:
    break
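
To try the pagination logic without spending API credits, the loop can be exercised against a stand-in for the API call. This is only a sketch under stated assumptions: collect_reviews, fake_fetch, FAKE_PAGES, and the max_pages cap are hypothetical additions, and a page without a reviews key is treated as empty:

```python
def collect_reviews(fetch_page, max_pages=50):
    """Collect reviews page by page; fetch_page(page) returns a dict."""
    reviews = []
    page = 1
    while page <= max_pages:
        results = fetch_page(page)
        # A page without a 'reviews' key contributes nothing.
        reviews.extend(results.get('reviews', []))
        # Same stop condition as above: no 'next' key means last page.
        if 'next' not in results.get('serpapi_pagination', {}):
            break
        page += 1
    return reviews


# Made-up pages standing in for real API responses.
FAKE_PAGES = {
    1: {'reviews': [{'id': 'a'}, {'id': 'b'}],
        'serpapi_pagination': {'next': '...'}},
    2: {'reviews': [{'id': 'c'}]},
}


def fake_fetch(page):
    return FAKE_PAGES[page]


all_reviews = collect_reviews(fake_fetch)
print(len(all_reviews))
```

With the real client, fetch_page could wrap the call used in this post, for example a function that returns GoogleSearch({**params, 'page': page}).get_dict().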

At the end of the function, the product_reviews list with the extracted data is returned:

return product_reviews

The complete function to get product reviews would look like this:

def get_product_reviews(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_reviews',      # SerpApi search engine 
        'product_id': product_id,       # ID of a product
        'country': 'us',                # country for the search
        'sort': 'mostrecent',           # sorting reviews
        'page': 1,                      # pagination
    }

    product_reviews = []

    while True:
        search = GoogleSearch(params)
        new_page_results = search.get_dict()

        product_reviews.extend(new_page_results['reviews'])

        if 'next' in new_page_results.get('serpapi_pagination', {}):
            params['page'] += 1
        else:
            break

    return product_reviews

Output

{
  "product_info": {
    "title": "Pixea",
    "snippet": "The invisible image viewer",
    "id": "1507782672",
    "age_rating": "4+",
    "developer": {
      "name": "ImageTasks Inc",
      "link": "https://apps.apple.com/us/developer/imagetasks-inc/id450316587"
    },
    "rating": 4.6,
    "rating_count": "620 Ratings",
    "price": "Free",
    "in_app_purchases": "Offers In-App Purchases",
    "logo": "https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
    "mac_screenshots": [
      "https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/e0/21/86/e021868d-b43b-0a78-8d4a-e4e0097a1d01/0131f1c2-3227-46bf-8328-7b147d2b1ea2_Pixea-1.jpg/643x0w.webp",
      "https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/55/3c/98/553c982d-de30-58b5-3b5a-d6b3b2b6c810/a0424c4d-4346-40e6-8cde-bc79ce690040_Pixea-2.jpg/643x0w.webp",
      "https://is3-ssl.mzstatic.com/image/thumb/Purple123/v4/77/d7/d8/77d7d8c1-4b4c-ba4b-4dde-94bdc59dfb71/6e66509c-5886-45e9-9e96-25154a22fd53_Pixea-3.jpg/643x0w.webp",
      "https://is3-ssl.mzstatic.com/image/thumb/PurpleSource113/v4/44/79/91/447991e0-518f-48b3-bb7e-c7121eb57ba4/79be2791-5b93-4c4d-b4d1-38a3599c2b2d_Pixea-4.jpg/643x0w.webp"
    ],
    "description": "Pixea is an image viewer for macOS with a nice minimal modern user interface. Pixea works great with JPEG, HEIC, PSD, RAW, WEBP, PNG, GIF, and many other formats. Provides basic image processing, including flip and rotate, shows a color histogram, EXIF, and other information. Supports keyboard shortcuts and trackpad gestures. Shows images inside archives, without extracting them.Supported formats:JPEG, HEIC, GIF, PNG, TIFF, Photoshop (PSD), BMP, Fax images, macOS and Windows icons, Radiance images, Google's WebP. RAW formats: Leica DNG and RAW, Sony ARW, Olympus ORF, Minolta MRW, Nikon NEF, Fuji RAF, Canon CR2 and CRW, Hasselblad 3FR. Sketch files (preview only). ZIP-archives.Export formats:JPEG, JPEG-2000, PNG, TIFF, BMP.Found a bug? Have a suggestion? Please, send it to support@imagetasks.comFollow us on Twitter @imagetasks!",
    "version_history": [
      {
        "release_version": "2.1",
        "release_notes": "- New \"Fixed Size and Position\" zoom mode- Fixed a bug causing crash when browsing ZIP-files- Bug fixes and improvements",
        "release_date": "2023-01-03"
      },
      ... other versions
    ],
    "ratings_and_reviews": {
      "rating_percentage": {
        "5_star": "76%",
        "4_star": "13%",
        "3_star": "4%",
        "2_star": "2%",
        "1_star": "4%"
      },
      "review_examples": [
        {
          "rating": "5 out of 5",
          "username": "MyrtleBlink182",
          "review_date": "01/18/2022",
          "review_title": "Full-Screen Perfection",
          "review_text": "This photo-viewer is by far the best in the biz. I thoroughly enjoy viewing photos with it. I tried a couple of others out, but this one is exactly what I was looking for. There is no dead space or any extra design baggage when viewing photos. Pixea knocks it out of the park keeping the design minimalistic while ensuring the functionality is through the roof"
        },
        ... other reviews examples
      ]
    },
    "privacy": {
      "description": "The developer, ImageTasks Inc, indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy.",
      "privacy_policy_link": "https://www.imagetasks.com/Pixea-policy.txt",
      "cards": [
        {
          "title": "Data Not Collected",
          "description": "The developer does not collect any data from this app."
        }
      ],
      "sidenote": "Privacy practices may vary, for example, based on the features you use or your age. Learn More",
      "learn_more_link": "https://apps.apple.com/story/id1538632801"
    },
    "information": {
      "seller": "ImageTasks Inc",
      "price": "Free",
      "size": "7.1 MB",
      "categories": [
        "Photo & Video"
      ],
      "compatibility": [
        {
          "device": "Mac",
          "requirement": "Requires macOS 10.12 or later."
        }
      ],
      "supported_languages": [
        "English"
      ],
      "age_rating": {
        "rating": "4+"
      },
      "copyright": "Copyright © 2020-2023 ImageTasks Inc. All rights reserved.",
      "in_app_purchases": [
        {
          "name": "Upgrade to Pixea Plus",
          "price": "$3.99"
        }
      ],
      "developer_website": "https://www.imagetasks.com",
      "app_support_link": "https://www.imagetasks.com/pixea",
      "privacy_policy_link": "https://www.imagetasks.com/Pixea-policy.txt"
    },
    "more_by_this_developer": {
      "apps": [
        {
          "logo": "https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
          "link": "https://apps.apple.com/us/app/istatistica/id1126874522",
          "serpapi_link": "https://serpapi.com/search.json?country=us&engine=apple_product&product_id=1507782672&type=app",
          "name": "iStatistica",
          "category": "Utilities"
        },
        ... other apps
      ],
      "result_type": "Full",
      "see_all_link": "https://apps.apple.com/us/app/id1507782672#see-all/developer-other-apps"
    }
  },
  "product_reviews": [
    {
      "position": 1,
      "id": "9446406432",
      "title": "Stop begging for reviews",
      "text": "Stop begging for reviews",
      "rating": 1,
      "review_date": "2022-12-28 21:42:28 UTC",
      "author": {
        "name": "stalfos_knight",
        "link": "https://itunes.apple.com/us/reviews/id41752602"
      }
    },
    ... other reviews
  ]
}
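
Once the reviews are collected, a short summary is easy to compute from the rating field. The following is a minimal sketch, assuming rating is an integer as in the output above; summarize_ratings is a hypothetical helper, and the sample ratings other than the first are made up for illustration:

```python
from collections import Counter
from statistics import mean


def summarize_ratings(reviews):
    # Keep only reviews that carry an integer rating.
    ratings = [r['rating'] for r in reviews
               if isinstance(r.get('rating'), int)]
    if not ratings:
        return {'count': 0, 'average': None, 'by_rating': {}}
    return {
        'count': len(ratings),
        'average': round(mean(ratings), 2),
        'by_rating': dict(Counter(ratings)),
    }


# The first rating matches the output above; the rest are made up.
sample_reviews = [{'rating': 1}, {'rating': 5}, {'rating': 4}]
summary = summarize_ratings(sample_reviews)
print(summary)
```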

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞
