Serpdog

Posted on • Originally published at serpdog.io

Scrape Play Store App Data

Introduction

In this tutorial, we are going to scrape Google Play Store App Data using Node JS. We will cover some basic information like app rating and reviews, sample images, description, etc.


Requirements:

Web Parsing with CSS selectors

Searching for the right tags in raw HTML is both difficult and time-consuming. It is better to use the CSS Selector Gadget to pick the right selectors and make your web scraping journey easier.

This gadget can help you come up with the right CSS selector for your needs. Here is the link to the tutorial, which will teach you to use this gadget for selecting the best CSS selectors.

User Agents

The User-Agent request header identifies the application, operating system, vendor, and version of the requesting user agent. Sending a browser-like value helps a request to Google look like a visit from a real user.

You can also rotate User Agents, read more about this in this article: How to fake and rotate User Agents using Python 3.
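
As a rough sketch of the same rotation idea in Node JS (the linked article uses Python), the snippet below picks a random User-Agent from a small pool for each request. The USER_AGENTS list and the pickUserAgent and buildHeaders helpers are illustrative names, not part of Unirest or Cheerio, and the strings are only examples that should be replaced with a larger, current list.

```javascript
// Hypothetical pool of User-Agent strings; extend it with current browser versions.
const USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
];

// Return one User-Agent chosen uniformly at random from the pool.
function pickUserAgent(pool = USER_AGENTS) {
  const index = Math.floor(Math.random() * pool.length);
  return pool[index];
}

// Build the headers object to pass to the HTTP client for a single request.
function buildHeaders() {
  return { "User-Agent": pickUserAgent() };
}

console.log(buildHeaders());
```

The object returned by buildHeaders() can be passed to the .headers() call of the request shown later in this tutorial, so each run sends a different User-Agent.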

If you want to further safeguard your IP from being blocked by Google, you can try these 10 Tips to avoid getting Blocked while Scraping Google.

Install Libraries

To start scraping Google Play Store App Data we need to install some NPM libraries, so that we can move forward.

  1. Unirest
  2. Cheerio

So before starting, make sure you have set up a Node JS project and installed both packages, Unirest JS and Cheerio JS. You can install them from the links above or by running npm i unirest cheerio in the project directory.

Target:

Scrape Google Play Store App Data 2

Process:

Let's begin the process of scraping the Play Store App Data. We will be using Unirest JS to extract the raw HTML data and parse this data with the help of Cheerio JS.

Open the below link in your browser, so we can start selecting the HTML tags for the required elements.

https://play.google.com/store/apps/details?id=com.whatsapp

Let us make a GET request using Unirest JS on the target URL.

    const unirest = require("unirest");
    const cheerio = require("cheerio");

    const getGooglePlayData = async () => {
        // Target page: the Play Store listing for WhatsApp
        let url = "https://play.google.com/store/apps/details?id=com.whatsapp";

        // Send a browser-like User-Agent along with the GET request
        let response = await unirest
            .get(url)
            .headers({
                "User-Agent":
                    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
            });

        // Load the returned HTML into Cheerio for parsing
        const $ = cheerio.load(response.body);

        // The parsing code from the next steps goes here
    };

Step-by-step explanation:

  1. In the first and second lines, we imported the Unirest and Cheerio libraries.
  2. In the next line, we declared an async function to get the Google Play data.
  3. After that, we declared a variable for the target URL.
  4. Next, we made a GET request to that URL with Unirest, passing a headers object that contains the User-Agent.
  5. Finally, we loaded the response body into a Cheerio instance, $, which we will use to query the HTML.
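
To scrape an app other than WhatsApp, the target URL can be assembled from a package id instead of being hard-coded. Here is a minimal sketch using Node's built-in URL class. buildPlayStoreUrl is a hypothetical helper, and the optional hl (language) and gl (country) query parameters are an assumption about what the Play Store accepts, so check them in your browser first.

```javascript
// Hypothetical helper: build the Play Store details URL for a package id.
function buildPlayStoreUrl(appId, { hl, gl } = {}) {
  const url = new URL("https://play.google.com/store/apps/details");
  url.searchParams.set("id", appId);
  // Assumed optional parameters: hl = interface language, gl = country.
  if (hl) url.searchParams.set("hl", hl);
  if (gl) url.searchParams.set("gl", gl);
  return url.toString();
}

console.log(buildPlayStoreUrl("com.whatsapp"));
console.log(buildPlayStoreUrl("com.whatsapp", { hl: "en", gl: "us" }));
```

Called with only the package id, the helper returns the same URL used in this tutorial.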

Now, we will prepare our parser by searching for the tags with the help of the CSS selector gadget mentioned above in the Requirements section.

Scrape Google Play Store App Data 3

In the above image, the title sits in an element with the class xwcR9d. So, its parser would look like this:

    let app_info = {};

    app_info.title = $(".xwcR9d").text(); 

In the above code, we declared an object, app_info, for storing the basic information about the app, and then extracted the title from the HTML with the Cheerio instance $.

Let us now scrape the user reviews as well.

 Scrape Google Play Store App Data 4

These reviews are inside containers with the class EGFGHd. So, after parsing the reviews, our code looks like this:

    app_info.user_reviews = [];

    $(".EGFGHd").each((i,el) => {
        app_info.user_reviews.push({
            name: $(el).find(".X5PpBb").text(),
            date: $(el).find(".bp9Aid").text(),
            description: $(el).find(".h3YV2d").text(),
            thumbnail: $(el).find("img").attr("src")
        })
    })

First, we declared an array, user_reviews, inside our app_info object. Then we looped over each selected container to scrape the required data.

With the above code, we scraped the reviewer's name, the review date, the review text (description), and the user's thumbnail.
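
Class names such as X5PpBb are generated by Google and can change without notice. When a selector no longer matches, Cheerio's .text() returns an empty string and .attr("src") returns undefined. The sketch below shows one way to normalise scraped values before storing them. cleanText and normalizeReview are hypothetical helpers written for this example; they work on plain strings and objects, so they can be applied to whatever the loop above produces.

```javascript
// Collapse runs of whitespace and trim; return null when nothing is left.
function cleanText(value) {
  if (typeof value !== "string") return null;
  const text = value.replace(/\s+/g, " ").trim();
  return text.length > 0 ? text : null;
}

// Normalise one review object so that missing fields become null.
function normalizeReview(review) {
  return {
    name: cleanText(review.name),
    date: cleanText(review.date),
    description: cleanText(review.description),
    thumbnail: review.thumbnail || null,
  };
}

// Example with the kind of values an empty or changed selector yields.
console.log(
  normalizeReview({
    name: "  Vedant   Jain ",
    date: "",
    description: "Great\napp",
    thumbnail: undefined,
  })
);
```

A review whose fields all come back as null is a useful signal that the class names need to be selected again.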

Similarly, we can scrape the other parts of the page by selecting the classes with the help of the selector gadget. After completing the selection process, our parser will look like this:

    let app_info = {};

    app_info.title = $(".xwcR9d").text();
    app_info.company = $(".auoIOc").text();
    app_info.app_thumbnail = $(".arM4bb").attr("src");
    app_info.rating = parseFloat($(".jILTFe").text().replace("star", ""));
    app_info.reviews = $(".EHUI5b").text().split(" ")[0];
    app_info.downloads = $(".wVqUob:nth-child(2) .ClM7O").text();
    app_info.rated_for = $(".wVqUob~ .wVqUob+ .wVqUob .g1rdde").text().replace("Rated for ", "").replace("info", "")
    app_info.description = $(".bARER").text();
    app_info.developer_info = {};
    app_info.developer_info.website = $(".VVmwY:nth-child(1) .pSEeg").text();
    app_info.developer_info.email = $(".VVmwY:nth-child(2) .pSEeg").text();
    app_info.developer_info.address = $(".VVmwY:nth-child(3) .pSEeg").text();
    app_info.developer_info.privacy_policy = $(".VVmwY:nth-child(4) .pSEeg").text();

    app_info.user_reviews = [];

    $(".EGFGHd").each((i,el) => {
        app_info.user_reviews.push({
            name: $(el).find(".X5PpBb").text(),
            date: $(el).find(".bp9Aid").text(),
            description: $(el).find(".h3YV2d").text(),
            thumbnail: $(el).find("img").attr("src")
        })
    })

    app_info.images_results = [];

    $(".aoJE7e .Atcj9b").each((i,el) => {
        app_info.images_results.push({
            src: $(el).find("img").attr("src")
        })
    })

Now, our results should look like this:

    {
        title: 'WhatsApp Messenger',
        company: 'WhatsApp LLC',
        app_thumbnail: 'https://play-lh.googleusercontent.com/bYtqbOcTYOlgc6gqZ2rwb8lptHuwlNE75zYJu6Bn076-hTmvd96HH-6v7S0YUAAJXoJN=w240-h480-rw',
        rating: 4.1,
        reviews: '172M',
        downloads: '5B+',
        rated_for: '3+',
        description: 'WhatsApp from Meta is a FREE messaging and video calling app. It’s used by over 2B people in more than 180 countries. It’s simple, reliable, and private, so you can easily keep in touch with your friends and family. WhatsApp works across mobile and desktop even on slow connections, with no subscription fees*.Private messaging across the worldYour personal messages and calls to friends and family are end-to-end encrypted. No one outside of your chats, not even WhatsApp, can read or listen to them.Simple and secure connections, right awayAll you need is your phone number, no user names or logins. You can quickly view your contacts who are on WhatsApp and start messaging.High quality voice and video callsMake secure video and voice calls with up to 8 people for free*. Your calls work across mobile devices using your phone’s Internet service, even on slow connections.Group chats to keep you in contactStay in touch with your friends and family. End-to-end encrypted group chats let you share messages, photos, videos and documents across mobile and desktop.Stay connected in real timeShare your location with only those in your individual or group chat, and stop sharing at any time. Or record a voice message to connect quickly.Share daily moments through StatusStatus allows you to share text, photos, video and GIF updates that disappear after 24 hours. You can choose to share status posts with all your contacts or just selected ones.*Data charges may apply. Contact your provider for details.---------------------------------------------------------If you have any feedback or questions, please go to WhatsApp > Settings > Help > Contact Us',
        developer_info: {
            website: 'http://www.whatsapp.com/',
            email: 'android@support.whatsapp.com',
            address: '1601 Willow Road\nMenlo Park, CA 94025',
            privacy_policy: 'http://www.whatsapp.com/legal/#Privacy'
        },
        user_reviews: [
            {
            name: 'Vedant Jain',
            date: 'December 13, 2022',
            description: 'After the recent update I am not able to view the photos which are set to view only once. In addition to that I am not able to backup my chats to Google drive. It always stops at 96%, even though there is enough space on my drive. The customer support just sends automated messages and is of no help. Really disappointed with the customer support',
            thumbnail: 'https://play-lh.googleusercontent.com/a/AEdFTp5OsFV7faP9ETEAvpvI_GxXgW-7bodH2UIukBqM=s32-rw-mo'
            },
            {
            name: 'Stage Hermit',
            date: 'December 12, 2022',
            description: "Had a great experience so far ... However, after the recent update, > Whattsapp calls are taking way too long to connect. ( Both incoming and outgoing ). Haven't checked on video calls though.. > Messaging is working pretty fine. > Also ..while trying to attach a image from within the chat window, ( especially which has been taken as a screenshot ) .. it isn't able to locate the image. Have to go to the gallery and use the Share option on the image. I am using a OnePlus 10 pro",
            thumbnail: 'https://play-lh.googleusercontent.com/a/AEdFTp4ce67C2FB5MGWdSPWgjcej33T0kYNbwYdujreG=s32-rw-mo'
            },
            {
            name: 'David Raju',
            date: 'November 26, 2022',
            description: "I had a very good experience of using this application, and it's so useful for the person to have a conversation with the person who is far from us. just one suggestion you must add up the music in the story uploading so the people who wants to add music with photo or a video can be possible eaasily. Must do this change and update it soon, people will so happy to use it. Thank you very much.As a user of whatsapp , I have seen one of the new features I.e. poll selection. This feature is not so us",
            thumbnail: 'https://play-lh.googleusercontent.com/a-/AD5-WCn-xwvCn4_ZrjAw4-T7El7pBo6z8MxxJmd9DK6x=s32-rw'
            }
        ],
        images_results: [
            {
            src: 'https://play-lh.googleusercontent.com/tNuMAclO_TrRn5RbiSo2iU2ySljFaHjCIWoMUSoemUcl4FjTyVO0PpJZL_zTrYf7v_4=w526-h296-rw'
            },
            {
            src: 'https://play-lh.googleusercontent.com/ijfSGQUCqeCmCQX0w_HjdSWkiYZoFk5JZ5CsxmGI-qT1VPT8V3wGohMBpWZOAp2o7A=w526-h296-rw'
            },
            {
            src: 'https://play-lh.googleusercontent.com/Ck5x7vPWfgXoLvkGqVs5INzV3dzHMYYy4Jr6YVpXDTR-00p_V_kpGABtfXCp9qx10cs=w526-h296-rw'
            },
            {
            src: 'https://play-lh.googleusercontent.com/ef3mz9xoDiwk08KB7B6oN0uSqJkxy8yMBwdOl9TGc3rSsOLdYBQlRZqMCduJjJyeBQ=w526-h296-rw'
            },
            {
            src: 'https://play-lh.googleusercontent.com/8InPqYGQ-28qwt_mLmm6R3VzbMcf3ZSJNUxO_OJosyLRqPHeStZFtjKskgDvHkanfRUJ=w526-h296-rw'
            }
        ]
        }
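
The reviews and downloads fields come back as abbreviated strings such as '172M' and '5B+'. If numbers are needed for sorting or comparison, they can be expanded with a small helper. This is only a sketch: parseCount is a hypothetical function, and the K/M/B suffixes are assumed from the sample output above.

```javascript
// Multipliers for the suffixes seen in the sample output (assumed: K, M, B).
const SUFFIXES = { K: 1e3, M: 1e6, B: 1e9 };

// Convert strings like "172M" or "5B+" to a number; return null if unparseable.
function parseCount(text) {
  const match = /^([\d.,]+)\s*([KMB])?\+?$/i.exec(String(text).trim());
  if (!match) return null;
  const value = parseFloat(match[1].replace(/,/g, ""));
  if (Number.isNaN(value)) return null;
  const suffix = match[2] ? match[2].toUpperCase() : null;
  return suffix ? value * SUFFIXES[suffix] : value;
}

console.log(parseCount("172M")); // 172000000
console.log(parseCount("5B+")); // 5000000000
```

Keep in mind that these values are rounded by the Play Store, so the result is an approximation rather than an exact count.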

Here is the complete code:

    const unirest = require("unirest");
    const cheerio = require("cheerio");

    const getGooglePlayData = async() => {

        let url = "https://play.google.com/store/apps/details?id=com.whatsapp"

        let response = await unirest
        .get(url)
        .headers({
            "User-Agent":
            "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
            })
        const $ = cheerio.load(response.body)

        let app_info = {};

        app_info.title = $(".xwcR9d").text();
        app_info.company = $(".auoIOc").text();
        app_info.app_thumbnail = $(".arM4bb").attr("src");
        app_info.rating = parseFloat($(".jILTFe").text().replace("star", ""));
        app_info.reviews = $(".EHUI5b").text().split(" ")[0];
        app_info.downloads = $(".wVqUob:nth-child(2) .ClM7O").text();
        app_info.rated_for = $(".wVqUob~ .wVqUob+ .wVqUob .g1rdde").text().replace("Rated for ", "").replace("info", "")
        app_info.description = $(".bARER").text();
        app_info.developer_info = {};
        app_info.developer_info.website = $(".VVmwY:nth-child(1) .pSEeg").text();
        app_info.developer_info.email = $(".VVmwY:nth-child(2) .pSEeg").text();
        app_info.developer_info.address = $(".VVmwY:nth-child(3) .pSEeg").text();
        app_info.developer_info.privacy_policy = $(".VVmwY:nth-child(4) .pSEeg").text();

        app_info.user_reviews = [];

        $(".EGFGHd").each((i,el) => {
            app_info.user_reviews.push({
                name: $(el).find(".X5PpBb").text(),
                date: $(el).find(".bp9Aid").text(),
                description: $(el).find(".h3YV2d").text(),
                thumbnail: $(el).find("img").attr("src")
            })
        })

        app_info.images_results = [];

        $(".aoJE7e .Atcj9b").each((i,el) => {
            app_info.images_results.push({
                src: $(el).find("img").attr("src")
            })
        })


        console.log(app_info);
    };

    getGooglePlayData();  
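
The complete script assumes the request always succeeds. Below is a hedged sketch of two possible additions: checking the response before parsing it, and saving the result to disk. ensureHtml and saveAppInfo are hypothetical helpers. The check assumes the Unirest response exposes a numeric status next to the body used above, and the output file name is an arbitrary choice.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Throw a descriptive error unless the response looks like a successful HTML page.
function ensureHtml(response) {
  if (!response || response.status !== 200) {
    throw new Error(`Request failed with status ${response ? response.status : "unknown"}`);
  }
  if (typeof response.body !== "string" || response.body.length === 0) {
    throw new Error("Response body is empty");
  }
  return response.body;
}

// Write the scraped object to a JSON file and return the file path.
function saveAppInfo(appInfo, filePath) {
  fs.writeFileSync(filePath, JSON.stringify(appInfo, null, 2), "utf8");
  return filePath;
}

// Usage with stand-in values; in the scraper, pass the real response and app_info.
const html = ensureHtml({ status: 200, body: "<html></html>" });
const savedTo = saveAppInfo(
  { title: "WhatsApp Messenger" },
  path.join(os.tmpdir(), "app_info.json")
);
console.log(html.length, savedTo);
```

In the scraper itself, this would mean calling cheerio.load(ensureHtml(response)) instead of cheerio.load(response.body), and calling saveAppInfo(app_info, ...) next to the console.log at the end.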

Serpdog Google Search API

If you don't want to code and maintain the scraper in the long run and don't want to work with complex URLs and HTML, then you can try this Google Search API.

Serpdog | Google Search API solves the problems of captchas and proxies and allows developers to scrape Google Search Results smoothly. Also, the pre-cooked structured JSON data can save you a lot of time.

    const axios = require('axios');

    // Replace APIKEY with your own Serpdog API key
    axios.get('https://api.serpdog.io/search?q=coffee&api_key=APIKEY&gl=us')
        .then(response => {
            console.log(response.data);
        })
        .catch(error => {
            console.log(error);
        });

Results:

   {
    "meta": {
        "api_key": "APIKEY",
        "q": "coffee",
        "gl": "us"
    },
    "organic_results": [
        {
        "title": "9 Health Benefits of Coffee, Based on Science - Healthline",
        "link": "https://www.healthline.com/nutrition/top-evidence-based-health-benefits-of-coffee",
        "displayed_link": "https://www.healthline.com › Wellness Topics › Nutrition",
        "snippet": "Coffee is a popular beverage that researchers have studied extensively for its many health benefits, including its ability to increase energy levels, promote ...",
        "rank": 1
        },
        {
        "title": "The Coffee Bean & Tea Leaf | CBTL",
        "link": "https://www.coffeebean.com/",
        "displayed_link": "https://www.coffeebean.com",
        "snippet": "Born and brewed in Southern California since 1963, The Coffee Bean & Tea Leaf® is passionate about connecting loyal customers with carefully handcrafted ...",
        "rank": 2
        },
        {
        "title": "Peet's Coffee: The Original Craft Coffee",
        "link": "https://www.peets.com/",
        "displayed_link": "https://www.peets.com",
        "snippet": "Since 1966, Peet's Coffee has offered superior coffees and teas by sourcing the best quality coffee beans and tea leaves in the world and adhering to strict ...",
        "rank": 3
        },
        {
        "title": "The History of Coffee - National Coffee Association",
        "link": "https://www.ncausa.org/about-coffee/history-of-coffee",
        "displayed_link": "https://www.ncausa.org › ... › History of Coffee",
        "snippet": "Coffee grown worldwide can trace its heritage back centuries to the ancient coffee forests on the Ethiopian plateau. There, legend says the goat herder ...",
        "inline_sitelinks": [
            {
            "title": "An Ethiopian Legend",
            "link": "https://www.ncausa.org/about-coffee/history-of-coffee#:~:text=An%20Ethiopian%20Legend"
            },
            {
            "title": "The Arabian Peninsula",
            "link": "https://www.ncausa.org/about-coffee/history-of-coffee#:~:text=The%20Arabian%20Peninsula,-Coffee%20cultivation%20and%20trade%20began"
            },
            {
            "title": "Coffee Comes To Europe",
            "link": "https://www.ncausa.org/about-coffee/history-of-coffee#:~:text=Coffee%20Comes%20to%20Europe"
            }
        ],
        "rank": 4
        },
        {
        "title": "coffee | Origin, Types, Uses, History, & Facts | Britannica",
        "link": "https://www.britannica.com/topic/coffee",
        "displayed_link": "https://www.britannica.com › ... › Food",
        "snippet": "coffee, beverage brewed from the roasted and ground seeds of the tropical evergreen coffee plants of African origin. Coffee is one of the three most popular ...",
        "rank": 5
        },
        {
        "title": "Starbucks Coffee Company",
        "link": "https://www.starbucks.com/",
        "displayed_link": "https://www.starbucks.com",
        "snippet": "More than just great coffee. Explore the menu, sign up for Starbucks® Rewards, manage your gift card and more.",
        "rank": 6
        },
        {
        "title": "#coffee hashtag on Instagram • Photos and videos",
        "link": "https://www.instagram.com/explore/tags/coffee/",
        "displayed_link": "https://www.instagram.com › explore › tags › coffee",
        "snippet": "156M Posts - See Instagram photos and videos from 'coffee' hashtag.",
        "rank": 7
        }
    ]
   }
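
Because the response is already structured, post-processing it is plain JavaScript. Here is a minimal sketch, assuming the response shape shown above (an organic_results array whose items have title, link, and rank). summarizeResults is a hypothetical helper, and the sample object is trimmed from the output above.

```javascript
// Return "rank. title (link)" lines, tolerating a missing organic_results array.
function summarizeResults(data) {
  const results = Array.isArray(data && data.organic_results)
    ? data.organic_results
    : [];
  return results
    .slice() // copy so the original array is not reordered
    .sort((a, b) => a.rank - b.rank)
    .map((r) => `${r.rank}. ${r.title} (${r.link})`);
}

// Sample trimmed from the response above.
const sample = {
  organic_results: [
    { title: "Starbucks Coffee Company", link: "https://www.starbucks.com/", rank: 6 },
    { title: "The Coffee Bean & Tea Leaf | CBTL", link: "https://www.coffeebean.com/", rank: 2 },
  ],
};

console.log(summarizeResults(sample).join("\n"));
```

In the axios example, the same function could be called as summarizeResults(response.data) inside the .then() callback.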

Conclusion:

In this tutorial, we learned to scrape Google Play Store App Data with Node JS. Feel free to message me if I missed something. Follow me on Twitter. Thanks for reading!

Additional Resources

  1. Web Scraping Google With Node JS - A Complete Guide
  2. Scrape Google Play Apps Results
  3. Scrape Google Organic Search Results
  4. Scrape Google Shopping Results
  5. Scrape Google Maps Reviews

Author:

My name is Darshan, and I am the founder of serpdog.io. I love to create scrapers. I am currently working for several MNCs to provide them with Google Search Data through a seamless data pipeline.
