
Artur Chukhrai for SerpApi

Originally published at serpapi.com

Scrape Yelp Filters, Ad and Organic Results with Python

Intro

In this blog post, we'll go through the process of extracting filters, organic and ad results using the Yelp Search Engine Results API and the Python programming language. You can look at the complete code in the online IDE (Replit).

What will be scraped

wwbs-yelp-results

Why use an API?

There are a few reasons to use an API, and ours in particular:

  • No need to create a parser from scratch and maintain it.
  • No need to bypass blocks from the target site (Yelp, in this case): CAPTCHAs and IP blocks are handled for you.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation.

SerpApi handles everything on the backend, with response times under ~2.5 seconds (~1.2 seconds with Ludicrous speed) per request. Because no browser automation is involved, requests complete much faster. Response times and status rates are shown on the SerpApi Status page.

serpapi-status-all

Full Code

This code retrieves all the data with pagination:

from serpapi import GoogleSearch
import os, json

params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),    # your SerpApi API key
    'engine': 'yelp',                   # SerpApi search engine 
    'find_desc': 'Coffee',              # query
    'find_loc': 'New York, NY, USA',    # location
    'start': 0                          # pagination
}

search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
results = search.get_dict()             # JSON -> Python dict

yelp_results = {
    'filters': results['filters'],
    'ads_results': [],
    'organic_results': []
}

while 'error' not in results:
    # .get() avoids a KeyError on pages that lack one of these keys
    yelp_results['ads_results'].extend(results.get('ads_results', []))
    yelp_results['organic_results'].extend(results.get('organic_results', []))

    params['start'] += 10
    results = search.get_dict()

print(json.dumps(yelp_results, indent=2, ensure_ascii=False))

Preparation

Install library:

pip install google-search-results

google-search-results is a SerpApi API package.

Code Explanation

Import libraries:

from serpapi import GoogleSearch
import os, json

| Library | Purpose |
|---|---|
| GoogleSearch | to scrape and parse search results (Yelp results in this post) using the SerpApi web scraping library. |
| os | to return the value of an environment variable (the SerpApi API key). |
| json | to convert the extracted data to a JSON-formatted string. |
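
Since os.getenv returns None when the variable is not set, a small guard can make a missing key fail early with a readable message. This is only a sketch: read_api_key is a hypothetical helper name, and API_KEY is the environment variable name used in this post.

```python
import os


def read_api_key(env_var='API_KEY'):
    """Hypothetical helper: return the key or fail with a clear message."""
    key = os.getenv(env_var)            # None when the variable is not set
    if not key:
        raise RuntimeError(f'Set the {env_var} environment variable to your SerpApi key')
    return key


try:
    api_key = read_api_key()
    print('API key found')
except RuntimeError as error:
    print(error)
```

The returned value can then be used for the 'api_key' entry of the params dictionary.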

The parameters are defined for generating the URL. If you want to pass other parameters to the URL, you can do so using the params dictionary:

params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),    # your SerpApi API key
    'engine': 'yelp',                   # SerpApi search engine 
    'find_desc': 'Coffee',              # query
    'find_loc': 'New York, NY, USA',    # location
    'start': 0                          # pagination
}

| Parameters | Explanation |
|---|---|
| api_key | Parameter defines the SerpApi private key to use. |
| engine | Set parameter to yelp to use the Yelp API engine. |
| find_desc | Parameter defines the query you want to search. You can use anything that you would use in a regular Yelp search. |
| find_loc | Parameter defines from where you want the search to originate. You can use any location you would use in a regular Yelp search. |
| start | Parameter defines the result offset. It skips the given number of results and is used for pagination (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |

📌Note: You can also add other API Parameters.
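
As a rough illustration of what "generating the URL" means, the sketch below encodes a params dictionary into a query string with the standard library. The https://serpapi.com/search.json endpoint is taken from the reviews_link values in the Output section; build_search_url is a hypothetical helper, and the library's own request logic may differ.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the reviews_link fields of the sample output.
SEARCH_ENDPOINT = 'https://serpapi.com/search.json'


def build_search_url(params):
    """Hypothetical helper: encode a params dict into a request URL."""
    # Drop unset values (e.g. a missing API key) so they are not sent as 'None'.
    clean = {key: value for key, value in params.items() if value is not None}
    return f'{SEARCH_ENDPOINT}?{urlencode(clean)}'


params = {
    'api_key': 'YOUR_API_KEY',          # placeholder, not a real key
    'engine': 'yelp',
    'find_desc': 'Coffee',
    'find_loc': 'New York, NY, USA',
    'start': 10                         # offset of the second page
}

url = build_search_url(params)
print(url)
```

Spaces and commas in find_loc are percent-encoded, which matches the find_loc=New+York%2C+NY form seen in the category links of the output.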

Next, we create a search object; the data itself is retrieved on the SerpApi backend. Calling get_dict() returns the JSON response as a Python dictionary, which is stored in results:

search = GoogleSearch(params)   # data extraction on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict

At the moment, the results dictionary only stores data from one page. Before extracting data, we create the yelp_results dictionary, to which this data will be added later. Since the filters are repeated on each subsequent page, you can extract them immediately:

yelp_results = {
    'filters': results['filters'],
    'ads_results': [],
    'organic_results': []
}

📌Note: When SerpApi encounters filters, they are added to the JSON output as the filters object, from which you can extract their text and values. You can pass these values to the area parameter l, the category parameter cflt, and the filters parameter attrs.
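
A minimal sketch of looking up a filter value by its label and reusing it in a new search might look like this. The find_filter_value helper is hypothetical, the filters data is a trimmed copy of the Output section, and sending a features value through attrs is an assumption based on the note above.

```python
def find_filter_value(filters, group, text):
    """Hypothetical helper: return the value of the option labelled `text`."""
    options = filters.get(group, [])
    # 'neighborhoods' nests its options under a 'list' key; other groups are plain lists.
    if isinstance(options, dict):
        options = options.get('list', [])
    for option in options:
        if option.get('text') == text:
            return option.get('value')
    return None                         # label not found in this group


# Trimmed copy of the filters object shown in the Output section.
filters = {
    'neighborhoods': {
        'value': 'p:NY:New_York:',
        'list': [{'text': 'Lindenwood', 'value': 'Queens:Lindenwood'}]
    },
    'category': [{'text': 'Wine Bars', 'value': 'wine_bars'}],
    'features': [{'text': 'Accepts Apple Pay', 'value': 'BusinessAcceptsApplePay'}]
}

filtered_params = {
    'engine': 'yelp',
    'find_desc': 'Coffee',
    'find_loc': 'New York, NY, USA',
    'cflt': find_filter_value(filters, 'category', 'Wine Bars'),
    # Assumption: feature values are passed through the attrs parameter.
    'attrs': find_filter_value(filters, 'features', 'Accepts Apple Pay')
}
print(filtered_params['cflt'], filtered_params['attrs'])
```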

To get all results, you need to apply pagination. This is achieved with the following check: while there is no error in the results object for the current page, we extract the data, increase the start parameter by 10 to get the results from the next page, and update the results object with the new page's data:

while 'error' not in results:
    # data extraction from current page will be here

    params['start'] += 10
    results = search.get_dict()
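
The loop above runs until the API reports an error, so it can issue many requests for a broad query. The sketch below shows one way to cap the number of pages. It is not part of the original code: collect_pages, fetch_page and max_pages are hypothetical names, and a stand-in function replaces the API so the example runs offline.

```python
def collect_pages(fetch_page, page_size=10, max_pages=5):
    """Hypothetical helper: gather ads and organic results page by page.

    fetch_page(start) must return one page of results as a dict.
    """
    collected = {'ads_results': [], 'organic_results': []}
    for page in range(max_pages):
        results = fetch_page(page * page_size)
        if 'error' in results:
            break                       # same stop condition as the loop above
        for key in collected:
            # .get() tolerates pages where a key is absent
            collected[key].extend(results.get(key, []))
    return collected


# Stand-in for the API: two pages of made-up data, then an error.
fake_pages = {
    0: {'ads_results': [{'title': 'Ad A'}], 'organic_results': [{'title': 'Cafe 1'}]},
    10: {'organic_results': [{'title': 'Cafe 2'}]}
}


def fake_fetch(start):
    return fake_pages.get(start, {'error': 'no more results'})


all_results = collect_pages(fake_fetch)
print(len(all_results['ads_results']), len(all_results['organic_results']))
```

With the real client, fetch_page could be something like lambda start: GoogleSearch({**params, 'start': start}).get_dict(), which builds a fresh search for each offset.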

Extend the yelp_results['ads_results'] and yelp_results['organic_results'] lists with the data from the current page:

# .get() avoids a KeyError on pages that lack one of these keys
yelp_results['ads_results'].extend(results.get('ads_results', []))
yelp_results['organic_results'].extend(results.get('organic_results', []))
# ad_title = results['ads_results'][0]['title']
# ad_link = results['ads_results'][0]['link']
# title = results['organic_results'][0]['title']
# rating = results['organic_results'][0]['rating']
# reviews = results['organic_results'][0]['reviews']

📌Note: The comments above show how to extract specific fields. You may have noticed results['organic_results'][0]: the 0 is the index of an organic result, which means that we are extracting data from the first organic result. results['organic_results'][1] refers to the second organic result, and so on.
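
Instead of indexing results one at a time, you can loop over the whole list. The sketch below keeps a few fields from every organic result. The summarize_places helper is a hypothetical name, the field names come from the Output section, and the second sample entry is made up to show a result without a rating.

```python
def summarize_places(organic_results):
    """Hypothetical helper: keep a few fields from every organic result."""
    rows = []
    for place in organic_results:
        rows.append({
            'title': place.get('title'),
            'rating': place.get('rating'),      # None when the field is absent
            'reviews': place.get('reviews')
        })
    return rows


sample = [
    {'position': 1, 'title': '% Arabica', 'rating': 4.3, 'reviews': 182},
    {'position': 2, 'title': 'Example Cafe'}    # made-up entry without a rating
]

for row in summarize_places(sample):
    print(row)
```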

After all the data is retrieved, it is output in JSON format:

print(json.dumps(yelp_results, indent=2, ensure_ascii=False))
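
The ensure_ascii=False argument matters for titles that contain non-ASCII characters. A small standalone comparison, using a made-up two-field dictionary rather than real API output:

```python
import json

sample = {'title': 'Dunkin’', 'price': '$$'}

escaped = json.dumps(sample)                       # default: ensure_ascii=True
readable = json.dumps(sample, ensure_ascii=False)  # as used in this post

print(escaped)    # the apostrophe is written as the escape sequence \u2019
print(readable)   # the apostrophe is kept as-is
```

Both strings decode back to the same dictionary; the difference is only in how readable the printed output is.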

Output

{
  "filters": {
    "neighborhoods": {
      "value": "p:NY:New_York:",
      "list": [
        {
          "text": "Lindenwood",
          "value": "Queens:Lindenwood"
        },
        ... other neighborhoods results
      ]
    },
    "distance": [
      {
        "text": "Bird's-eye View",
        "value": "g:-74.09660339355469,40.62750334315296,-73.89198303222656,40.783660996197945"
      },
      ... other distance results
    ],
    "price": [
      {
        "text": "$",
        "value": "RestaurantsPriceRange2.1"
      },
      ... other price results
    ],
    "category": [
      {
        "text": "Wine Bars",
        "value": "wine_bars"
      },
      ... other category results
    ],
    "features": [
      {
        "text": "Accepts Apple Pay",
        "value": "BusinessAcceptsApplePay"
      },
      ... other features results
    ]
  },
  "ads_results": [
    {
      "block_position": "top",
      "place_ids": [
        "tXWkZsgqEnAGhMJNquO7jQ",
        "dunkin-new-york-131"
      ],
      "title": "Dunkin’",
      "link": "https://www.yelp.com/adredir?ad_business_id=tXWkZsgqEnAGhMJNquO7jQ&campaign_id=rgGYFTjiALhOVokQAQZquQ&click_origin=search_results&placement=above_search&placement_slot=0&redirect_url=https%3A%2F%2Fwww.yelp.com%2Fbiz%2Fdunkin-new-york-131&request_id=262d581c288008e6&signature=0b160efed560693840ac4428c671613788a9b899a4e0ffc35372dea545ec734f&slot=0",
      "reviews_link": "https://serpapi.com/search.json?engine=yelp_reviews&place_id=tXWkZsgqEnAGhMJNquO7jQ",
      "categories": [
        {
          "title": "Coffee & Tea",
          "link": "https://www.yelp.com/search?cflt=coffee&find_loc=New+York%2C+NY"
        }
      ],
      "reviews": 23,
      "neighborhoods": "Civic Center",
      "offer_details": {
        "title": "Dunkin' Signature Latte",
        "description": "Espresso-Rich Holiday Signature Lattes"
      },
      "phone": "+1-212-732-0406",
      "service_options": {
        "delivery": true,
        "takeout": true
      },
      "button": {
        "text": "Learn More",
        "link": "https://www.yelp.com/adredir?ad_business_id=tXWkZsgqEnAGhMJNquO7jQ&campaign_id=rgGYFTjiALhOVokQAQZquQ&click_origin=search_results&placement=above_search&placement_slot=0&redirect_url=https%3A%2F%2Fwww.yelp.com%2Fbiz%2Fdunkin-new-york-131&request_id=262d581c288008e6&signature=0b160efed560693840ac4428c671613788a9b899a4e0ffc35372dea545ec734f&slot=0&cta_value=Learn More"
      },
      "thumbnail": "https://s3-media0.fl.yelpcdn.com/offerphoto/EmzkT8VNzsDW0FyXnJu_xQ/ls.jpg"
    },
    ... other ads results
  ],
  "organic_results": [
    {
      "position": 1,
      "place_ids": [
        "ED7A7vDdg8yLNKJTSVHHmg",
        "arabica-brooklyn"
      ],
      "title": "% Arabica",
      "link": "https://www.yelp.com/biz/arabica-brooklyn?osq=Coffee",
      "reviews_link": "https://serpapi.com/search.json?engine=yelp_reviews&place_id=ED7A7vDdg8yLNKJTSVHHmg",
      "categories": [
        {
          "title": "Coffee & Tea",
          "link": "https://www.yelp.com/search?cflt=coffee&find_loc=New+York%2C+NY"
        }
      ],
      "price": "$$",
      "rating": 4.3,
      "reviews": 182,
      "neighborhoods": "Brooklyn Heights",
      "phone": "(718) 865-2551",
      "snippet": "Great coffee had a Spanish latte... can't get over the view! This is the second time we are in Brooklyn and will definitely be back for a 3rd time",
      "service_options": {
        "outdoor_seating": true
      },
      "thumbnail": "https://s3-media0.fl.yelpcdn.com/bphoto/kJkYHT4Q9O5daai-x7paXA/348s.jpg"
    },
    ... other organic results
  ]
}

📌Note: Head to the playground for a live and interactive demo.

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞
