Artur Chukhrai for SerpApi

Posted on Jan 29, 2023 • Updated on Feb 6, 2023 • Originally published at serpapi.com

Using Walmart Search Engine Results API from SerpApi

#webscraping #tutorial #python #programming

Intro
What will be scraped
Why use an API?
Full Code
Preparation
Code Explanation
Output
Links

Intro

In this blog post, we'll go through the process of extracting filters, featured items, related queries and organic results plus pagination using the Walmart Search Engine Results API and the Python programming language.

You can look at the complete code in the online IDE (Replit).

What will be scraped

📌Note: By default, Walmart returns 40 results. In this case, 8 results are displayed to make the image more compact.

Why use an API?

There are a few reasons to use an API, and ours in particular:

No need to create a parser from scratch and maintain it.
No need to bypass blocks from Walmart: solving CAPTCHAs or getting around IP blocks.
No need to pay for proxies and CAPTCHA solvers.
No need to use browser automation.

SerpApi handles everything on the backend, with response times under ~2.5 seconds per request (~1.2 seconds with Ludicrous speed) and without browser automation, which makes it much faster. Response times and status rates are shown on the SerpApi Status page.

Full Code

This code retrieves all the data with pagination:

from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json

params = {
    'api_key': '...',           # https://serpapi.com/manage-api-key
    'engine': 'walmart',        # SerpApi search engine 
    'query': 'coffee marker',   # the search query
    'spelling': True,           # activate spelling fix
    'sort': 'best_match',       # sorted by different options
    'min_price': 100,           # minimum price
    'max_price': 150,           # maximum price
}

search = GoogleSearch(params)   # where data extraction happens on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict

walmart_results = {
    'search_information': results.get('search_information'),
    'filters': results.get('filters'),
    'organic_results': [],
    'featured_item': results.get('featured_item'),
    'related_queries': results.get('related_queries'),
}

while True:
    # add data from current page
    walmart_results['organic_results'].extend(results.get('organic_results', []))

    # stop after the last page: it has no 'next' link
    next_url = results.get('serpapi_pagination', {}).get('next')
    if not next_url:
        break

    # update search object with the parameters of the next page
    search.params_dict.update(dict(parse_qsl(urlsplit(next_url).query)))

    # get updated information from next page
    results = search.get_dict()

print(json.dumps(walmart_results, indent=2, ensure_ascii=False))

Preparation

Install library:

pip install google-search-results

google-search-results is a SerpApi API package.

Code Explanation

Import libraries:

from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json

Library	Purpose
`GoogleSearch`	to send the request to SerpApi and get the parsed results. Despite its name, the class is used here with the `walmart` engine.
`urlsplit`	to split a URL into its components (scheme, network location, path, query, fragment). Here it extracts the query string from the next page URL.
`parse_qsl`	to parse a query string given as a string argument into a list of key/value pairs.
`json`	to convert extracted data to a JSON string.
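
To make the role of urlsplit and parse_qsl more concrete, here is a small sketch that turns a next-page URL into a dictionary of parameters. The URL is a made-up example in the general shape of a SerpApi link, not a value copied from a real response:

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical next-page URL, used only for illustration.
next_url = 'https://serpapi.com/search.json?engine=walmart&page=2&query=coffee+marker'

# Keep only the part after the "?".
query_string = urlsplit(next_url).query

# parse_qsl returns a list of (key, value) pairs and decodes "+" as a space.
next_params = dict(parse_qsl(query_string))

print(next_params)
```

This is the same chain of calls that the pagination step uses later to update the search object.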

Next, the parameters used to build the request URL are defined. If you want to pass other parameters to the URL, add them to the params dictionary:

params = {
    'api_key': '...',           # https://serpapi.com/manage-api-key
    'engine': 'walmart',        # SerpApi search engine 
    'query': 'coffee marker',   # the search query
    'spelling': True,           # activate spelling fix
    'sort': 'best_match',       # sorted by different options
    'min_price': 100,           # minimum price
    'max_price': 150,           # maximum price
}

Parameters	Explanation
`api_key`	Parameter defines the SerpApi private key to use. You can find it under your account -> API key
`engine`	Set parameter to `walmart` to use the Walmart API engine.
`query`	Parameter defines the search query. You can use anything that you would use in a regular Walmart search.
`spelling`	Activate spelling fix. `True` (default) includes spelling fix, `False` searches without spelling fix.
`sort`	Parameter defines sorting. (e.g. `price_low`, `price_high`, `best_seller`, `best_match`, `rating_high`, `new`)
`min_price`	Lower bound of price range query.
`max_price`	Upper bound of price range query.

📌Note: You can also add other API Parameters.
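
If you build these parameters in several places, a small helper can catch obvious mistakes before a request is sent. The sketch below is only an illustration: build_walmart_params is a hypothetical function, not part of the SerpApi library, and the set of sort values is taken from the examples in the table above, so the API may accept values that this check rejects.

```python
# Sort values listed in the parameters table; the real API may support more.
KNOWN_SORT_VALUES = {'price_low', 'price_high', 'best_seller', 'best_match', 'rating_high', 'new'}


def build_walmart_params(api_key, query, sort='best_match',
                         min_price=None, max_price=None, spelling=True):
    """Hypothetical helper: build the params dict for the walmart engine."""
    if sort not in KNOWN_SORT_VALUES:
        raise ValueError(f'unknown sort value: {sort!r}')
    if min_price is not None and max_price is not None and min_price > max_price:
        raise ValueError('min_price must not exceed max_price')

    params = {
        'api_key': api_key,
        'engine': 'walmart',
        'query': query,
        'spelling': spelling,
        'sort': sort,
    }
    # Add the price bounds only when they were requested.
    if min_price is not None:
        params['min_price'] = min_price
    if max_price is not None:
        params['max_price'] = max_price
    return params


params = build_walmart_params('...', 'coffee marker', min_price=100, max_price=150)
print(params)
```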

Then we create a search object. The data extraction itself happens on the SerpApi backend. Calling get_dict() returns the JSON response as a Python dictionary, which we store in results:

search = GoogleSearch(params)   # data extraction on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict

You may have noticed that I made a mistake when passing the value to the query parameter. This was done on purpose to demonstrate that SerpApi's Walmart Spell Check API allows you to extract the corrected search term and search it:

print(results['search_information']['spelling_fix'])    # coffee maker
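
Not every response is guaranteed to carry a spelling fix, so direct indexing can raise a KeyError. A minimal sketch of a safer lookup is shown below. corrected_query is a hypothetical helper, and sample_results is a trimmed copy of the search_information block from the Output section:

```python
def corrected_query(results, fallback):
    """Return the spelling fix from the response, or the fallback query."""
    search_information = results.get('search_information') or {}
    return search_information.get('spelling_fix') or fallback


# Trimmed sample based on the Output section of this post.
sample_results = {
    'search_information': {
        'query_displayed': 'coffee marker',
        'spelling_fix': 'coffee maker',
    }
}

print(corrected_query(sample_results, 'coffee marker'))  # coffee maker
print(corrected_query({}, 'coffee marker'))              # coffee marker
```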

At the moment, the results dictionary only stores data from 1 page. Before extracting data, the walmart_results dictionary is created where this data will be added later. Since the search_information, filters, featured_item and related_queries are repeated on each subsequent page, you can extract them immediately:

walmart_results = {
    'search_information': results.get('search_information'),
    'filters': results.get('filters'),
    'organic_results': [],
    'featured_item': results.get('featured_item'),
    'related_queries': results.get('related_queries'),
}

To get all organic results, you need to use the Walmart Pagination API. The loop works as follows: we add the data from the current page, and if the serpapi_pagination dictionary contains a next page, we update the parameters in the search object and fetch that page. Otherwise the loop stops, so the last page is collected as well:

while True:
    # add data from current page
    # ...

    # stop after the last page: it has no 'next' link
    next_url = results.get('serpapi_pagination', {}).get('next')
    if not next_url:
        break

    # update search object
    search.params_dict.update(dict(parse_qsl(urlsplit(next_url).query)))

    # get updated information from next page
    results = search.get_dict()
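
The pagination logic can also be wrapped in a function and tried offline, without spending API credits. The following is a sketch under a few assumptions: collect_organic_results and FakeSearch are hypothetical names, FakeSearch only imitates the two members of the search object used in this post (params_dict and get_dict()), and the example.com URL is invented. Unlike the code in this post, the function fetches the first page itself.

```python
from urllib.parse import urlsplit, parse_qsl


def collect_organic_results(search):
    """Collect organic results from every page, including the last one."""
    collected = []
    while True:
        results = search.get_dict()
        collected.extend(results.get('organic_results', []))

        # The last page has no 'next' link, so stop after adding its data.
        next_url = results.get('serpapi_pagination', {}).get('next')
        if not next_url:
            break

        # Copy the query parameters of the next page into the search object.
        search.params_dict.update(dict(parse_qsl(urlsplit(next_url).query)))
    return collected


class FakeSearch:
    """Offline stand-in for the search object (illustration only)."""

    def __init__(self, pages):
        self.pages = pages
        self.params_dict = {'page': '1'}

    def get_dict(self):
        # Return the canned response for the currently selected page.
        return self.pages[self.params_dict['page']]


pages = {
    '1': {
        'organic_results': [{'title': 'A'}],
        'serpapi_pagination': {'next': 'https://example.com/search.json?page=2'},
    },
    '2': {
        'organic_results': [{'title': 'B'}],
        'serpapi_pagination': {},
    },
}

all_results = collect_organic_results(FakeSearch(pages))
print(all_results)
```

With the real library, the same function could presumably be called as collect_organic_results(GoogleSearch(params)), since it relies only on params_dict and get_dict().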

Inside the loop, the walmart_results['organic_results'] list is extended with the data from the current page:

# add data from current page
walmart_results['organic_results'].extend(results['organic_results'])

# title = results['organic_results'][0]['title']
# thumbnail = results['organic_results'][0]['thumbnail']
# rating = results['organic_results'][0]['rating']
# reviews = results['organic_results'][0]['reviews']
# price = results['organic_results'][0]['primary_offer']['offer_price']

📌Note: In the comments above, I showed how to extract specific fields. You may have noticed the results['organic_results'][0]. This is the index of a product, which means that we are extracting data from the first product. The results['organic_results'][1] is from the second product and so on.
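
Instead of indexing one product at a time, you can loop over the whole list. The sketch below pulls the same fields from every product and uses .get() so that a product with a missing key does not stop the loop. summarize_products is a hypothetical helper, the first sample entry is trimmed from the Output section, and the second entry is invented to show the missing-key case:

```python
def summarize_products(organic_results):
    """Pull a few fields from every product, tolerating missing keys."""
    summary = []
    for product in organic_results:
        primary_offer = product.get('primary_offer') or {}
        summary.append({
            'title': product.get('title'),
            'rating': product.get('rating'),
            'reviews': product.get('reviews'),
            'price': primary_offer.get('offer_price'),
        })
    return summary


sample_products = [
    {
        'title': "Nespresso Vertuo Plus Coffee and Espresso Maker by De'Longhi, Black",
        'rating': 4.7,
        'reviews': 1603,
        'primary_offer': {'offer_price': 127},
    },
    {'title': 'Product without rating or offer'},  # invented incomplete entry
]

product_summary = summarize_products(sample_products)
for row in product_summary:
    print(row)
```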

After all the data is retrieved, it is output in JSON format:

print(json.dumps(walmart_results, indent=2, ensure_ascii=False))

Output

{
  "search_information": {
    "location": {
      "postal_code": "60602",
      "province_code": "IL",
      "city": "Chicago",
      "store_id": "5402"
    },
    "total_results": 152051,
    "query_displayed": "coffee marker",
    "organic_results_state": "Results for exact spelling",
    "spelling_fix": "coffee maker"
  },
  "filters": null,
  "organic_results": [
    {
      "us_item_id": "622343372",
      "product_id": "363IFK4JZENM",
      "title": "Nespresso Vertuo Plus Coffee and Espresso Maker by De'Longhi, Black",
      "thumbnail": "https://i5.walmartimages.com/asr/b80b2bf3-f47c-494d-be9c-bd5b548760f9.b4bcbb88b02aaef77b5df4c697c22ab4.jpeg?odnHeight=180&odnWidth=180&odnBg=FFFFFF",
      "rating": 4.7,
      "reviews": 1603,
      "seller_id": "F55CDC31AB754BB68FE0B39041159D63",
      "seller_name": "Walmart.com",
      "fulfillment_badges": [
        "3+ day shipping"
      ],
      "two_day_shipping": false,
      "out_of_stock": false,
      "sponsored": true,
      "muliple_options_available": false,
      "primary_offer": {
        "offer_id": "8952A2034C634B9C9166D9A720E1DC5B",
        "offer_price": 127,
        "min_price": 0
      },
      "price_per_unit": {
        "unit": "each",
        "amount": ""
      },
      "product_page_url": "https://www.walmart.com/ip/Nespresso-Vertuo-Plus-Coffee-and-Espresso-Maker-by-De-Longhi-Black/622343372?athbdg=L1800",
      "serpapi_product_page_url": "https://serpapi.com/search.json?device=desktop&engine=walmart_product&product_id=622343372"
    },
    ... other results
  ],
  "featured_item": null,
  "related_queries": null
}

📌Note: Head to the playground for a live and interactive demo.

Links

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞
