Intro
In this blog post, we'll go through the process of extracting filters, featured items, related queries, and organic results (with pagination) from Walmart using the Walmart Search Engine Results API and the Python programming language.
You can look at the complete code in the online IDE (Replit).
What will be scraped
Note: By default, Walmart returns 40 results per page. In this case, only 8 results are displayed to make the image more compact.
Why use an API?
There are a couple of reasons to use an API, ours in particular:
- No need to create a parser from scratch and maintain it.
- Bypass blocks: CAPTCHAs and IP blocks are solved for you.
- No need to pay for proxies and CAPTCHA solvers.
- No need to use browser automation.
SerpApi handles everything on the backend with fast response times of under ~2.5 seconds (~1.2 seconds with Ludicrous speed) per request, and without browser automation, which makes it much faster. Response times and status rates are shown on the SerpApi Status page.
Full Code
This code retrieves all the data with pagination:
```python
from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json

params = {
    'api_key': '...',            # https://serpapi.com/manage-api-key
    'engine': 'walmart',         # SerpApi search engine
    'query': 'coffee marker',    # the search query
    'spelling': True,            # activate spelling fix
    'sort': 'best_match',        # sorted by different options
    'min_price': 100,            # minimum price
    'max_price': 150,            # maximum price
}

search = GoogleSearch(params)    # where data extraction happens on the SerpApi backend
results = search.get_dict()      # JSON -> Python dict

walmart_results = {
    'search_information': results.get('search_information'),
    'filters': results.get('filters'),
    'organic_results': [],
    'featured_item': results.get('featured_item'),
    'related_queries': results.get('related_queries'),
}

while True:
    # add data from current page (this also runs for the last page)
    walmart_results['organic_results'].extend(results.get('organic_results', []))

    # stop when there is no next page
    next_page = results.get('serpapi_pagination', {}).get('next')
    if not next_page:
        break

    # update search object with the query parameters of the next page
    search.params_dict.update(dict(parse_qsl(urlsplit(next_page).query)))

    # get updated information from next page
    results = search.get_dict()

print(json.dumps(walmart_results, indent=2, ensure_ascii=False))
```
Preparation
Install library:
```bash
pip install google-search-results
```
`google-search-results` is SerpApi's Python package.
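Hard-coding the key as `'api_key': '...'` makes it easy to leak when you share code. A minimal sketch, assuming you store the key in an environment variable named `SERPAPI_API_KEY` (a name chosen here for illustration) and read it with a hypothetical helper `load_api_key` before building `params`:

```python
import os


def load_api_key(var_name: str = "SERPAPI_API_KEY") -> str:
    """Return the SerpApi key from an environment variable (hypothetical helper).

    SERPAPI_API_KEY is an assumed variable name; this helper reads it
    explicitly so the key can be passed in the params dict.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key


# usage: params = {'api_key': load_api_key(), 'engine': 'walmart', ...}
```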
Code Explanation
Import libraries:
```python
from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json
```
| Library | Purpose |
|---|---|
| `GoogleSearch` | to scrape and parse search results using the SerpApi web scraping library. Despite its name, it is used here with the `walmart` engine. |
| `urlsplit` | to split a URL into its components (scheme, network location, path, query, fragment). Here it extracts the query string of the next-page URL. |
| `parse_qsl` | to parse a query string into a list of `(name, value)` pairs. |
| `json` | to convert the extracted data to a JSON string. |
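To see what `urlsplit` and `parse_qsl` do together in the pagination step, here is a sketch on a made-up next-page URL. The URL is only illustrative; the parameters SerpApi actually puts in `serpapi_pagination['next']` may differ:

```python
from urllib.parse import urlsplit, parse_qsl

# Illustrative next-page URL; the real one comes from results['serpapi_pagination']['next']
next_url = "https://serpapi.com/search.json?engine=walmart&page=2&query=coffee+marker"

query_string = urlsplit(next_url).query      # 'engine=walmart&page=2&query=coffee+marker'
next_params = dict(parse_qsl(query_string))  # '+' is decoded to a space

print(next_params)
# {'engine': 'walmart', 'page': '2', 'query': 'coffee marker'}
```

Note that every value comes back as a string (for example `'2'`), which is fine because the parameters are sent back to SerpApi as part of a URL anyway.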
The parameters are defined for generating the URL. If you want to pass other parameters to the URL, you can do so using the `params` dictionary:
```python
params = {
    'api_key': '...',            # https://serpapi.com/manage-api-key
    'engine': 'walmart',         # SerpApi search engine
    'query': 'coffee marker',    # the search query
    'spelling': True,            # activate spelling fix
    'sort': 'best_match',        # sorted by different options
    'min_price': 100,            # minimum price
    'max_price': 150,            # maximum price
}
```
| Parameter | Explanation |
|---|---|
| `api_key` | Defines the SerpApi private key to use. You can find it under your account -> API key. |
| `engine` | Set to `walmart` to use the Walmart API engine. |
| `query` | Defines the search query. You can use anything that you would use in a regular Walmart search. |
| `spelling` | Activates the spelling fix. `True` (default) includes the spelling fix, `False` searches without it. |
| `sort` | Defines sorting (e.g. `price_low`, `price_high`, `best_seller`, `best_match`, `rating_high`, `new`). |
| `min_price` | Lower bound of the price range. |
| `max_price` | Upper bound of the price range. |
Note: You can also add other API parameters.
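If you build many searches, it can help to construct `params` through a function that skips unset price bounds and rejects an inverted price range. A minimal sketch; `build_walmart_params` is a hypothetical helper, and it passes `sort` through unchanged because the values in the table are only examples, not necessarily the full set the API accepts:

```python
from typing import Optional


def build_walmart_params(api_key: str, query: str, sort: str = "best_match",
                         min_price: Optional[float] = None,
                         max_price: Optional[float] = None,
                         spelling: bool = True) -> dict:
    """Build a params dict for the SerpApi Walmart engine (hypothetical helper)."""
    if min_price is not None and max_price is not None and min_price > max_price:
        raise ValueError("min_price must not be greater than max_price")

    params = {
        "api_key": api_key,
        "engine": "walmart",
        "query": query,
        "spelling": spelling,
        "sort": sort,
    }
    # only send the price bounds that were actually given
    if min_price is not None:
        params["min_price"] = min_price
    if max_price is not None:
        params["max_price"] = max_price
    return params


params = build_walmart_params("...", "coffee marker", min_price=100, max_price=150)
```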
Then, we create a `search` object where the data is retrieved from the SerpApi backend. The `results` dictionary holds the JSON response converted to a Python dict:
```python
search = GoogleSearch(params)    # data extraction on the SerpApi backend
results = search.get_dict()      # JSON -> Python dict
```
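If a search fails (for example, because of an invalid API key), it is better to stop early than to continue with an empty dictionary. A sketch, assuming that failed searches come back with an `error` key in the response, as SerpApi responses typically do; `check_for_error` is a hypothetical helper:

```python
def check_for_error(results: dict) -> dict:
    """Raise if a SerpApi response dict reports an error (hypothetical helper).

    Assumes failed searches include an 'error' key in the response.
    """
    if "error" in results:
        raise RuntimeError(f"SerpApi error: {results['error']}")
    return results


# usage: results = check_for_error(search.get_dict())
```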
You may have noticed that I made a mistake when passing the value to the `query` parameter. This was done on purpose to demonstrate that SerpApi's Walmart Spell Check API lets you extract the corrected search term and search for it:
```python
print(results['search_information']['spelling_fix'])  # coffee maker
```
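Not every query is misspelled, so `spelling_fix` may be missing from `search_information`, and indexing it directly would raise a `KeyError`. A minimal sketch of a hypothetical helper that falls back to `query_displayed` (both keys appear in the output below) when no fix is present:

```python
from typing import Optional


def effective_query(results: dict) -> Optional[str]:
    """Return the spell-corrected query if present, else the displayed query (hypothetical helper)."""
    info = results.get("search_information") or {}
    return info.get("spelling_fix") or info.get("query_displayed")


# sample dict shaped like the 'search_information' part of the output
sample = {"search_information": {"query_displayed": "coffee marker", "spelling_fix": "coffee maker"}}
print(effective_query(sample))  # coffee maker
```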
At the moment, the `results` dictionary only stores data from the first page. Before extracting data, we create the `walmart_results` dictionary, to which this data will be added later. Since `search_information`, `filters`, `featured_item` and `related_queries` are repeated on each subsequent page, you can extract them immediately:
```python
walmart_results = {
    'search_information': results.get('search_information'),
    'filters': results.get('filters'),
    'organic_results': [],
    'featured_item': results.get('featured_item'),
    'related_queries': results.get('related_queries'),
}
```
To get all organic results, you need to apply the Walmart Pagination API. On each iteration, we add the data from the current page and then check whether a next page exists in the `serpapi_pagination` dictionary. If it does, the query parameters of the next-page URL are merged into the `search` object's parameters and the next page is fetched; if not, the loop stops. Adding the data before the check ensures that the results from the last page are not lost:
```python
while True:
    # add data from current page
    # ...

    # stop when there is no next page
    next_page = results.get('serpapi_pagination', {}).get('next')
    if not next_page:
        break

    # update search object
    search.params_dict.update(dict(parse_qsl(urlsplit(next_page).query)))

    # get updated information from next page
    results = search.get_dict()
```
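To check the pagination logic without spending API credits, you can run the same loop against fake pages. A sketch under stated assumptions: `FAKE_PAGES` and `fake_get_dict` are hypothetical stand-ins for `search.get_dict()`, and the fake next-page URLs only carry an `engine` and a `page` parameter:

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical stand-in responses keyed by page number; only the last page lacks 'next'
FAKE_PAGES = {
    "1": {"organic_results": [{"title": "A"}, {"title": "B"}],
          "serpapi_pagination": {"next": "https://serpapi.com/search.json?engine=walmart&page=2"}},
    "2": {"organic_results": [{"title": "C"}, {"title": "D"}],
          "serpapi_pagination": {"next": "https://serpapi.com/search.json?engine=walmart&page=3"}},
    "3": {"organic_results": [{"title": "E"}],
          "serpapi_pagination": {}},
}


def fake_get_dict(params: dict) -> dict:
    """Pretend to call the API: return the fake page named by params['page']."""
    return FAKE_PAGES[params.get("page", "1")]


params = {"engine": "walmart", "query": "coffee maker"}
results = fake_get_dict(params)
collected = []

while True:
    # add data from current page (this also runs for the last page)
    collected.extend(results.get("organic_results", []))

    next_page = results.get("serpapi_pagination", {}).get("next")
    if not next_page:
        break

    # merge the next page's query parameters, like search.params_dict.update(...) does
    params.update(dict(parse_qsl(urlsplit(next_page).query)))
    results = fake_get_dict(params)

print([item["title"] for item in collected])  # ['A', 'B', 'C', 'D', 'E']
```

A loop that checks for `next` before adding the current page's results would never add `E`, because the last page has no `next` link.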
The current page's data is added by extending the `walmart_results['organic_results']` list:
```python
# add data from current page
walmart_results['organic_results'].extend(results['organic_results'])

# title = results['organic_results'][0]['title']
# thumbnail = results['organic_results'][0]['thumbnail']
# rating = results['organic_results'][0]['rating']
# reviews = results['organic_results'][0]['reviews']
# price = results['organic_results'][0]['primary_offer']['offer_price']
```
Note: In the comments above, I showed how to extract specific fields. You may have noticed `results['organic_results'][0]`. The `[0]` is the index of a product in the list, which means that we are extracting data from the first product. `results['organic_results'][1]` is the second product, and so on.
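Not every product necessarily has every field (for example, a product without reviews may lack `rating`), so chained `[...]` indexing can raise a `KeyError`. A sketch that loops over all products and uses `.get()` instead; `summarize_products` is a hypothetical helper, and the first sample product is trimmed from the output shown below:

```python
def summarize_products(organic_results: list) -> list:
    """Pick a few fields from each product, tolerating missing keys (hypothetical helper)."""
    summary = []
    for product in organic_results:
        summary.append({
            "title": product.get("title"),
            "rating": product.get("rating"),
            "reviews": product.get("reviews"),
            # the price is nested inside 'primary_offer'
            "price": (product.get("primary_offer") or {}).get("offer_price"),
        })
    return summary


sample = [
    {"title": "Nespresso Vertuo Plus Coffee and Espresso Maker by De'Longhi, Black",
     "rating": 4.7, "reviews": 1603, "primary_offer": {"offer_price": 127}},
    {"title": "Product without offer data"},  # hypothetical incomplete product
]
for row in summarize_products(sample):
    print(row)
```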
After all the data is retrieved, it is output in JSON format:
```python
print(json.dumps(walmart_results, indent=2, ensure_ascii=False))
```
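If you would rather keep the results than print them, `json.dump` writes the same structure to a file. A minimal sketch; the file name `walmart_results.json` and the small `walmart_results` dict are placeholders:

```python
import json

# placeholder data standing in for the walmart_results built above
walmart_results = {"search_information": {"spelling_fix": "coffee maker"}, "organic_results": []}

# ensure_ascii=False keeps non-ASCII characters readable, so write the file as UTF-8
with open("walmart_results.json", "w", encoding="utf-8") as f:
    json.dump(walmart_results, f, indent=2, ensure_ascii=False)

# reading the file back gives an equal dict
with open("walmart_results.json", encoding="utf-8") as f:
    loaded = json.load(f)
```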
Output
```json
{
  "search_information": {
    "location": {
      "postal_code": "60602",
      "province_code": "IL",
      "city": "Chicago",
      "store_id": "5402"
    },
    "total_results": 152051,
    "query_displayed": "coffee marker",
    "organic_results_state": "Results for exact spelling",
    "spelling_fix": "coffee maker"
  },
  "filters": null,
  "organic_results": [
    {
      "us_item_id": "622343372",
      "product_id": "363IFK4JZENM",
      "title": "Nespresso Vertuo Plus Coffee and Espresso Maker by De'Longhi, Black",
      "thumbnail": "https://i5.walmartimages.com/asr/b80b2bf3-f47c-494d-be9c-bd5b548760f9.b4bcbb88b02aaef77b5df4c697c22ab4.jpeg?odnHeight=180&odnWidth=180&odnBg=FFFFFF",
      "rating": 4.7,
      "reviews": 1603,
      "seller_id": "F55CDC31AB754BB68FE0B39041159D63",
      "seller_name": "Walmart.com",
      "fulfillment_badges": [
        "3+ day shipping"
      ],
      "two_day_shipping": false,
      "out_of_stock": false,
      "sponsored": true,
      "muliple_options_available": false,
      "primary_offer": {
        "offer_id": "8952A2034C634B9C9166D9A720E1DC5B",
        "offer_price": 127,
        "min_price": 0
      },
      "price_per_unit": {
        "unit": "each",
        "amount": ""
      },
      "product_page_url": "https://www.walmart.com/ip/Nespresso-Vertuo-Plus-Coffee-and-Espresso-Maker-by-De-Longhi-Black/622343372?athbdg=L1800",
      "serpapi_product_page_url": "https://serpapi.com/search.json?device=desktop&engine=walmart_product&product_id=622343372"
    },
    ... other results
  ],
  "featured_item": null,
  "related_queries": null
}
```
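Each product also carries a `serpapi_product_page_url` that points at SerpApi's `walmart_product` engine. If you want to request product details yourself, you could pull the parameters out of that URL with the same `urlsplit`/`parse_qsl` approach used for pagination. A sketch using the URL from the output above; `product_params` is a hypothetical helper:

```python
from urllib.parse import urlsplit, parse_qsl


def product_params(serpapi_product_page_url: str) -> dict:
    """Extract the query parameters (engine, product_id, ...) from a serpapi_product_page_url."""
    return dict(parse_qsl(urlsplit(serpapi_product_page_url).query))


url = "https://serpapi.com/search.json?device=desktop&engine=walmart_product&product_id=622343372"
print(product_params(url))
# {'device': 'desktop', 'engine': 'walmart_product', 'product_id': '622343372'}
```

Adding your `api_key` to this dict should give the parameters for a `walmart_product` search, in the same way the pagination step reuses the next-page parameters.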
Note: Head to the playground for a live and interactive demo.
Links
- Code in the online IDE
- Walmart Search Engine Results API
- Walmart Featured Item API
- Walmart Filters API
- Walmart Organic Results API
- Walmart Pagination API
- Walmart Related Queries API
- Walmart Spell Check API
Add a Feature Request or a Bug