
Illia Zub for SerpApi

Posted on • Originally published at serpapi.com

Scrape Walmart Search for a specific store

Walmart responds with results for Sacramento for requests outside of the US.


But how do you search for products that are available in a specific store? Any Walmart store can be chosen without browser automation, just by setting the relevant cookies in a plain HTTP request.

To figure this out on your own, knowledge of JS and browser dev tools is enough. Some Ruby knowledge is required to understand this post.

Location cookies

I changed the location several times and checked the cookies in the browser's Dev Tools -> Application -> Cookies.


Several cookies are updated after choosing a different location: locGuestData, locDataV3, assortmentStoreId, ACID, hasACID, and hasLocData.

location-data also looks relevant, but it contains the postal code and address of a store I haven't chosen. Maybe it was used before Walmart migrated to a GraphQL API.

locDataV3 and locGuestData are JSON objects that are URI-encoded and then Base64-encoded. locDataV3 contains more data than locGuestData, but the locGuestData payload can be used for both.

ACID is a UUID. It can be generated on the client.

hasACID and hasLocData are flags.

Understanding locGuestData

Let's check what's inside this cookie value to understand how to set the store ID.

Example of encoded locGuestData

When sending requests to Walmart, locGuestData is a Base64-encoded string.

eyJpbnRlbnQiOiJTSElQUElORyIsInN0b3JlSW50ZW50IjoiUElDS1VQIiwibWVyZ2VGbGFnIjp0cnVlLCJwaWNrdXAiOnsibm9kZUlkIjoiNDExNSIsInRpbWVzdGFtcCI6MTYzNzMyODUwMDUyM30sInBvc3RhbENvZGUiOnsiYmFzZSI6Ijc4MTU0IiwidGltZXN0YW1wIjoxNjM3MzI4NTAwNTIzfSwidmFsaWRhdGVLZXkiOiJwcm9kOnYyOjUyNzNlMDFjLTA4NzAtNGUwOS05ODU4LTAzYTI2ZDQ5N2ZhOSJ9

Example of decoded locGuestData

This Base64 string is an encoded JSON object. It can be decoded in the browser console:

JSON.parse(decodeURIComponent(atob("eyJpbnRlbnQiOiJTSElQUElORyIsInN0b3JlSW50ZW50IjoiUElDS1VQIiwibWVyZ2VGbGFnIjp0cnVlLCJwaWNrdXAiOnsibm9kZUlkIjoiNDExNSIsInRpbWVzdGFtcCI6MTYzNzMyODUwMDUyM30sInBvc3RhbENvZGUiOnsiYmFzZSI6Ijc4MTU0IiwidGltZXN0YW1wIjoxNjM3MzI4NTAwNTIzfSwidmFsaWRhdGVLZXkiOiJwcm9kOnYyOjUyNzNlMDFjLTA4NzAtNGUwOS05ODU4LTAzYTI2ZDQ5N2ZhOSJ9")))

{
    "intent": "SHIPPING",
    "storeIntent": "PICKUP",
    "mergeFlag": true,
    "pickup": {
        "nodeId": "4115",
        "timestamp": 1637328500523
    },
    "postalCode": {
        "base": "78154",
        "timestamp": 1637328500523
    },
    "validateKey": "prod:v2:5273e01c-0870-4e09-9858-03a26d497fa9"
}

After changing Walmart store several times, I've seen that nodeId and postalCode.base are changing.
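The same decoding can be done in Ruby, which is handy for checking cookies copied from the browser. This is a minimal sketch, assuming the value is Base64 wrapping URI-encoded JSON as in the JavaScript snippet above. decode_loc_guest_data and percent_decode are hypothetical helper names, not part of any Walmart API.

```ruby
require "base64"
require "json"

# Hypothetical helper: reverses percent-encoding (%XX), like decodeURIComponent.
def percent_decode(str)
  str.b.gsub(/%\h\h/) { |m| m[1, 2].hex.chr }.force_encoding(Encoding::UTF_8)
end

# Hypothetical helper: decodes a locGuestData cookie value into a Hash.
# Assumes Base64 on the outside and URI-encoded JSON on the inside.
def decode_loc_guest_data(cookie_value)
  JSON.parse(percent_decode(Base64.decode64(cookie_value)))
end

# The sample cookie value from this post
SAMPLE = "eyJpbnRlbnQiOiJTSElQUElORyIsInN0b3JlSW50ZW50IjoiUElDS1VQIiwibWVyZ2VGbGFnIjp0cnVlLCJwaWNrdXAiOnsibm9kZUlkIjoiNDExNSIsInRpbWVzdGFtcCI6MTYzNzMyODUwMDUyM30sInBvc3RhbENvZGUiOnsiYmFzZSI6Ijc4MTU0IiwidGltZXN0YW1wIjoxNjM3MzI4NTAwNTIzfSwidmFsaWRhdGVLZXkiOiJwcm9kOnYyOjUyNzNlMDFjLTA4NzAtNGUwOS05ODU4LTAzYTI2ZDQ5N2ZhOSJ9"

data = decode_loc_guest_data(SAMPLE)
puts data.dig("pickup", "nodeId")   # the store ID
puts data.dig("postalCode", "base") # the postal code
```

Comparing the decoded values before and after switching stores in the browser is a quick way to confirm which fields matter.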

Generate timestamp and acid for locGuestData

timestamp and acid can be generated on every request.

require "securerandom"

timestamp = Time.now.to_i # seconds since the Unix epoch
acid = SecureRandom.uuid  # random UUID for the ACID cookie
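One detail worth checking: the timestamps in the decoded cookie above have 13 digits, which suggests milliseconds since the Unix epoch, while Time.now.to_i returns seconds. Whether Walmart cares about the unit is not confirmed here. If it does, a millisecond value could be produced as in this sketch, where millis_now is a hypothetical helper name.

```ruby
require "securerandom"

# Hypothetical helper: current time in milliseconds since the Unix epoch,
# matching the 13-digit timestamps seen in the decoded sample cookie.
def millis_now
  Process.clock_gettime(Process::CLOCK_REALTIME, :millisecond)
end

timestamp = millis_now
acid = SecureRandom.uuid

# The sample value from the decoded cookie, read as milliseconds
sample = Time.at(1637328500523 / 1000).utc

puts timestamp
puts acid
puts sample
```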

Base64-encode location data

Next, let's Base64-encode that JSON string as Walmart expects.

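require "base64"
require "json"
require "securerandom"

# store_id and postal_code are assumed to be defined by the caller
timestamp = Time.now.to_i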

acid = SecureRandom.uuid

location_guest_data = {
  intent: "SHIPPING",
  storeIntent: "PICKUP",
  mergeFlag: true,
  pickup: {
    nodeId: store_id,
    timestamp: timestamp
  },
  postalCode: {
    base: postal_code,
    timestamp: timestamp
  },
  validateKey: "prod:v2:#{acid}"
}

encoded_location_data = Base64.urlsafe_encode64(JSON.dump(location_guest_data))

Create cookie string

Finally, build a cookie string that contains all the required fields. locDataV3 gets the same encoded value as locGuestData, not the raw Ruby hash.

%(ACID=#{acid}; hasACID=true; hasLocData=1; locDataV3=#{encoded_location_data}; assortmentStoreId=#{store_id}; locGuestData=#{encoded_location_data})

Complete function to create Walmart location cookie

Putting it all together.

require "base64"
require "json"
require "securerandom"

# Builds the Cookie header value that selects a Walmart store.
# Returns nil when no store ID is given.
def location_cookie(store_id, postal_code)
  # blank? needs ActiveSupport, so use a plain Ruby check instead
  return if store_id.to_s.strip.empty?

  timestamp = Time.now.to_i

  acid = SecureRandom.uuid

  location_guest_data = {
    intent: "SHIPPING",
    storeIntent: "PICKUP",
    mergeFlag: true,
    pickup: {
      nodeId: store_id,
      timestamp: timestamp
    },
    postalCode: {
      base: postal_code,
      timestamp: timestamp
    },
    validateKey: "prod:v2:#{acid}"
  }

  encoded_location_data = Base64.urlsafe_encode64(JSON.dump(location_guest_data))

  # locDataV3 reuses the encoded locGuestData payload
  %(ACID=#{acid}; hasACID=true; hasLocData=1; locDataV3=#{encoded_location_data}; assortmentStoreId=#{store_id}; locGuestData=#{encoded_location_data})
end

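Then make an HTTP request using the language and libraries of your choice. The example below uses JavaScript with got and assumes getLocationCookie is a JavaScript port of location_cookie.

import got from 'got';

const STORE_ID = "4115";
const POSTAL_CODE = "78154";

// getLocationCookie is not defined in this post; port location_cookie to JS
const locationCookie = getLocationCookie(STORE_ID, POSTAL_CODE);

// got resolves to a response object; the HTML is in htmlResponse.body
const htmlResponse = await got('https://www.walmart.com/search?q=cookie', {
  headers: {
    cookie: locationCookie
  }
});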
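Since the cookie helper is written in Ruby, the same request can also be sketched with Ruby's standard Net::HTTP. This is only a sketch: build_search_request and fetch_search_html are hypothetical names, the cookie string is assumed to come from location_cookie, and Walmart may require extra headers or block automated requests.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds a GET request for Walmart search with the
# location cookie attached. Nothing is sent here.
def build_search_request(query, cookie)
  uri = URI("https://www.walmart.com/search")
  uri.query = URI.encode_www_form(q: query)

  request = Net::HTTP::Get.new(uri)
  request["Cookie"] = cookie
  request
end

# Hypothetical helper: sends the request and returns the response body (HTML).
def fetch_search_html(query, cookie)
  request = build_search_request(query, cookie)
  uri = request.uri

  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request).body
  end
end

# Build (but do not send) a request with a shortened example cookie
request = build_search_request("cookie", "assortmentStoreId=4115")
puts request.uri
```

Keeping request construction separate from sending makes it easy to inspect the headers before hitting the site.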


Where to get store ID and postal code

Of course, we wouldn't hard-code the store ID and postal code into a web scraping program. A CSV of 4.6k stores can be used to find the store ID dynamically.

Programmatic usage of the CSV is out of the scope of this post. All that is needed is to find the store ID and postal code for a specific location in the table.
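For orientation only, the lookup itself can be very small. The sketch below assumes a CSV with store_id and postal_code columns, which is a guess about the layout; find_store is a hypothetical helper, and the two rows reuse the store IDs and postal codes that appear in this post.

```ruby
require "csv"

# Illustrative rows only. The column names are an assumption about the CSV layout.
STORES_CSV = <<~CSV
  store_id,postal_code
  4115,78154
  2488,35601
CSV

# Hypothetical helper: returns [store_id, postal_code] for the first store
# with the given postal code, or nil when there is no match.
def find_store(csv_text, postal_code)
  row = CSV.parse(csv_text, headers: true).find do |r|
    r["postal_code"] == postal_code.to_s
  end
  row && [row["store_id"], row["postal_code"]]
end

store_id, postal_code = find_store(STORES_CSV, "35601")
puts "#{store_id} #{postal_code}"
```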

Updating a list of Walmart store IDs and locations

Walmart provides several sources to find stores. Data can be populated from one of those sources:

Store Directory

Store Directory contains links at four levels: country, states, cities, and stores. To get the data, iterate over all elements at a given level and make subsequent requests for the next level.

States

Assuming the country is the US, the 51 state codes can be hard-coded. The Walmart front end requests data from the JSON endpoint https://www.walmart.com/store/electrode/api/store-directory. It accepts the st search parameter.

Example: https://www.walmart.com/store/electrode/api/store-directory?st=AL.

It returns a list of cities. Each city object contains city and either storeId or storeCount. A city with storeId has a single store. A city with storeCount has multiple stores.
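The branching described above can be sketched as a function that maps a city object to the next URL to request. This assumes the field names storeId, storeCount, and city as described; next_url_for is a hypothetical helper, and the sample response is made up to match that shape rather than copied from the API.

```ruby
require "json"
require "uri"

DIRECTORY_URL = "https://www.walmart.com/store/electrode/api/store-directory"

# Hypothetical helper: returns the URL to request next for one city object,
# or nil when the object has neither storeId nor a positive storeCount.
def next_url_for(state, city)
  if city["storeId"]
    # Single store: its HTML page holds the address and postal code
    "https://www.walmart.com/store/#{city["storeId"]}"
  elsif city["storeCount"].to_i > 0
    # Multiple stores: the directory endpoint lists them as JSON
    "#{DIRECTORY_URL}?#{URI.encode_www_form(st: state, city: city["city"])}"
  end
end

# Made-up response shaped like the description above (not real API output)
cities = JSON.parse('[{"city":"Decatur","storeCount":2},{"city":"Example","storeId":5744}]')
cities.each { |c| puts next_url_for("AL", c) }
```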

Single store in a city

A request to a specific store returns an HTML page. Example: https://www.walmart.com/store/5744.


The store address and postal code have to be extracted from the HTML. The store ID is already in the URI.

let postalCode = document.querySelector(".store-address-postal[itemprop=postalCode]").textContent;
let address = document.querySelector(".store-address[itemprop=address]").textContent;
Multiple stores in a city

A request for a city with multiple stores returns a JSON response. For a city with a single store, the same endpoint responds with an empty array ([]), so we have to parse the store's HTML page instead.

Example request for multiple stores

https://www.walmart.com/store/electrode/api/store-directory?st=AL&city=Decatur

Sample store from the response

{
  "displayName": "Neighborhood Market",
  "storeName": "Neighborhood Market",
  "address": "1203 6th Ave Se",
  "phone": "256-822-6366",
  "postalCode": "35601",
  "storeId": 2488
}
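Note that storeId is a number in this response, while nodeId in the decoded cookie is a string. A small conversion step keeps the two consistent. This is a sketch with a hypothetical helper name, cookie_inputs, using the sample store above.

```ruby
require "json"

# Hypothetical helper: picks the fields needed for the location cookie.
# storeId is converted to a string because nodeId is a string in the decoded cookie.
def cookie_inputs(store)
  {
    store_id: store.fetch("storeId").to_s,
    postal_code: store.fetch("postalCode")
  }
end

# The sample store object from this post
store = JSON.parse(<<~JSON)
  {
    "displayName": "Neighborhood Market",
    "storeName": "Neighborhood Market",
    "address": "1203 6th Ave Se",
    "phone": "256-822-6366",
    "postalCode": "35601",
    "storeId": 2488
  }
JSON

inputs = cookie_inputs(store)
puts inputs
```

The resulting values can be passed to location_cookie as store_id and postal_code.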
Putting it all together

Pseudo-code to collect store IDs and locations for all US states. get, parseHTML, and csv.write stand in for an HTTP client, an HTML parser, and a CSV writer.

const STATES = ["AL", "TX", "CA", /* ... */];

let walmartStores = [];

for (let state of STATES) {
  let cities = get(`https://www.walmart.com/store/electrode/api/store-directory?st=${state}`);

  for (let { storeId, storeCount, city } of cities) {
    if (storeId && !storeCount) {
      let store = get(`https://www.walmart.com/store/${storeId}`);

      let document = parseHTML(store);

      let postalCode = document.querySelector(".store-address-postal[itemprop=postalCode]").textContent;
      let address = document.querySelector(".store-address[itemprop=address]").textContent;

      walmartStores.push({ postalCode, address, storeId: storeId });
    } else if (!storeId && storeCount > 0) {
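      let stores = get(`https://www.walmart.com/store/electrode/api/store-directory?st=${state}&city=${encodeURIComponent(city)}`);

      // push mutates the array; concat would return a new array and discard the result
      walmartStores.push(...stores);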
    }
  }
}

csv.write("walmart_stores.csv", walmartStores);

Existing programs to scrape Walmart Stores

A GitHub search via grep.app shows four relevant repositories:

  • akamai/edgeworkers-examples, whose locations.json contains only 471 entries tagged ref:walmart.

$ curl -s https://raw.githubusercontent.com/akamai/edgeworkers-examples/master/edgecompute/examples/personalization/storelocator/data/locations.json | jq '.elements[].tags | select(."ref:walmart" != null) | .ref' | wc -l
471

  • scrapehero/walmart_store_locator, which scrapes stores by postal code. But finding a list of actual postal codes turned out to be harder than finding a list of actual US states.

  • theriley106/WaltonAnalytics, which is great for extracting data from Walmart but not for Walmart stores.

  • GUI/covid-vaccine-spotter, which also scrapes stores by postal code and so has the same problem: finding a list of actual postal codes is harder than finding a list of actual US states.

So I played with Rust and came up with this (rough) program.

After I got through the compilation errors, it worked well, thanks to this helpful blog post about async streams in Rust. Every time my program compiled, it actually worked. Fixing compilation errors is hard (for a non-Rustacean), but there was no need to debug the program at runtime, which is great.

Conclusion

Scraping Walmart is fairly easy: the search results page contains inline JSON data for all products.


Update location cookies to specify the location for plain HTTP requests to Walmart.

If you have anything to share, any questions, suggestions, or something that isn't working correctly, feel free to drop a comment in the comment section or reach out via Twitter at @ilyazub_, or @serp_api.

Yours,
Ilya, and the rest of the SerpApi Team.


