Today we go to Redfin! This is in the real estate data arena. It pairs well with the post I wrote about scraping real estate auctions: you would find the auction you are looking for and then go to Redfin.com to get estimated pricing and other data.
Investigation
When scraping a real estate site like this there are really two steps. The first is using an address to find the property's details page on the site. The second is more obvious: scraping that page for the desired data.
Redfin is a modern site that returns property information live as you type. Those results include something that lets the user go directly to the details page for the address, which almost certainly means we can use the same request to find our way to the details page.
Check it.
On the left you can see the searched data and the exact property discovered. On the right you can see the XHR requests that return the following data:
{}&&{"version":348,"errorMessage":"Success","resultCode":0,"payload":{"sections":[{"rows":[{"id":"1_60647192","type":"1","name":"3950 Callahan Dr","subName":"Memphis, TN, USA","url":"/TN/Memphis/3950-Callahan-Dr-38127/home/60647192","active":true,"claimedHome":false,"invalidMRS":false,"businessMarketIds":[58],"countryCode":"US"}],"name":"Addresses"}],"exactMatch":{"id":"1_60647192","type":"1","name":"3950 Callahan Dr","subName":"Memphis, TN, USA","url":"/TN/Memphis/3950-Callahan-Dr-38127/home/60647192","active":true,"claimedHome":false,"invalidMRS":false,"businessMarketIds":[58],"countryCode":"US"},"extraResults":{},"responseTime":0,"hasFakeResults":false,"isGeocoded":false,"isRedfinServiced":false}}
This data is kind of funny because it’s not quite JSON. Remove that first {}&& and the rest is valid JSON. And inside…we see a url! Bingo. We’re in business.
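The prefix stripping can be sketched as a small typed helper. This is a minimal sketch, assuming the response always starts with the literal {}&& seen in the sample above. The `AutocompleteResponse` interface and the `parseAutocomplete` name are my own, and only the fields visible in that sample are typed.

```typescript
// Hypothetical types covering only fields visible in the sample response.
interface AutocompleteRow {
  id: string;
  name: string;
  subName: string;
  url: string;
}

interface AutocompleteResponse {
  errorMessage: string;
  resultCode: number;
  payload: {
    sections: { name: string; rows: AutocompleteRow[] }[];
    exactMatch?: AutocompleteRow;
  };
}

const PREFIX = '{}&&';

// Strip the leading "{}&&" (if present) and parse what remains as JSON.
function parseAutocomplete(raw: string): AutocompleteResponse {
  const body = raw.startsWith(PREFIX) ? raw.slice(PREFIX.length) : raw;
  return JSON.parse(body) as AutocompleteResponse;
}

// Trimmed copy of the sample response shown above.
const sample =
  '{}&&{"version":348,"errorMessage":"Success","resultCode":0,"payload":{"sections":[],' +
  '"exactMatch":{"id":"1_60647192","name":"3950 Callahan Dr","subName":"Memphis, TN, USA",' +
  '"url":"/TN/Memphis/3950-Callahan-Dr-38127/home/60647192"}}}';

const parsed = parseAutocomplete(sample);
console.log(parsed.payload.exactMatch?.url);
```

Checking for the prefix before slicing means the helper still works if the endpoint ever returns plain JSON.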
With this url, we can go directly to the webpage we are looking for. At the top, what do we find? The property value that we were looking for!
Unfortunately, the details page doesn’t have any XHR requests with property data. The easiest way to confirm this is to open the network tab in developer tools and check the “Doc” filter. If the document response already contains the fully rendered page, then the server is returning the HTML fully fleshed out.
I’ll just use cheerio for this part and parse the HTML to get the price I’m looking for.
The Code
Pretty simple code execution here. The async block that will handle it all will look like this:
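import axios from 'axios';
import * as cheerio from 'cheerio';
const exampleAddresses = [
  '3950 CALLAHAN DR, Memphis, TN 38127',
  '17421 Deforest Ave, Cleveland, OH 44128',
  '1226 DIVISION AVENUE, San Antonio, TX 78225'
];
// Resolve after the given number of milliseconds
function timeout(ms: number) {
  return new Promise<void>(resolve => setTimeout(resolve, ms));
}
(async () => {
  for (const address of exampleAddresses) {
    const path = await getUrl(address);
    console.log('path', path);
    const price = await getPrice(path);
    console.log('price', price);
    // Pause between addresses so we don't hammer the site
    await timeout(2000);
  }
})();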
You’d loop through your target addresses, get the url (really the path), and use that when you get the price.
async function getUrl(address: string) {
  // location and v are required query parameters; encode the address so spaces and commas are safe
  const url = `https://www.redfin.com/stingray/do/location-autocomplete?location=${encodeURIComponent(address)}&v=2`;
  const axiosResponse = await axios.get(url);
  // The body is JSON prefixed with {}&&, so strip the prefix before parsing
  const parsedData = JSON.parse(axiosResponse.data?.replace('{}&&', ''));
  return parsedData.payload.exactMatch.url;
}
The above function will get the path from that weird almost JSON. We just get the data and then remove the {}&& with a replace function.
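The URL-building step can also be pulled out into its own helper that percent-encodes the address, since addresses contain spaces and commas and may contain characters like # or &. This is only a sketch: `buildAutocompleteUrl` is a hypothetical name, while the endpoint and the `location` and `v` parameters are the ones used above.

```typescript
const AUTOCOMPLETE_ENDPOINT = 'https://www.redfin.com/stingray/do/location-autocomplete';

// Build the autocomplete URL, percent-encoding the address so spaces,
// commas, and characters like "#" or "&" cannot break the query string.
function buildAutocompleteUrl(address: string, version = 2): string {
  const location = encodeURIComponent(address.trim());
  return `${AUTOCOMPLETE_ENDPOINT}?location=${location}&v=${version}`;
}

console.log(buildAutocompleteUrl('3950 CALLAHAN DR, Memphis, TN 38127'));
```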
The getPrice function is a simple call with axios and parse with cheerio.
async function getPrice(path: string) {
  const url = `https://redfin.com${path}`;
  const axiosResponse = await axios.get(url);
  // The page comes back fully rendered, so parse the HTML directly
  const $ = cheerio.load(axiosResponse.data);
  let price = $('[data-rf-test-id="avm-price"] .statsValue').text();
  // Fall back to a second selector if the first one finds nothing
  if (!price) {
    price = $('[data-rf-test-id="avmLdpPrice"] .value').text();
  }
  return price;
}
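The function above returns the price as display text. If you need a number downstream, something like the following could convert it. This is a sketch that assumes the text looks like "$123,456"; the exact format on the page isn't shown in this post, and `parsePrice` is a hypothetical helper.

```typescript
// Convert display text such as "$123,456" into a number.
// Returns null when no usable digits are found (for example, an empty
// string because neither selector matched).
function parsePrice(text: string): number | null {
  const digits = text.replace(/[^0-9.]/g, '');
  if (digits === '') return null;
  const value = Number(digits);
  return Number.isFinite(value) ? value : null;
}

console.log(parsePrice('$123,456')); // 123456
console.log(parsePrice(''));         // null
```

Returning null instead of 0 keeps "no price found" distinct from a real value when you store the results.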
Bam. And that’s the end. We got ourselves some property prices from Redfin.
The post Jordan Scrapes Redfin appeared first on Javascript Web Scraping Guy.