Realtor.com is one of the most popular real estate listing platforms and the second-largest in the United States. Scraping its real estate data can help you perform proper market research before making a decision. It can also be used to identify emerging market trends and adjust your position accordingly.
Extracting data from Realtor.com is easy. In this article, we will use Python and its dedicated scraping libraries to extract all the property listings available on the target web page.
If you are a beginner and want a more in-depth introduction to web scraping, please check out this guide: Web Scraping With Python
Requirements
Before scraping Realtor.com, we need to install some libraries to proceed with this tutorial. I assume you have already installed the latest version of Python on your device.
You can start by creating a new directory to store our scraping files:
mkdir realtor_scraper
Next, we will create a new Python file in our folder to deal with our scraping operations.
Then, install the libraries with the following command.
pip install requests
pip install beautifulsoup4
Requests — To fetch the HTML data from the Realtor.com website.
Beautiful Soup — For parsing the extracted HTML data. (A quick check that both libraries installed correctly is sketched just below.)
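If you want to make sure both libraries installed correctly before moving on, a quick check like the one below should be enough (both packages expose a standard __version__ attribute):
# Quick sanity check: both imports should succeed and print a version number.
import requests
import bs4

print("requests:", requests.__version__)
print("beautifulsoup4:", bs4.__version__)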
What To Scrape From Realtor.com
It is good practice to decide in advance what you want to scrape from a website. In this tutorial, we will extract the following data points from the target page (the shape of the record we will build for each listing is sketched just after this list):
Address
Bath Count
Bed Count
Sqft
Plot Size
Pricing
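To make this concrete, each listing will end up as a small Python dictionary with one key per data point above. The field names below are our own choice for this tutorial, and the values are placeholders rather than real data:
# Shape of one scraped record; example values come from the output shown later.
property_record = {
    "address": None,      # full street address from the card
    "bed": None,          # bed count, e.g. "10bed"
    "bath": None,         # bath count, e.g. "14.5+bath"
    "sqft": None,         # living area, e.g. "34,380sqft"
    "plot_size": None,    # lot size, e.g. "2.65acre lot"
    "pricing": None,      # listing price, e.g. "$185,000,000"
}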
Scraping Property Data From Realtor
We will rely on BeautifulSoup's select and select_one methods to access the DOM elements. Before we start coding our scraper, we need to understand the HTML structure of the web page.
You can easily do this by right-clicking any element of interest, which opens a context menu from which you select Inspect. This opens the developer tools panel on your screen, which you can use to identify the tags containing the required information.
From the above image, we can conclude that each property card is stored inside a div tag with the class BasePropertyCard_propertyCardWrap__J0xUj.
Let us first start with extracting the HTML data.
import requests
from bs4 import BeautifulSoup
l=list()
obj={}
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
}
resp = requests.get("https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/type-single-family-home,multi-family-home", headers=headers).text
soup = BeautifulSoup(resp,'html.parser')
Step-by-step explanation:
In the first two lines, we imported the Requests and Beautiful Soup libraries.
Next, we declared two variables to store the data: a list l for all listings and a dictionary obj for the current one.
It is important to make our bot mimic a human visitor, so we initialized our headers with a User-Agent, which is passed with the GET request.
Then, with the help of Requests, we made an HTTP GET request to the target URL.
Finally, we created an instance of BeautifulSoup to navigate through the HTML and obtain the required information.
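One optional hardening step, not part of the original snippet, is to keep the full response object and check its status code before parsing, since Realtor.com may return a block or CAPTCHA page instead of the listings. A minimal sketch, reusing the imports and headers from above:
# Keep the Response object so the status code can be checked before parsing.
response = requests.get(
    "https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/type-single-family-home,multi-family-home",
    headers=headers,
)
if response.status_code != 200:
    # A non-200 status usually means the request was blocked or rate limited.
    raise RuntimeError(f"Request failed with status {response.status_code}")
soup = BeautifulSoup(response.text, "html.parser")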
Now, we will use the select method to capture all the elements with the class BasePropertyCard_propertyCardWrap__J0xUj.
for el in soup.select(".BasePropertyCard_propertyCardWrap__J0xUj"):
This for loop will allow us to iterate over all the elements with the given class and extract the data present inside each listing.
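Before filling in the loop body, it is worth checking how many cards the selector actually matched. If it prints 0, the class name has probably changed (it is auto-generated and may rotate) or the request was blocked:
# Sanity check: count the property cards matched by the class selector.
cards = soup.select(".BasePropertyCard_propertyCardWrap__J0xUj")
print(f"Found {len(cards)} property cards")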
Let us now locate the tags for each data point we discussed above.
In the above image, we can see that the price of the property is contained inside a div tag with the attribute data-testid=card-price, which sits inside an element with the class price-wrapper. Now, inside the for loop, add the following code.
    try:
        obj["pricing"]=el.select_one(".price-wrapper div[data-testid=card-price]").text
    except:
        obj["pricing"]=None
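As the sample output later in this article shows, the card text can concatenate the listing price with a price-cut badge (for example '$54,995,000$5M'). If you need a numeric price, a small post-processing helper like the sketch below should work; it is our own addition and simply keeps the first dollar amount it finds:
import re

def parse_price(price_text):
    # Return the first dollar amount in the card text as an integer, or None.
    if not price_text:
        return None
    match = re.search(r"\$([\d,]+)", price_text)
    return int(match.group(1).replace(",", "")) if match else None

print(parse_price("$54,995,000$5M"))  # 54995000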
Then, we will get the bed, bath, and sqft information from the HTML page.
The above image shows that all this information is stored inside individual li tags with different data-testid attributes. The bed property has the attribute data-testid=property-meta-beds, and similarly, the bath property has the attribute data-testid=property-meta-baths.
Copy the following code to extract this information.
    try:
        obj["bed"]=el.select_one("li[data-testid=property-meta-beds]").text
    except:
        obj["bed"]=None
    try:
        obj["bath"]=el.select_one("li[data-testid=property-meta-baths]").text
    except:
        obj["bath"]=None
    try:
        obj["sqft"]=el.select_one("li[data-testid=property-meta-sqft]").find_next().text
    except:
        obj["sqft"]=None
    try:
        obj["plot_size"]=el.select_one("li[data-testid=property-meta-lot-size]").find_next().text
    except:
        obj["plot_size"]=None
However, some listings may not include every data point in the expected position. That is why we wrap each extraction in try and except, so the scraper moves on instead of crashing when an element is missing.
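If you would rather not repeat this try/except pattern for every field, you could factor it into a small helper. This is our own refactoring sketch, not part of the original code, and it behaves the same way: it returns None whenever an element is missing.
def safe_text(card, selector, use_next=False):
    # Return the text of the first match for selector inside card, or None.
    node = card.select_one(selector)
    if node is not None and use_next:
        # Mirrors the .find_next() hop used for sqft and plot size above.
        node = node.find_next()
    return node.text if node is not None else None

# Equivalent usage inside the loop:
# obj["bed"] = safe_text(el, "li[data-testid=property-meta-beds]")
# obj["sqft"] = safe_text(el, "li[data-testid=property-meta-sqft]", use_next=True)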
Finally, we are left with the address data point. The process of finding the address is the same as before, so now that you have learned the approach, you can try to extract the property address yourself.
So, we can get the address from the div tags with the data-testid attributes card-address-1 and card-address-2.
You can try the following code to extract the address.
    try:
        obj["address"]=el.select_one("div[data-testid=card-address-1]").text + " " + el.select_one("div[data-testid=card-address-2]").text
    except:
        obj["address"]=None
So, we are done with extracting each data point. At the end of the loop body, we append this object to the l list to store each property's data and reset obj for the next listing; after the loop finishes, we print the collected results.
    l.append(obj)
    obj={}

print(l)
Execute the program in your project terminal. You will get the following results:
[
{
'pricing': '$185,000,000',
'bed': '10bed',
'bath': '14.5+bath',
'sqft': '34,380sqft',
'plot_size': '2.65acre lot',
'address': '869 Tione Rd Los Angeles, CA 90077'
},
{
'pricing': '$54,995,000$5M',
'bed': '9bed',
'bath': '18bath',
'sqft': '21,000sqft',
'plot_size': '3.6acre lot',
'address': '10066 Cielo Dr Beverly Hills, CA 90210'
},
{
'pricing': '$9,500,000$1.5M',
'bed': '5bed',
'bath': '7bath',
'sqft': '9,375sqft',
'plot_size': '0.74acre lot',
'address': '13320 Mulholland Dr Beverly Hills, CA 90210'
}
....
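If you want to keep these results for the kind of market analysis mentioned in the introduction, you could write the l list to a CSV file with Python's standard csv module. A minimal sketch, assuming l is the list of dictionaries built above; the file name is our own choice:
import csv

fields = ["address", "bed", "bath", "sqft", "plot_size", "pricing"]
with open("realtor_listings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    writer.writerows(l)  # l is the list of property dictionaries from above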
Complete Code:
You can make some changes to this code according to your needs, for example extracting images and links or implementing pagination (a pagination sketch follows the complete code below). But for now, our code will look like this:
import requests
from bs4 import BeautifulSoup
l=list()
obj={}
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
}
resp = requests.get("https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/type-single-family-home,multi-family-home", headers=headers).text
soup = BeautifulSoup(resp,'html.parser')
for el in soup.select(".BasePropertyCard_propertyCardWrap__J0xUj"):
    try:
        obj["pricing"]=el.select_one(".price-wrapper div[data-testid=card-price]").text
    except:
        obj["pricing"]=None
    try:
        obj["bed"]=el.select_one("li[data-testid=property-meta-beds]").text
    except:
        obj["bed"]=None
    try:
        obj["bath"]=el.select_one("li[data-testid=property-meta-baths]").text
    except:
        obj["bath"]=None
    try:
        obj["sqft"]=el.select_one("li[data-testid=property-meta-sqft]").find_next().text
    except:
        obj["sqft"]=None
    try:
        obj["plot_size"]=el.select_one("li[data-testid=property-meta-lot-size]").find_next().text
    except:
        obj["plot_size"]=None
    try:
        obj["address"]=el.select_one("div[data-testid=card-address-1]").text + " " + el.select_one("div[data-testid=card-address-2]").text
    except:
        obj["address"]=None
    l.append(obj)
    obj={}

print(l)
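As mentioned above, pagination is a natural extension. Realtor.com search results appear to use a /pg-N path segment for later pages; that URL pattern is an assumption here, so verify it against the site's own "next page" links before relying on it. A minimal sketch:
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
}
base_url = "https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/type-single-family-home,multi-family-home"
all_cards = []

# Assumption: pages 2, 3, ... are reachable by appending /pg-2, /pg-3, and so on.
for page in range(1, 4):  # first three pages as an example
    url = base_url if page == 1 else f"{base_url}/pg-{page}"
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, "html.parser")
    all_cards.extend(soup.select(".BasePropertyCard_propertyCardWrap__J0xUj"))

print(f"Collected {len(all_cards)} property cards across 3 pages")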
Scraping Realtor Using Serpdog
If you keep extracting data from Realtor.com at scale, your IP may get blocked, and you will see a CAPTCHA screen every time you visit the site.
To avoid being blocked, you can use Serpdog's Web Scraping API to scrape data from any website. Serpdog is backed by a massive pool of rotating residential and datacenter proxies, allowing businesses to bypass CAPTCHAs and focus on data extraction and product development.
You can register on Serpdog to claim your 1000 free credits to start scraping Realtor.com without getting blocked.
Let’s see how you can use Serpdog.
After successfully signing up, you will be redirected to our dashboard, where you will get your API Key.
Copy the API key from the dashboard and embed it in the code below to scrape data from Realtor.com quickly and easily.
import requests
from bs4 import BeautifulSoup
l=list()
obj={}
resp = requests.get("https://api.serpdog.io/scrape?api_key=APIKEY&url=https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/type-single-family-home,multi-family-home&render_js=false").text
soup = BeautifulSoup(resp,'html.parser')
for el in soup.select(".BasePropertyCard_propertyCardWrap__J0xUj"):
    try:
        obj["pricing"]=el.select_one(".price-wrapper div[data-testid=card-price]").text
    except:
        obj["pricing"]=None
    try:
        obj["bed"]=el.select_one("li[data-testid=property-meta-beds]").text
    except:
        obj["bed"]=None
    try:
        obj["bath"]=el.select_one("li[data-testid=property-meta-baths]").text
    except:
        obj["bath"]=None
    try:
        obj["sqft"]=el.select_one("li[data-testid=property-meta-sqft]").find_next().text
    except:
        obj["sqft"]=None
    try:
        obj["plot_size"]=el.select_one("li[data-testid=property-meta-lot-size]").find_next().text
    except:
        obj["plot_size"]=None
    try:
        obj["address"]=el.select_one("div[data-testid=card-address-1]").text + " " + el.select_one("div[data-testid=card-address-2]").text
    except:
        obj["address"]=None
    l.append(obj)
    obj={}

print(l)
If any API call fails, you will not be charged for it.
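Since failed calls are not charged, a simple retry wrapper costs nothing extra. The sketch below retries a few times on non-200 responses; treating every non-200 status as retryable is an assumption about the API, so adjust it to whatever the Serpdog documentation specifies.
import time
import requests

def fetch_with_retry(url, attempts=3, delay=2):
    # Retry a GET request a few times before giving up.
    for attempt in range(1, attempts + 1):
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
        time.sleep(delay * attempt)  # back off a little longer on each failure
    raise RuntimeError(f"Request failed after {attempts} attempts")

# Usage (replace APIKEY with your own key and pass the same target URL as above):
# html = fetch_with_retry("https://api.serpdog.io/scrape?api_key=APIKEY&url=...&render_js=false")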
Conclusion
In this tutorial, we learned to scrape property data from Realtor.com, including pricing, address, and other features. If your demand grows and you want to extract more data, you can consider our web scraping API, which features proxy rotation and headless browsers.
I hope this tutorial gave you a basic overview of how to scrape Realtor.com using Python.
If you think we can complete your web scraping tasks and help you collect data, please don’t hesitate to contact us.
Please do not hesitate to message me if I missed something. Follow me on Twitter. Thanks for reading!
Frequently Asked Questions
Q1. Is it legal to scrape Realtor.com?
Yes, scraping Realtor.com is generally considered legal, as the property data is publicly available and we are not scraping any personal information that could raise privacy concerns.
Q2. Does Realtor provide any free API?
No, Realtor.com does not provide a free API, but you can try Serpdog's Web Scraping API, which offers 1,000 free request credits upon registration on its website.
Additional Resources
I have prepared a complete list of blogs about web scraping that can help you in your web scraping journey.