For all prospective Airbnb hosts in Seattle, I will answer these questions in this article:
- when to rent to maximise revenue?
- when is the off-peak season for maintenance?
- what is the common group size of Seattle travellers: 2, a family of 4, or larger?
- bedroom configurations to maximise booking rates?
- how to achieve a good rating?
- do hosts with higher ratings have higher revenue?
- amenities to include?
In this article, I will perform exploratory data analysis on the Airbnb dataset obtained from Inside Airbnb.
Our data will be loaded with pandas; comma-separated values (CSV) files can be easily loaded into a DataFrame with the read_csv function.
Let us look at the first 10 rows with the head method.
And examine the summary of the numerical data with the describe method. A few observations:
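A minimal sketch of the loading step, assuming the listings file is called listings.csv (the file name is an assumption, as it is not given in this article). To keep the snippet runnable on its own, it reads a tiny made-up stand-in from memory instead of the real file:

```python
import io

import pandas as pd

# Tiny stand-in for the real file so the sketch runs on its own.
# The rows are made up and only mimic the layout of the listings data.
sample_csv = io.StringIO(
    "id,price,minimum_nights,bedrooms\n"
    '1,"$85.00",1,1\n'
    '2,"$1,000.00",2,\n'
)

# With the real data this would be pd.read_csv("listings.csv") (assumed file name).
pd_listings = pd.read_csv(sample_csv)

print(pd_listings.shape)   # (rows, columns)
print(pd_listings.dtypes)  # price is read as text because of the dollar sign
```

Note that the empty bedrooms field in the second row is read as a missing value (NaN), which is the kind of gap handled later in the article.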
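As a rough illustration of these two inspection steps, here is a sketch on a small made-up DataFrame (the values are placeholders, not rows from the Seattle dataset):

```python
import pandas as pd

# Made-up rows standing in for the real listings DataFrame.
pd_listings = pd.DataFrame({
    "id": [1, 2, 3],
    "price": ["$85.00", "$150.00", "$1,000.00"],
    "bedrooms": [1.0, None, 2.0],
})

# head(10) returns at most the first 10 rows.
first_rows = pd_listings.head(10)

# describe() summarises numeric columns only by default,
# so the text-valued price column is left out.
summary = pd_listings.describe()

print(first_rows)
print(summary)
```

The count row of the summary is a quick way to spot columns with missing values, since it only counts non-null entries.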
- there are 3813 listings in this dataset
- values in the price column contain the dollar symbol ($)
- there are missing values in columns bathrooms, bedrooms, and beds
- there are missing values in the review rating columns (review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, review_scores_value)
The price column, which holds the price of the listing, contains the dollar sign ($) and is stored as text. We can't use it for analysis until it is numeric, so we remove the dollar symbol (along with commas and spaces) and convert the values to numbers:
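# regex=True is required in recent pandas for the character class to be treated as a pattern
pd_listings['price'] = pd_listings['price'].str.replace(r"[$, ]", "", regex=True).astype("float")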
Then replace those empty values with zero:
# Columns whose missing values are replaced with zero
columns_to_fill = [
    'bathrooms', 'bedrooms', 'beds',
    'review_scores_rating', 'review_scores_accuracy',
    'review_scores_cleanliness', 'review_scores_checkin',
    'review_scores_communication', 'review_scores_location',
    'review_scores_value',
]
pd_listings[columns_to_fill] = pd_listings[columns_to_fill].fillna(0)
Lastly, rename id to listing_id:
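The rename step could look like the sketch below. The DataFrame here is a made-up stand-in; only the column names id and listing_id come from the article:

```python
import pandas as pd

# Made-up stand-in for the listings DataFrame.
pd_listings = pd.DataFrame({"id": [1, 2], "price": [85.0, 150.0]})

# rename returns a new DataFrame, so the result is assigned back.
pd_listings = pd_listings.rename(columns={"id": "listing_id"})

print(pd_listings.columns.tolist())
```

Renaming avoids a clash with the id column of the reviews data and gives both DataFrames a shared key to join on.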
Let us load another CSV file which contains the reviews for each listing. The DataFrame contains the following columns:
- id — identification number for review
- listing_id — identification number for listing which we can join with the above DataFrame
- date — date of the review
I assume that each review represents a successful booking in which the guest stayed some number of nights. Unfortunately, we do not know the exact number of nights each guest stayed, but we can use the listing's minimum_nights and assume each guest stayed at least that many nights. For each review, we multiply price by minimum_nights to estimate the booking's revenue:
pd_bookings = pd.merge(pd_reviews, pd_listings, on='listing_id')
pd_bookings['estimated_revenue'] = pd_bookings['price'] * pd_bookings['minimum_nights']
Sum up the revenue of every booking for each listing as estimated revenue per listing:
pd_listings_revenue = pd_bookings[['listing_id','estimated_revenue']].groupby(['listing_id']).sum()
And merge the estimated revenue into the existing listings DataFrame:
pd_listings = pd.merge(pd_listings, pd_listings_revenue, on='listing_id', how='left')
# Listings without any review have no revenue after the left join, so set them to zero
pd_listings['estimated_revenue'] = pd_listings['estimated_revenue'].fillna(0)
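To make the revenue estimate concrete, here is a small worked sketch of the same three steps on made-up numbers (three listings and three reviews, none of them from the real dataset). It shows why the final fill with zero is needed: a listing with no reviews has no match in the left join.

```python
import pandas as pd

# Made-up listings: listing 3 has no reviews at all.
pd_listings = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "price": [100.0, 80.0, 120.0],
    "minimum_nights": [2, 1, 3],
})
pd_reviews = pd.DataFrame({
    "id": [10, 11, 12],
    "listing_id": [1, 1, 2],
    "date": ["2015-07-01", "2015-08-01", "2015-08-15"],
})

# One row per review, treated as one booking of minimum_nights nights.
pd_bookings = pd.merge(pd_reviews, pd_listings, on="listing_id")
pd_bookings["estimated_revenue"] = pd_bookings["price"] * pd_bookings["minimum_nights"]

# Total estimated revenue per listing.
revenue = pd_bookings.groupby("listing_id")["estimated_revenue"].sum().reset_index()

# Left join keeps listing 3, whose revenue is NaN until filled with zero.
pd_listings = pd.merge(pd_listings, revenue, on="listing_id", how="left")
pd_listings["estimated_revenue"] = pd_listings["estimated_revenue"].fillna(0)

print(pd_listings[["listing_id", "estimated_revenue"]])
```

Listing 1 has two reviews at 100 × 2 nights each, giving 400; listing 2 has one review at 80 × 1 night, giving 80; listing 3 ends up with 0.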
And we have our DataFrame ready for some analysis. Each row represents one listing, its attributes, and its estimated revenue:
This table shows the average revenue of listings in each neighbourhood:
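A table like this can be produced with a group-by. The sketch below assumes the neighbourhood is stored in a column named neighbourhood_group_cleansed (an assumption about the dataset, not stated in this article) and uses made-up revenue figures:

```python
import pandas as pd

# Made-up listings; the neighbourhood column name is an assumption.
pd_listings = pd.DataFrame({
    "listing_id": [1, 2, 3, 4],
    "neighbourhood_group_cleansed": ["Downtown", "Downtown", "Ballard", "Ballard"],
    "estimated_revenue": [400.0, 200.0, 100.0, 0.0],
})

# Average estimated revenue per neighbourhood, highest first.
avg_revenue = (
    pd_listings.groupby("neighbourhood_group_cleansed")["estimated_revenue"]
    .mean()
    .sort_values(ascending=False)
)

print(avg_revenue)
```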
Airbnb properties in Downtown, Capitol Hill and Beacon Hill can fetch the highest revenue. These are shopping and central business district (CBD) areas.
Downtown, Capitol Hill and Beacon Hill can fetch the highest revenue
It would be useful to know the most popular time of the year to rent in Seattle, so Airbnb hosts are able to decide when to rent and when is the time for maintenance.
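One way to look at seasonality is to group the bookings by the month of the review date. This is only a sketch on made-up bookings; it assumes the date column holds ISO-formatted date strings:

```python
import pandas as pd

# Made-up bookings (one row per review).
pd_bookings = pd.DataFrame({
    "listing_id": [1, 1, 2, 2],
    "date": ["2015-07-04", "2015-08-15", "2015-08-20", "2015-12-24"],
    "estimated_revenue": [100.0, 200.0, 150.0, 80.0],
})

# Parse the text dates so the month can be extracted.
pd_bookings["date"] = pd.to_datetime(pd_bookings["date"])

# Number of bookings and total estimated revenue per calendar month.
monthly = (
    pd_bookings.groupby(pd_bookings["date"].dt.month)["estimated_revenue"]
    .agg(["count", "sum"])
    .rename(columns={"count": "bookings", "sum": "revenue"})
)
monthly.index.name = "month"

print(monthly)
```

Grouping by month alone pools all years together, which suits a question about the typical peak season rather than year-on-year growth.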
July, August and September are the best periods to maximise revenue. The months before May are the best time for maintenance work. October to December is a good time for hosts to take a break and enjoy the holidays, if they want to.
July, August and September are the best periods to maximise revenue.
These are the top 5 listings with the highest estimated revenue:
Wow! It looks like our top earners are hosts with a minimum stay of 1000 nights. But this might be a data anomaly, because 1000 nights is rather extreme, so let's look at the proportion of listings with different minimum_nights.
Most hosts set minimum nights of up to a month, so the listing with 1000 minimum nights is an outlier that we should filter out.
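The share of listings at each minimum_nights value can be computed with value_counts. A minimal sketch on made-up values:

```python
import pandas as pd

# Made-up minimum_nights values, including one extreme entry.
pd_listings = pd.DataFrame({"minimum_nights": [1, 1, 2, 2, 2, 3, 7, 1000]})

# normalize=True turns counts into proportions that sum to 1.
share = (
    pd_listings["minimum_nights"]
    .value_counts(normalize=True)
    .sort_index()
)

print(share)
```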
These are the top hosts (up to 7 minimum nights) with the highest estimated revenue.
These are the top hosts (up to 4 minimum nights) with the highest estimated revenue.
From these 2 tables, it seems that longer minimum nights result in higher revenue. Let's look at the correlation between minimum nights and estimated revenue.
And the correlation between minimum nights and estimated revenue after removing the listing with 1000 minimum nights.
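The effect of a single extreme listing on the correlation can be sketched as follows. The numbers are made up to show the mechanism and do not reproduce the figures in this article; Series.corr computes the Pearson correlation by default:

```python
import pandas as pd

# Made-up listings: five ordinary ones and one with 1000 minimum nights.
pd_listings = pd.DataFrame({
    "minimum_nights": [1, 2, 3, 2, 1, 1000],
    "estimated_revenue": [500.0, 300.0, 450.0, 200.0, 350.0, 90000.0],
})

# Correlation with the extreme listing included.
corr_all = pd_listings["minimum_nights"].corr(pd_listings["estimated_revenue"])

# Correlation after dropping the listing with 1000 minimum nights.
filtered = pd_listings[pd_listings["minimum_nights"] < 1000]
corr_filtered = filtered["minimum_nights"].corr(filtered["estimated_revenue"])

print(f"with outlier:    {corr_all:.2f}")
print(f"without outlier: {corr_filtered:.2f}")
```

Because the estimated revenue is price multiplied by minimum_nights, one listing with a very large minimum_nights dominates both variables at once, which is why it can pull the correlation close to 1.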
The host with 1000 minimum nights biased the result towards longer minimum nights leading to higher revenue, with a correlation of 87% between minimum nights and revenue. After removing that host, minimum nights and estimated revenue are not highly correlated: the correlation drops to 20%.
Minimum nights and estimated revenue are not highly correlated
As an Airbnb host, it would be good to know if my type of property is oversaturated in the market. Find out the ratio of the number of listings (supply) to the number of bookings (demand) for different bedroom configurations:
Listings with fewer than 2 bedrooms are well sought after.
But wait! What kind of properties have no bedrooms?
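A sketch of one way to compare supply and demand per bedroom count, on made-up data. The ratio is expressed here as bookings per listing; that direction is an assumption, and it can be inverted if listings per booking is preferred:

```python
import pandas as pd

# Made-up listings and reviews (each review is treated as one booking).
pd_listings = pd.DataFrame({
    "listing_id": [1, 2, 3, 4],
    "bedrooms": [0.0, 1.0, 1.0, 2.0],
})
pd_reviews = pd.DataFrame({
    "id": range(1, 8),
    "listing_id": [1, 1, 1, 2, 2, 3, 4],
})

# Supply: number of listings per bedroom count.
supply = pd_listings.groupby("bedrooms")["listing_id"].count().rename("listings")

# Demand: number of bookings per bedroom count.
pd_bookings = pd.merge(pd_reviews, pd_listings, on="listing_id")
demand = pd_bookings.groupby("bedrooms")["id"].count().rename("bookings")

# Side-by-side table; bedroom counts with no bookings get 0.
ratio = pd.concat([supply, demand], axis=1).fillna(0)
ratio["bookings_per_listing"] = ratio["bookings"] / ratio["listings"]

print(ratio.sort_values("bookings_per_listing", ascending=False))
```

Grouping by a guest-capacity column instead of bedrooms (assumed here to be called accommodates) would give the same comparison by group size.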
And the number of beds in these properties?
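Both questions can be answered by filtering on bedrooms and counting values. This sketch uses made-up rows and assumes the rental type is stored in a column named room_type (an assumption about the dataset):

```python
import pandas as pd

# Made-up listings; room_type is an assumed column name.
pd_listings = pd.DataFrame({
    "listing_id": [1, 2, 3, 4],
    "bedrooms": [0.0, 0.0, 1.0, 2.0],
    "beds": [1.0, 2.0, 1.0, 3.0],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Entire home/apt"],
})

# Keep only the listings recorded with zero bedrooms.
no_bedroom = pd_listings[pd_listings["bedrooms"] == 0]

# What kind of rental are they, and how many beds do they offer?
room_types = no_bedroom["room_type"].value_counts()
bed_counts = no_bedroom["beds"].value_counts().sort_index()

print(room_types)
print(bed_counts)
```

Since missing bedrooms values were replaced with zero earlier, this filter also picks up listings whose bedroom count was simply not recorded.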
All of these properties with no bedrooms are rented as entire apartments, and they do provide at least one bed. Phew~
As an Airbnb host, I would also like to know the common group size of Seattle visitors, so as to find out whether my property configuration is oversaturated in the market.
A place which accommodates 14 ranked first (highest supply/demand ratio), but the number of bookings is low (only 83 bookings) as compared to places for 2 or 3 people.
Renting a place for 2 or 3 people will give the host pretty good regular rentals.
So let us focus on renting properties for 2 to 3 people since more than half travel in a group of this size. Do these guests prefer 1 bedroom or 2 separate bedrooms?
Airbnb bedroom configurations for 2 people:
Airbnb bedroom configurations for 3 people:
The majority prefers 1 bedroom.
The majority prefers 1 bedroom; less than 1% prefer 2 bedrooms. So groups of 2 or 3 prefer 1 bedroom. But this could be because the current supply of 2-bedroom properties is low.
Having good ratings is important for Airbnb hosts. Let us compare how different factors affect overall ratings:
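A correlation matrix over the review score columns is one way to make this comparison. The sketch below uses only three of the score columns and made-up scores, so its numbers say nothing about the real dataset:

```python
import pandas as pd

# Subset of the review score columns; the remaining ones can be added to the list.
score_cols = [
    "review_scores_rating",
    "review_scores_communication",
    "review_scores_location",
]

# Made-up scores for five listings.
pd_listings = pd.DataFrame({
    "review_scores_rating": [90, 95, 80, 100, 85],
    "review_scores_communication": [9, 10, 8, 10, 9],
    "review_scores_location": [10, 9, 10, 9, 9],
})

# Pairwise Pearson correlations between the score columns.
corr = pd_listings[score_cols].corr()

# How strongly each factor moves with the overall rating, strongest first.
rating_corr = (
    corr["review_scores_rating"]
    .drop("review_scores_rating")
    .sort_values(ascending=False)
)

print(rating_corr)
```

Adding estimated_revenue and price to the column list gives the rating-versus-revenue comparison in the same table.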
Good communication affects the overall rating and check-in rating
Communication has the highest correlation with the overall rating. Hosts in Seattle (and maybe elsewhere too) need to be responsive and friendly, because good communication tends to go with a high overall rating. Good communication is also closely tied to the check-in rating.
Does having a good overall rating mean the listing will bring in good revenue?
Having a good overall rating has a very small positive correlation with estimated revenue. And having a good rating has almost no impact on the price set by the host.
But still, having a good overall rating is highly recommended.
This is the number of Airbnb listings in Seattle that provide each amenity:
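Counting amenities requires splitting each listing's amenity list first. The sketch below assumes the amenities column stores a brace-wrapped, comma-separated string such as {Internet,"Wireless Internet",Kitchen}; that format is an assumption, and the rows are made up:

```python
import pandas as pd

# Made-up listings; the amenities string format is an assumption.
pd_listings = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "amenities": [
        '{Internet,"Wireless Internet",Kitchen,Heating}',
        '{Internet,Heating,"Smoke Detector"}',
        '{Kitchen,Heating}',
    ],
})

amenity_counts = (
    pd_listings["amenities"]
    .str.strip("{}")                     # drop the surrounding braces
    .str.replace('"', "", regex=False)   # drop quotes around multi-word names
    .str.split(",")                      # one list of amenities per listing
    .explode()                           # one row per (listing, amenity)
    .loc[lambda s: s != ""]              # ignore listings with an empty list
    .value_counts()
)

print(amenity_counts)
```

Splitting on commas is only safe if no amenity name contains a comma itself, which is assumed here.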
Internet, heating and kitchen are necessities in Seattle.
Smoke detector? I just learnt that the Washington State Building Code has required smoke detectors in all dwellings since 1973.
So, here is the summary of this article:
Check out the code used in this article!