Visualization for Airbnb Paris Data
Introduction
Airbnb has become a popular alternative to traditional lodging. It is growing quickly and appeals to travelers who want to experience the local lifestyle. Anyone with a spare room or free space can become a host, so Airbnb gives hosts an opportunity to earn side income and gives budget travelers some of the lowest-cost accommodation options. The purpose of this project is to use data analytics and visualization to help Airbnb hosts price their properties wisely. We developed a machine learning model on Airbnb Paris data to identify the top influencers of listing price, examined each of these influencers in turn, and created visualizations to help hosts price listings strategically. In turn, this helps Airbnb boost reservations and grow as a business.
New hosts need to make many decisions when they list their properties on Airbnb. In addition to “objective information” such as the property’s physical features and location, hosts need to provide a great deal of “subjective information”, such as the cancellation policy, whether the listing is for long-term or short-term rental, and whether the property is rented out in its entirety or as individual bedrooms. To help hosts get the most value out of their properties, we analyzed Airbnb’s listing and calendar data. Key price influencers were identified from a machine learning price-prediction model, and each top influencer was studied with a series of visualizations to help hosts make better decisions.
Data Source
The data for this project comes from Airbnb’s website. Given our motivation, we needed detailed information on current Airbnb listings and data on listing prices. We found the data we needed in the listings table, which contains detailed listing information, and the calendar table, which shows each listing’s price for a whole year. The latest listing data, compiled on Feb. 5, 2019 by Airbnb, was used to obtain listing features. Historical listing price data from Dec. 8, 2017 to Dec. 6, 2018 was found in Airbnb’s archive and used to train the model and predict prices. Both datasets were cleaned of irrelevant data, null values and outliers. Afterwards, the calendar and listings tables were joined into a single dataset of 3,378,389 observations and 16 columns.
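The cleaning and joining step described above can be sketched as follows. This is a minimal illustration, assuming pandas; the toy rows, the column names (id, listing_id, date, price, room_type, neighbourhood) and the string price format are assumptions for the example, not the project's actual schema or code.

```python
import pandas as pd

# Toy stand-ins for the two tables; all column names and values are assumptions.
listings = pd.DataFrame({
    "id": [1, 2, 3],
    "room_type": ["Entire home/apt", "Private room", None],
    "neighbourhood": ["Louvre", "Temple", "Opéra"],
})
calendar = pd.DataFrame({
    "listing_id": [1, 1, 2, 3, 2],
    "date": ["2017-12-08", "2017-12-09", "2017-12-08", "2017-12-08", "2017-12-09"],
    "price": ["$120.00", "$1,250.00", "$60.00", "$80.00", None],
})

# Prices are assumed to arrive as strings such as "$1,250.00"; strip symbols and cast.
calendar["price"] = (
    calendar["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)
calendar["date"] = pd.to_datetime(calendar["date"])

# Drop rows with null values in either table before joining.
calendar = calendar.dropna(subset=["price"])
listings = listings.dropna()

# Inner join: one row per (listing, day) that has both a price and listing features.
joined = calendar.merge(listings, left_on="listing_id", right_on="id", how="inner")
joined = joined.drop(columns="id")

print(joined.shape)
```

An inner join is used here so that calendar rows without a matching cleaned listing are discarded; whether the project used an inner or a left join is not stated in the text.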
Design Process
Several key perceptual properties of color were applied to the visualizations. Pastel colors rather than saturated colors were chosen for all figures. The red label in Figure 4 was placed in the center of the field of view, since the edges of the retina are not sensitive to red and green. Only the center of the region was highlighted, in keeping with the principle of using color sparingly. Clear labeling is used in all figures, following Tufte’s integrity principles, to avoid ambiguity. Tufte’s design principles are also followed by maximizing the data-ink ratio and minimizing chartjunk. Data density is maximized, especially in Figure 5, where each box plot shows the range of prices for a given room type and rental period.
Data Analysis & Insights
A machine learning model based on a random forest regressor was trained to predict the price. Around 30 related features were fed into the model, and the resulting feature importances were then printed. The visualization of the top price influencers below gives potential Airbnb hosts a rough idea of the main factors that determine listing price.
Figure 1. Feature Importance of the Top Determinants of Daily Listing Price
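The modeling code is not shown in this report. As a rough sketch of how feature importances can be obtained from a random forest regressor, assuming scikit-learn, NumPy and pandas, the example below fits a model on synthetic data. The three feature names and the synthetic price formula are hypothetical and stand in for the roughly 30 real features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic features; the names are assumptions, not the project's feature list.
X = pd.DataFrame({
    "accommodates": rng.integers(1, 9, size=n),
    "minimum_nights": rng.integers(1, 30, size=n),
    "cancellation_policy": rng.choice(["flexible", "moderate", "strict"], size=n),
})

# Synthetic target: price driven almost entirely by capacity, plus noise.
y = 40 + 25 * X["accommodates"] + rng.normal(0, 5, size=n)

# Random forests need numeric inputs, so one-hot encode the categorical column.
X_encoded = pd.get_dummies(X, columns=["cancellation_policy"], dtype=float)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_encoded, y)

# Impurity-based importances, one per encoded column; they sum to 1.
importances = pd.Series(
    model.feature_importances_, index=X_encoded.columns
).sort_values(ascending=False)

print(importances)
```

Note that one-hot encoding splits a categorical feature across several columns, so its importance has to be summed over those columns before it can be compared with a numeric feature.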
Cancellation Policy
From Figure 1, we learned that cancellation policy is a determining factor for pricing. Hence, we examined how pricing relates to cancellation policy in more detail.
We explored the amount of data for each cancellation policy. As shown below, the super_strict_60, super_strict_30 and strict categories together account for 0.7% of the total dataset. They were combined into a single strict category.
From this visualization, hosts can see that, counterintuitively, listings with a strict cancellation policy lock in higher booking prices, while a more flexible cancellation policy is associated with lower prices. The data might also tell us that the more luxurious the listing (i.e., the higher the listing price), the more rigid the cancellation policy. Airbnb hosts need not worry that a strict cancellation policy will frighten guests and force them to reduce their price.
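The merging of the rare strict categories described above can be sketched in plain Python as follows. The three strict category names come from the text; the sample values and the other category names (flexible, moderate) are assumptions for illustration.

```python
from collections import Counter

# Categories named in the text as rare; they are folded into one label.
STRICT_VARIANTS = {"super_strict_60", "super_strict_30", "strict"}


def merge_policy(policy):
    """Map every strict variant to 'strict' and leave other policies unchanged."""
    return "strict" if policy in STRICT_VARIANTS else policy


# Hypothetical sample of cancellation_policy values.
policies = [
    "flexible", "moderate", "strict", "super_strict_30",
    "flexible", "super_strict_60", "moderate",
]

merged = [merge_policy(p) for p in policies]
counts = Counter(merged)

# Share of each merged category, useful for checking group sizes before plotting.
share = {name: count / len(merged) for name, count in counts.items()}
print(share)
```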
Neighborhood
According to Figure 1, neighborhood was the second most important factor for pricing. To give potential hosts a rough idea of how much they can charge per night based on neighborhood, we created a map visualization of Paris. Figure 4 highlights Paris’ 20 districts, its most important landmark, the Seine River, and the median daily listing price of each district. The median is used to limit the effect of outliers.
As highlighted with the red label, the Seine River divides Paris into the Left Bank and the Right Bank. The districts of Paris spiral outward from the center. The median Airbnb price in each of the 20 districts is shown in the orange boxes. The visualization shows that listing prices in the center of Paris are generally higher than those near the edge. Location is a key price influencer.
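To illustrate why the median rather than the mean is used per district, here is a small sketch in plain Python. The district names and prices are made-up example values, not figures from the dataset.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (district, nightly price) rows; one extreme outlier in Louvre.
rows = [
    ("Louvre", 150.0), ("Louvre", 170.0), ("Louvre", 2000.0),
    ("Ménilmontant", 60.0), ("Ménilmontant", 70.0), ("Ménilmontant", 80.0),
]

prices_by_district = defaultdict(list)
for district, price in rows:
    prices_by_district[district].append(price)

median_price = {d: median(p) for d, p in prices_by_district.items()}
mean_price = {d: mean(p) for d, p in prices_by_district.items()}

# The single 2000.0 listing drags the Louvre mean far above typical prices,
# while the median stays at the middle listing.
for district in prices_by_district:
    print(district, median_price[district], round(mean_price[district], 2))
```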
Room Types
According to Figure 1, room type was the third most important feature for pricing. Figure 5 shows daily pricing by room type.
To help hosts decide between short-term and long-term rental, prices for the two rental periods were compared side by side. Properties with minimum_nights greater than 7 were counted as long-term rentals. The figure shows that rental prices vary with both room type and rental period.
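The long-term versus short-term split described above can be sketched as follows. The 7-night threshold comes from the text; the room type labels and prices are hypothetical sample values, and the summary statistics only approximate what a box plot would display.

```python
from collections import defaultdict
from statistics import median

LONG_TERM_MIN_NIGHTS = 7  # threshold stated in the text


def rental_period(minimum_nights):
    """Label a listing long-term when its minimum stay exceeds the threshold."""
    return "long-term" if minimum_nights > LONG_TERM_MIN_NIGHTS else "short-term"


# Hypothetical (room_type, minimum_nights, price) rows.
listings = [
    ("Entire home/apt", 2, 120.0),
    ("Entire home/apt", 3, 140.0),
    ("Entire home/apt", 30, 95.0),
    ("Private room", 1, 55.0),
    ("Private room", 7, 60.0),
    ("Private room", 14, 45.0),
]

# Group prices by (room type, rental period), one group per box in the plot.
prices = defaultdict(list)
for room_type, minimum_nights, price in listings:
    prices[(room_type, rental_period(minimum_nights))].append(price)

summary = {
    group: {"min": min(vals), "median": median(vals), "max": max(vals)}
    for group, vals in prices.items()
}

for group, stats in sorted(summary.items()):
    print(group, stats)
```

A listing with a minimum stay of exactly 7 nights counts as short-term under this reading of "more than 7 days".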
Bed Types
We also explored pricing by bed type. The price distribution shown below matches common sense, with real beds costing the most.
Day of the Week
While exploring the data, we found that some hosts set different prices for different days. This box plot compares the distribution of prices by day of the week. It shows a slight increase in price on Fridays and Saturdays, while the price stays the same throughout the rest of the week. This means the day of the week is only a weak factor in listing price.
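Grouping calendar prices by day of the week, as in the comparison above, can be sketched with the standard library. The two weeks of prices below are invented to mimic a small weekend uplift and are not taken from the dataset.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import median

DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Hypothetical calendar for one listing: two weeks starting Monday 2018-06-04,
# priced at 110 on Fridays and Saturdays and 100 otherwise.
start = date(2018, 6, 4)
calendar_rows = []
for offset in range(14):
    day = start + timedelta(days=offset)
    price = 110.0 if day.weekday() in (4, 5) else 100.0
    calendar_rows.append((day, price))

# date.weekday() returns 0 for Monday through 6 for Sunday.
prices_by_day = defaultdict(list)
for day, price in calendar_rows:
    prices_by_day[DAY_NAMES[day.weekday()]].append(price)

median_by_day = {name: median(prices_by_day[name]) for name in DAY_NAMES}
print(median_by_day)
```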
Number of Occupants Accommodated
Figure 8 shows that the price increases with the number of occupants a property accommodates, up to 9 occupants. Beyond that, the price starts to fluctuate: for properties that can accommodate more than 9 people, there is no obvious relationship between price and capacity, so other factors should be considered when deciding the price.
Challenges
Data preparation was a challenge. The datasets from Airbnb are not “tailored” to our problem, so trade-offs had to be made to focus on the most relevant data. Our solution was to return to the initial goals and ask: what insights benefit Airbnb and the hosts? Focusing on the motivation made it easier for us to decide what to do.
Conclusion
Through this project, we gained experience using the visualization skills learned in this class to help solve a real-life problem, from loading raw data and exploring it to identifying business insights. We tried to think from the data scientist’s perspective on behalf of the data consumers, i.e., potential Airbnb hosts. Pricing insights were illustrated to help hosts make better decisions when listing their properties.