Boston Data Analysis

Business Understanding #

I love to travel, and when I have the time and money, I pack my bags and hit the road. One of the biggest factors in my travel budget is the cost of accommodation (hotel…). Recently, when I joined a program on Udacity, I got data about housing prices in Boston. Let's study those prices together. There are three questions I am interested in when looking at this data:

  • Question 1: What are prices in cities in a specified timeframe?
  • Question 2: What are the factors that impact prices?
  • Question 3: What is the price of a given house?

Prerequisites #

The housing price data can be downloaded from Kaggle: Boston data on Kaggle

Data Understanding #

The Boston data contains three files:

  • calendar.csv: Calendar, including listing id and the price and availability for that day
  • listing.csv: Listings, including full descriptions and average review score
  • reviews.csv: Reviews, including a unique id for each reviewer and detailed comments

Let's take a look at the Boston data.

Gather data #

We can inspect the data by loading the calendar, listing, and reviews files into data frames and looking at the first rows.

Calendar sample data

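| Index | listing_id | date | available | price |
|---|---|---|---|---|
| 0 | 12147973 | 9/5/2017 | f | NaN |
| 1 | 12147973 | 9/4/2017 | f | NaN |
| 2 | 12147973 | 9/3/2017 | f | NaN |
| 3 | 12147973 | 9/2/2017 | f | NaN |
| 4 | 12147973 | 9/1/2017 | f | NaN |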

The price is NaN when the listing is not available on that day (available = f).
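To make the link between availability and missing prices concrete, here is a minimal sketch with pandas. The first two rows come from the sample above; the two available rows, and the assumption that prices are stored as text such as "$65.00", are made up for illustration and may not match the real file.

```python
import io
import pandas as pd

# Tiny stand-in for calendar.csv; the real file would be read with
# pd.read_csv("calendar.csv"). The two "t" rows are invented examples.
sample_csv = io.StringIO(
    "listing_id,date,available,price\n"
    "12147973,9/5/2017,f,\n"
    "12147973,9/4/2017,f,\n"
    "3075044,8/22/2017,t,$65.00\n"
    '3075044,8/21/2017,t,"$1,250.00"\n'
)
df_calendar = pd.read_csv(sample_csv)

# Check that every unavailable day ("f") has a missing price.
unavailable = df_calendar[df_calendar["available"] == "f"]
all_missing = unavailable["price"].isna().all()

# Keep only the available days and turn the price text into a float
# (assumes a "$" prefix and "," as thousands separator).
df_available = df_calendar[df_calendar["available"] == "t"].copy()
df_available["price"] = (
    df_available["price"]
    .str.replace(r"[$,]", "", regex=True)
    .astype(float)
)
```

Dropping the unavailable days before any averaging keeps the NaN values from shrinking the sample silently later on.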

Listing sample data

| Index | id | listing_url | scrape_id | last_scraped | name | summary | space | description | experiences_offered | neighborhood_overview | review_scores_value | requires_license | license | jurisdiction_names | instant_bookable | cancellation_policy | require_guest_profile_picture | require_guest_phone_verification | calculated_host_listings_count | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12147973 | https://www.airbnb.com/rooms/12147973 | 2.01609E+13 | 9/7/2016 | Sunny Bungalow in the City | Cozy, sunny, family home. Master bedroom high… | The house has an open and cozy feel at the sam… | Cozy, sunny, family home. Master bedroom high… | none | Roslindale is quiet, convenient and friendly. … | NaN | f | NaN | NaN | f | moderate | f | f | 1 | NaN |
| 1 | 3075044 | https://www.airbnb.com/rooms/3075044 | 2.01609E+13 | 9/7/2016 | Charming room in pet friendly apt | Charming and quiet room in a second floor 1910… | Small but cozy and quite room with a full size… | Charming and quiet room in a second floor 1910… | none | The room is in Roslindale, a diverse and prima… | 9 | f | NaN | NaN | t | moderate | f | f | 1 | 1.3 |
| 2 | 6976 | https://www.airbnb.com/rooms/6976 | 2.01609E+13 | 9/7/2016 | Mexican Folk Art Haven in Boston | Come stay with a friendly, middle-aged guy in … | Come stay with a friendly, middle-aged guy in … | Come stay with a friendly, middle-aged guy in … | none | The LOCATION: Roslindale is a safe and diverse… | 10 | f | NaN | NaN | f | moderate | t | f | 1 | 0.47 |
| 3 | 1436513 | https://www.airbnb.com/rooms/1436513 | 2.01609E+13 | 9/7/2016 | Spacious Sunny Bedroom Suite in Historic Home | Come experience the comforts of home away from… | Most places you find in Boston are small howev… | Come experience the comforts of home away from… | none | Roslindale is a lovely little neighborhood loc… | 10 | f | NaN | NaN | f | moderate | f | f | 1 | 1 |
| 4 | 7651065 | https://www.airbnb.com/rooms/7651065 | 2.01609E+13 | 9/7/2016 | Come Home to Boston | My comfy, clean and relaxing home is one block… | Clean, attractive, private room, one block fro… | My comfy, clean and relaxing home is one block… | none | I love the proximity to downtown, the neighbor… | 10 | f | NaN | NaN | f | flexible | f | f | 1 | 2.25 |

The data mixes text, numeric, and boolean values. The listing data describes every listing, for example the number of bedrooms, the bedroom types, and the regular price.

Reviews sample data

| Index | listing_id | id | date | reviewer_id | reviewer_name | comments |
|---|---|---|---|---|---|---|
| 0 | 1178162 | 4724140 | 5/21/2013 | 4298113 | Olivier | My stay at islam’s place was really cool! Good… |
| 1 | 1178162 | 4869189 | 5/29/2013 | 6452964 | Charlotte | Great location for both airport and city - gre… |
| 2 | 1178162 | 5003196 | 6/6/2013 | 6449554 | Sebastian | We really enjoyed our stay at Islams house. Fr… |
| 3 | 1178162 | 5150351 | 6/15/2013 | 2215611 | Marine | The room was nice and clean and so were the co… |
| 4 | 1178162 | 5171140 | 6/16/2013 | 6848427 | Andrew | Great location. Just 5 mins walk from the Airp… |

Evaluation #

After cleaning and manipulating the data, I use it to answer the three questions.

Question 1: What are prices in cities in a specified timeframe? #

I group the data by city and month (a newly added field) and calculate the mean price. Since the listing prices are recorded daily, averaging them by month makes the comparison easier.
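The grouping step can be sketched roughly as follows. The rows are made up, and the column names city, date, and price are assumptions about how the cleaned calendar data looks once it has been joined with each listing's city.

```python
import pandas as pd

# Made-up rows standing in for the cleaned daily prices joined with the city.
df = pd.DataFrame({
    "city": ["Charlestown", "Charlestown", "West Roxbury",
             "West Roxbury", "West Roxbury"],
    "date": ["4/1/2017", "4/2/2017", "4/1/2017", "10/1/2016", "10/2/2016"],
    "price": [300.0, 380.0, 60.0, 50.0, 54.0],
})

# Derive the new month field from the daily date.
df["month"] = pd.to_datetime(df["date"], format="%m/%d/%Y").dt.month

# Mean daily price for every (city, month) pair.
monthly_mean = (
    df.groupby(["city", "month"])["price"]
    .mean()
    .reset_index(name="mean_price")
)

# One row per city and one column per month, which is convenient for plotting.
price_table = monthly_mean.pivot(index="city", columns="month", values="mean_price")
```

A city with no data for a given month simply shows NaN in price_table, so gaps stay visible instead of being averaged away.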

Visualize the data #

[Chart: mean price by city and month (df_listing)]

We can see that:

  • Charlestown has the highest price (a mean of $340/day), and West Roxbury has the lowest price in every month.
  • Some cities have a stable price across all months: Watertown and Milton.
  • Apart from Boston (the common location), some cities show large price differences between days and months, with the minimum price in October and the maximum in April.
    • Dorchester has a low price of $22/day.

It’s June now, so if you want to go to Boston, I think you can consider West Roxbury, the place with the cheapest housing prices.

Question 2: Which factors have the most impact on price? #

In the listing data, we want to investigate how each property affects the price. We will visualize this through their coefficients.

Visualize coefficients #

The features with the most impact on price are weekly_price, monthly_price, accommodates, bedrooms, and square_feet. The features with the least impact are acceptance_rate and reviews_per_month.

[Figure: heatmap of the coefficients]

If we rent a house with large values for accommodates, bedrooms, and square_feet, the price will be high. To save money, we can choose a house with smaller values for these features.
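One way to get such a ranking is sketched below. It assumes that the coefficients are Pearson correlation coefficients between each numeric feature and the price, which is what a heatmap usually shows; the article does not say so explicitly, and the numbers here are made up.

```python
import pandas as pd

# Made-up numeric listing features; the real values come from the listing file.
df_num = pd.DataFrame({
    "price": [60.0, 85.0, 120.0, 200.0, 340.0],
    "accommodates": [1, 2, 3, 5, 8],
    "bedrooms": [1, 1, 2, 3, 4],
    "reviews_per_month": [2.0, 0.5, 3.1, 1.2, 1.9],
})

# Pearson correlation matrix; this is the matrix a heatmap would display.
corr = df_num.corr()

# Rank the features by the strength of their correlation with price.
price_corr = (
    corr["price"]
    .drop("price")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
```

Sorting by absolute value keeps strongly negative relationships near the top too. A correlation only describes a linear association, so a high value does not prove that the feature drives the price.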

Question 3: Predict price #

As seen in Question 1, West Roxbury is an attractive place for me. However, no house there has accommodates = 10. I wonder: if such a house existed, what would its price be?

Now let’s take a record from West Roxbury, set its accommodates value to 10, and run it through the trained model.

As you can see, the predicted price is $55.3. I think this is a good price for a trip here!
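The trained model itself is not shown in this article, so the following is only a sketch of the idea, with an ordinary least squares fit in NumPy as a stand-in and made-up training rows. The function predict_price and the two features used are hypothetical, and the numbers it produces have nothing to do with the $55.3 above.

```python
import numpy as np

# Made-up training rows: [accommodates, bedrooms] -> nightly price.
X = np.array([[1, 1], [2, 1], [3, 2], [4, 2], [6, 3]], dtype=float)
y = np.array([35.0, 50.0, 75.0, 90.0, 130.0])

# Add an intercept column and fit ordinary least squares.
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

def predict_price(accommodates, bedrooms):
    """Predict a nightly price from the fitted coefficients (hypothetical helper)."""
    return float(coef @ np.array([1.0, accommodates, bedrooms]))

# Take an existing record and overwrite only the accommodates value.
record = {"accommodates": 2, "bedrooms": 1}
record["accommodates"] = 10
predicted = predict_price(**record)
```

Since no training row has accommodates = 10, the model is extrapolating beyond what it has seen, so a prediction like this is best read as a rough estimate.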

I hope you had a great time reading the article!

For the source code, you can refer to GitHub.