Objectives

  1. To identify the factors that influence HDB resale prices in Singapore.
  2. To create a Data Science project that provides insights into the trend of HDB resale prices.
  3. To create an end-to-end Machine Learning project that provides predictive analytics on HDB resale prices.

About the dataset:

The HDB resale price data was downloaded from Data.gov.sg and consists of over 800k resale transactions from 1990 to 2020.

In this notebook:

1. Loading of datasets

2. Data cleaning and Preprocessing

3. Visual Exploratory Data Analysis
3.1 Proportion of flat type ownership over the past 30 years (1990 - 2020)
3.2 Proportion of flat type ownership over the past 5 years (2015 - 2020)
3.3 Median resale price of a 4 room flat vs 5 room flat.

4. Data preparation and feature engineering/selection

5. Model selection and training.
5.1 Checking if Linear Regression yields good results.
5.2 Feature Importance
5.3 Using Random Forest with GridSearchCV.
5.4 Random forest with hyperparameter optimization.

6. Conclusion and things to improve/work on

1. Loading of datasets
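
A minimal sketch of the loading step, assuming the Data.gov.sg download arrives as several CSV files with compatible columns and that each file has a `month` column in `YYYY-MM` form. The function name `load_resale` and the file names in the usage comment are hypothetical.

```python
import pandas as pd

def load_resale(paths):
    """Read each CSV file and stack the rows into one DataFrame."""
    frames = [pd.read_csv(p) for p in paths]
    df = pd.concat(frames, ignore_index=True, sort=False)
    # Parse the transaction month so later steps can group by year.
    # Assumption: the column is called `month` and looks like "2017-01".
    df["month"] = pd.to_datetime(df["month"], format="%Y-%m")
    df["year"] = df["month"].dt.year
    return df

# Hypothetical usage; replace with the actual downloaded file names.
# df = load_resale(["resale_part_1.csv", "resale_part_2.csv"])
```

Deriving a `year` column at load time keeps the later year-based filters and group-bys short.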

2. Data cleaning and Preprocessing

The flat_type field contains duplicate labels for the same category (Multi Generation vs Multi-Generation). These need to be merged, as they refer to the same flat type rather than two distinct ones.
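
One way to merge the two spellings is sketched below, on toy rows rather than the real data. Only the column name `flat_type` comes from the text above; the exact casing of the labels is an assumption.

```python
import pandas as pd

# Toy rows standing in for the real data (label casing is assumed).
df = pd.DataFrame(
    {"flat_type": ["4 ROOM", "MULTI GENERATION", "MULTI-GENERATION", "5 ROOM"]}
)

# Uppercase, trim, and turn hyphens into spaces so both spellings
# collapse into a single label.
df["flat_type"] = (
    df["flat_type"].str.upper().str.strip().str.replace("-", " ", regex=False)
)

print(sorted(df["flat_type"].unique()))
# ['4 ROOM', '5 ROOM', 'MULTI GENERATION']
```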

Types of HDB flat model in Singapore

Standard: Introduced in the 1960s. Can be 1/2/3/4/5-room. The WC and shower are in the same room.

Improved: Introduced in 1966. The 3/4-room flats have a separate WC and shower, and also feature void decks. 5-room Improved flats were introduced in 1974.

New Generation: Introduced in 1975. New Generation flats can be 3-Room (67 / 82 sqm) or 4-Room (92 sqm), featuring a toilet for the master bedroom with a pedestal-type water closet, plus a store room.

Model A: Introduced in 1981. 3-Room (75 sqm), 4-Room (105 sqm), 5-Room (135 sqm), 5-Room Maisonette (139 sqm).

Model A2: Smaller units of Model A, e.g. 4-Room Model A2 (90 sqm).

Multi Generation: 3Gen flats designed to meet the needs of multi-generation families.

Maisonette: Model A Maisonette, a 2-storey HDB flat.

Executive Maisonette: A premium version of Model A Maisonettes. These units are no longer being built, having been replaced by the Executive Condominium (EC) scheme in 1995.

Executive Apartment: Introduced in 1983, replacing 5-Room Model A flats. In addition to the 3 bedrooms and separate living/dining area found in 5A flats, Executive Apartments (EA) and Executive Maisonettes (EM) feature a utility/maid room. 80% of Executive units were Maisonettes and 20% were Apartments.

Premium Apartment: Introduced in the 1990s, featuring better-quality finishes. Sold in ready-to-move-in condition, with flooring, kitchen cabinets and built-in wardrobes provided upon purchase.

DBSS: Also known as the Design, Build and Sell Scheme. These are a unique (and premium) breed of HDB flats in Singapore, built by private developers and sold at high prices. They are quite similar to Executive Condominiums, except that a DBSS flat is like a premium HDB flat without the facilities of a private condo and remains an HDB flat, while an EC can be converted to a private property over time.

Adjoined Flat: Large HDB flats created by combining 2 HDB flats.

Terrace: HDB terrace flats built before HDB was established. Due to Singapore's land constraints, these are no longer being built or offered for sale.

Type S1S2: Apartments like The Pinnacle@Duxton are classified as "S" or Special apartments in view of their historical significance and award-winning design. For the application of HDB policies, S1 and S2 apartments are treated as 4-room and 5-room flats respectively.

2-room: This refers to 2-room Flexi flats, which have 1 bedroom and 1 common area. They can also fall under a 99-year lease scheme. These flats are meant for the elderly or those with smaller family sizes.

3. Visual Exploratory Data Analysis

The built-in hist() function from pandas is used to plot histograms for resale_price, remaining_lease, and lease_commence_date. This helps us better understand the distribution of values for these numerical variables.

The new_remaining_lease variable has a distribution close to normal, while the resale_price variable is right-skewed, with a long tail toward higher prices.
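
A visual reading of skew can be cross-checked numerically. The sketch below uses `Series.skew()` on made-up prices, not the notebook's data. A positive value indicates a long tail toward high values and a negative value a long tail toward low values.

```python
import pandas as pd

# Toy prices with one expensive outlier; real values would come
# from the resale_price column.
prices = pd.Series(
    [200_000, 220_000, 250_000, 260_000, 300_000, 320_000, 900_000]
)

skewness = prices.skew()
# Positive: long tail toward high values (right-skewed).
# Negative: long tail toward low values (left-skewed).
direction = "right-skewed" if skewness > 0 else "left-skewed"
print(direction)
```

Running the same check on each numeric column is a quick way to confirm what the histograms suggest.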

3.1 Proportion of flat type ownership over the past 30 years (1990 - 2020)

3.2 Proportion of flat type ownership over the past 5 years (2015 - 2020)
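
The two proportion views above can be computed with one helper and two different year windows. This is a sketch on toy rows: it assumes a derived `year` column and counts resale transactions per flat type. The helper name `flat_type_share` is hypothetical.

```python
import pandas as pd

# Toy transactions; `year` is an assumed, derived column.
df = pd.DataFrame({
    "year": [1995, 2005, 2016, 2017, 2018, 2019],
    "flat_type": ["3 ROOM", "4 ROOM", "4 ROOM", "5 ROOM", "4 ROOM", "3 ROOM"],
})

def flat_type_share(frame, start, end):
    """Share of transactions per flat type for start <= year <= end."""
    window = frame[frame["year"].between(start, end)]
    return window["flat_type"].value_counts(normalize=True)

share_30y = flat_type_share(df, 1990, 2020)  # whole period
share_5y = flat_type_share(df, 2015, 2020)   # most recent window
print(share_5y)
```

Each returned Series sums to 1, so it can be passed straight to a pie or bar chart.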

3.3 Median resale price of a 4 room flat over the years vs 5 room flat.

From the graph, we can see that from 2010 the median resale prices of 4 room flats in the central area and other mature estates started to diverge from those in other towns like Woodlands, Yishun and Hougang.

For 5 room resale flats, median resale prices spiked, especially in the central area. Towns like Woodlands, Sengkang, Hougang and Punggol saw a relatively gentle increase over the years.
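
A per-town median series like the one discussed above can be built with a group-by. The sketch uses toy rows and assumes columns named `year`, `town`, `flat_type` and `resale_price`. The prices are invented for illustration.

```python
import pandas as pd

# Toy rows; all prices are invented.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2015, 2015, 2015],
    "town": ["CENTRAL AREA", "WOODLANDS", "WOODLANDS",
             "CENTRAL AREA", "CENTRAL AREA", "WOODLANDS"],
    "flat_type": ["4 ROOM"] * 6,
    "resale_price": [500_000, 300_000, 320_000, 700_000, 720_000, 350_000],
})

four_room = df[df["flat_type"] == "4 ROOM"]
# Rows are years, columns are towns, values are median prices.
median_by_town = (
    four_room.groupby(["year", "town"])["resale_price"].median().unstack("town")
)
print(median_by_town)
# median_by_town.plot() would draw one line per town (needs matplotlib).
```

Swapping the filter to "5 ROOM" gives the comparison series for 5 room flats.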

4. Data preparation and feature engineering/selection

Encoding flat_type and storey_range.
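
The notebook's actual encoding is not shown here, so the following is only one plausible scheme. It assumes storey_range values look like "10 TO 12" and maps each to its midpoint, and it assumes an ordinal ranking of flat types by size. The ordering list and both helper names are hypothetical.

```python
# Assumed ordering from smallest to largest; adjust to the labels
# actually present in the data.
FLAT_TYPE_ORDER = [
    "1 ROOM", "2 ROOM", "3 ROOM", "4 ROOM", "5 ROOM",
    "EXECUTIVE", "MULTI GENERATION",
]
FLAT_TYPE_CODE = {name: i for i, name in enumerate(FLAT_TYPE_ORDER)}

def encode_storey_range(storey_range):
    """Map a range such as '10 TO 12' to its midpoint, 11.0."""
    low, high = storey_range.split(" TO ")
    return (int(low) + int(high)) / 2

def encode_flat_type(flat_type):
    """Map a flat type label to its position in FLAT_TYPE_ORDER."""
    return FLAT_TYPE_CODE[flat_type]

print(encode_storey_range("10 TO 12"), encode_flat_type("4 ROOM"))
# 11.0 3
```

With pandas, either helper can be applied to a column through `Series.map`. An ordinal code suits tree models, and one-hot encoding is a common alternative for linear models.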

5. Model selection and training

5.1 Checking if Linear Regression yields good results.

Using Decision Tree

Using a decision tree lowers the error to $91,435. The prediction is slightly better than that of linear regression. Next, we will try a random forest with hyperparameter optimization to find the best model and parameters.
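
The comparison above can be reproduced in outline as follows. The text does not name the error metric, so this sketch assumes it is RMSE on a held-out split. The data is synthetic and the tree depth is arbitrary, so the numbers it prints are unrelated to the figure quoted above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: two features and a noisy, non-linear price.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))
y = (300_000 + 200_000 * X[:, 0] ** 2
     + 100_000 * (X[:, 1] > 0.5) + rng.normal(0, 10_000, 500))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

def rmse(model):
    """Fit on the training split and return RMSE on the held-out split."""
    model.fit(X_train, y_train)
    return float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))

results = {
    "linear_regression": rmse(LinearRegression()),
    "decision_tree": rmse(DecisionTreeRegressor(max_depth=5, random_state=0)),
}
print(results)
```

Scoring both models on the same held-out split is what makes the two error figures comparable.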

Random Forest and hyperparameter optimization

5.2 Feature Importance
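
One common way to obtain feature importances is the `feature_importances_` attribute of a fitted tree ensemble. The sketch below assumes a random forest and synthetic data. Apart from `new_remaining_lease`, the feature names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature names; the third column is pure noise.
feature_names = ["floor_area_sqm", "new_remaining_lease", "noise"]

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
# Synthetic price that depends mostly on the first feature.
y = 400_000 * X[:, 0] + 100_000 * X[:, 1] + rng.normal(0, 5_000, 300)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Pair each name with its importance and sort from most to least important.
importances = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in importances:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances sum to 1 and show which features the trees split on most, not the direction of the effect.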

5.3 Using Random Forest with GridSearchCV.
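
A minimal sketch of a grid search over a random forest, on synthetic data. The parameter grid, fold count and scoring choice are illustrative assumptions, not the values used in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 400_000 * X[:, 0] + 100_000 * X[:, 1] + rng.normal(0, 5_000, 200)

# Illustrative grid only; a real search would cover more values.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # the score is negated RMSE, so flip the sign
print(best_params, round(best_rmse))
```

Grid search fits every combination once per fold, so the cost grows multiplicatively with each added parameter value.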

5.4 Random forest with hyperparameter optimization

6. Conclusion and things to improve/work on.

  1. Include a location field, since location plays a part in the resale prices of HDB flats. This can be done through feature engineering, perhaps by finding the longitude and latitude using OneMap.sg. Amenities may also be a significant factor in resale prices, and this data could be obtained from the OneMap.sg API if available.

  2. The dataset covers 30 years' worth of data. Perhaps the resale prices should be adjusted for inflation, which can be done using the Consumer Price Index.

  3. The PyCaret package can be used to compare a greater number of models, such as AdaBoost or XGBoost, within a shorter span of time.

  4. Remove lease commence date from the linear regression model that was run. In the model, lease commence date seems to be negatively correlated with resale price, which is a little counterintuitive. This feature is already represented by the new remaining lease feature that was calculated.

  5. Deploy a web app on the cloud. The web app can be deployed to production using Heroku, AWS or Azure.
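
The inflation adjustment suggested in item 2 amounts to rescaling each price by the ratio of a base-year CPI to the CPI of the transaction year. The sketch below uses made-up index values. Real figures would have to come from official statistics, and the helper name `to_real_price` is hypothetical.

```python
# Hypothetical CPI values by year, for illustration only.
CPI = {1990: 60.0, 2005: 80.0, 2020: 100.0}

def to_real_price(nominal_price, year, base_year=2020, cpi=CPI):
    """Express a price from `year` in `base_year` dollars."""
    return nominal_price * cpi[base_year] / cpi[year]

print(to_real_price(120_000, 1990))  # 120000 * 100 / 60 = 200000.0
```

Applying this to every transaction before modelling would put prices from different decades on a comparable scale.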