1. Read the data and drop columns
import pandas as pd

df = pd.read_csv('house_cleaned.csv')
df = df.drop(columns=['Unnamed: 0', 'Building Name'])
2. Define the features (X) and target (y)
X = df.drop(columns='Price_in_RM')
y = df['Price_in_RM']
X represents the features (independent variables) that will be used to make the predictions.
y represents the target (dependent variable) that the ML model aims to predict, in this case the price of the houses in RM (the currency of Malaysia).
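As a quick optional check (a minimal sketch using only the DataFrame loaded above), we can confirm that the features and target cover the same rows:

# X should have one fewer column than df (the price), and the same number of rows as y
print(X.shape, y.shape)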
3. Split the data into training and testing sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=45)
We then split the dataset in an 80/20 ratio.
80% of the data is used to train the model, while the remaining 20% is reserved for testing the model's performance.
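If you want to verify the split, here is a small optional check using nothing beyond the variables defined above:

# Roughly 80% of the rows go to training, 20% to testing
print(len(X_train), len(X_test))
print(len(X_train) / len(X))    # ~0.8

Setting random_state=45 simply fixes the shuffle, so the same split is reproduced every run.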
4. Preprocessing of categorical and numerical columns
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

categorical_columns = ['Property Type', 'Land Title', 'Tenure Type']
numerical_columns = ['Property Size_in_sq_ft', 'Bedroom', 'Bathroom',
                     'Amount of Facilities', 'Parking Lot']

categorical_transformer = OneHotEncoder()
numerical_transformer = StandardScaler()

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_columns),
        ('num', numerical_transformer, numerical_columns)
    ]
)
This preprocessing step transforms the data into formats that are best suited for the ML model.
For the categorical columns, we pass them through OneHotEncoder to convert categorical values into numerical format.
For the numerical columns, we apply StandardScaler to standardize the values onto a common scale.
Next, the ColumnTransformer combines these two transformers into a single preprocessor.
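To see what each transformer actually does, here is a small illustrative sketch on toy data (the values are made up, not taken from the dataset):

from sklearn.preprocessing import OneHotEncoder, StandardScaler
import pandas as pd

toy = pd.DataFrame({'Property Type': ['Condo', 'Terrace', 'Condo'],
                    'Bedroom': [2, 4, 3]})

# OneHotEncoder turns each category into its own 0/1 column
print(OneHotEncoder().fit_transform(toy[['Property Type']]).toarray())
# [[1. 0.]
#  [0. 1.]
#  [1. 0.]]

# StandardScaler rescales numbers to zero mean and unit variance
print(StandardScaler().fit_transform(toy[['Bedroom']]))
# approximately [[-1.22], [1.22], [0.]]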
5. Define the Machine Learning model (Linear Regression)
from sklearn.linear_model import LinearRegression

model = LinearRegression()
For a beginner, linear regression is the right choice of ML model for many reasons: simplicity, interpretability, and efficiency. Although relatively simple, it is still just as valuable.
6. Create pipeline
from sklearn.pipeline import Pipeline

my_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                              ('model', model)])
Pipeline merges the preprocessing and the ML modelling into one, which means we can simplify the whole process into a single step. (Without a pipeline, we would have to carry out the OneHotEncoder, StandardScaler, and model training as individual steps, which would be more complicated; see the sketch below.)
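For comparison, here is a rough sketch of what the same steps would look like without a pipeline (shown only to illustrate the extra bookkeeping, not meant to be run alongside the pipeline version):

# Every step must be applied, and kept in sync, by hand
X_train_prepared = preprocessor.fit_transform(X_train)   # fit the encoder/scaler on training data only
X_test_prepared = preprocessor.transform(X_test)          # reuse the fitted preprocessor on test data
model.fit(X_train_prepared, y_train)
manual_predictions = model.predict(X_test_prepared)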
7. Fit the model with the training sets
my_pipeline.fit(X_train, y_train)
As this is a supervised learning ML model, we then fit the model with the Features (X) and Target (y) of the training set.
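One nice side effect of linear regression's interpretability (mentioned in step 5) is that, once the pipeline is fitted, we can inspect the learned coefficients. A minimal sketch, assuming a scikit-learn version recent enough to provide get_feature_names_out:

# Pair each transformed feature with its learned coefficient (RM per unit / per category)
fitted_model = my_pipeline.named_steps['model']
feature_names = my_pipeline.named_steps['preprocessor'].get_feature_names_out()
print(pd.Series(fitted_model.coef_, index=feature_names).sort_values())
print(fitted_model.intercept_)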
8. Identify the differences in prices (Predicted vs Actual)
predicted_price = my_pipeline.predict(X_test)
actual_price = y_test

price_comparison = pd.DataFrame({'Predicted Price': predicted_price,
                                 'Actual Price': actual_price})
price_comparison
The training session is over; now it's time for testing.
Given the Features (X_test) of the testing set, the model predicts the price of the houses based on what it learned during training.
We can now compare the Predicted Price with the Actual Price (y_test). Some predictions were off by tens of thousands, while others were off by a few hundred thousand.
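To put numbers on that spread instead of eyeballing the table, here is a quick optional summary of the absolute errors:

# Distribution of the absolute prediction errors, in RM
abs_errors = (price_comparison['Predicted Price'] - price_comparison['Actual Price']).abs()
print(abs_errors.describe())    # mean, quartiles and maximum of the errors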
9. Evaluate the model
from sklearn.metrics import mean_absolute_error, r2_score

# Mean Absolute Error
mae = mean_absolute_error(actual_price, predicted_price)
# R2 Score
r2 = r2_score(actual_price, predicted_price)

print('Mean Absolute Error: ', round(mae, 2))
print('R2 Score: ', round(r2, 5))
Mean Absolute Error: 126926.09
R2 Score: 0.50338
It can be difficult to gauge the performance of a model solely by looking at the differences in prices; that is where statistical evaluation metrics come in!
Mean Absolute Error (MAE) is highly intuitive and easy to understand: it indicates the average difference between the predicted price and the actual price. In this case, it means that our predictions were off by RM126,926.09 on average.
The R² score gives an overall assessment of how well the model approximates the actual data. The R² score of 0.50338 indicates that the model explains only about 50.34% of the variance in the actual prices.
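For intuition, both metrics can be reproduced by hand from their definitions; a short sketch using numpy (the printed values should match the ones above):

import numpy as np

errors = np.asarray(actual_price) - np.asarray(predicted_price)

# MAE: the average absolute error, in the same unit as the target (RM)
manual_mae = np.mean(np.abs(errors))

# R2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((np.asarray(actual_price) - np.mean(actual_price)) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(round(manual_mae, 2), round(manual_r2, 5))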