1. Load the data and drop columns
import pandas as pd

df = pd.read_csv('house_cleaned.csv')
df = df.drop(columns=['Unnamed: 0', 'Building Name'])
2. Define features (X) and target (y)
X = df.drop(columns = 'Price_in_RM')
y = df['Price_in_RM']
X represents the features (independent variables) that are used to make the predictions.
y represents the target (dependent variable) that the ML model aims to predict, in this case the price of the properties in RM (the currency of Malaysia).
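Before splitting, a quick sanity check can be helpful. This is a minimal sketch (not part of the original walkthrough) for peeking at the features and target:
# Inspect the feature columns and the target values
print(X.columns.tolist())   # the columns used for prediction
print(X.shape)              # (number of listings, number of features)
print(y.describe())         # summary statistics of Price_in_RM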
3. Split the data into training and testing sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=45)
We then split the dataset in a ratio of 80/20.
80% of the data is used for training the model, while the remaining 20% is reserved for testing the model's performance.
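If you want to confirm the split, a minimal sketch (not in the original code) that checks the resulting set sizes:
# Roughly 80% of the rows should land in the training set
print(len(X_train), len(X_test))
print(len(X_train) / len(X))   # should be close to 0.8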
4. Preprocessing of categorical and numerical columns
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

categorical_columns = ['Property Type', 'Land Title', 'Tenure Type']
numerical_columns = ['Property Size_in_sq_ft', 'Bedroom', 'Bathroom',
                     'Amount of Facilities', 'Parking Lot']
categorical_transformer = OneHotEncoder()
numerical_transformer = StandardScaler()
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_columns),
        ('num', numerical_transformer, numerical_columns)
    ]
)
This preprocessing step aims to transform the data into formats that are most suitable for the ML model.
For the categorical columns, we use OneHotEncoder to convert categorical values into numerical format.
For the numerical columns, we apply StandardScaler to standardize the values onto a common scale.
Next, the ColumnTransformer combines these two transformers into a single preprocessor.
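To see what the combined preprocessor produces, here is a minimal sketch (using the names defined above; not part of the original walkthrough):
# Fit the preprocessor on the training features and inspect the output:
# one-hot columns for each category plus the scaled numerical columns
transformed = preprocessor.fit_transform(X_train)
print(transformed.shape)   # rows unchanged, columns expanded by one-hot encoding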
5. Define Machine Learning model (Linear Regression)
from sklearn.linear_model import LinearRegression

model = LinearRegression()
For a beginner, linear regression is a suitable choice of ML model for several reasons: simplicity, interpretability, and efficiency. Although relatively simple, it is still just as important.
6. Create pipeline
from sklearn.pipeline import Pipeline

my_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                              ('model', model)])
Pipeline merges the preprocessing and ML modelling into one, which means we can simplify the whole process into a single step. (Without a pipeline, we would have to perform OneHotEncoder, StandardScaler and model training as separate individual steps, which can get complicated, as the sketch below illustrates.)
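For comparison, here is a rough sketch of what the manual, no-pipeline version would look like (for illustration only; the variable names are hypothetical):
# Each step has to be applied explicitly, and the same fitted
# preprocessor must be reused on the test set.
X_train_prepared = preprocessor.fit_transform(X_train)
X_test_prepared = preprocessor.transform(X_test)
model.fit(X_train_prepared, y_train)
manual_predictions = model.predict(X_test_prepared)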
7. Fit the model with the training set
my_pipeline.fit(X_train, y_train)
As this is a supervised learning ML model, we fit the model with the Features (X) and Target (y) of the training set.
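If you are curious what the model actually learned, a minimal sketch (assuming the step names 'preprocessor' and 'model' defined in the pipeline above):
# The fitted LinearRegression lives inside the pipeline under the 'model' step
fitted_model = my_pipeline.named_steps['model']
print(fitted_model.intercept_)   # baseline prediction when all scaled features are zero
print(fitted_model.coef_[:5])    # weights learned for the first few transformed features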
8. Identify the differences in prices (Predicted vs Actual)
predicted_price = my_pipeline.predict(X_test)
actual_price = y_test

price_comparison = pd.DataFrame({'Predicted Price': predicted_price,
                                 'Actual Price': actual_price})
price_comparison
The training session is over; now it is time for testing.
Given the Features (X_test) of the testing set, the model predicts the price of the properties based on what it learned during training.
We can now compare the Predicted Price with the Actual Price (y_test). Some predictions were off by tens of thousands, while others by a few hundred thousand.
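To make those gaps easier to scan, a small sketch (not in the original) that adds the absolute difference to the comparison table:
# Absolute gap between predicted and actual price for each listing
price_comparison['Difference'] = (price_comparison['Predicted Price']
                                  - price_comparison['Actual Price']).abs()
print(price_comparison.sort_values('Difference', ascending=False).head())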
9. Evaluate the model
from sklearn.metrics import mean_absolute_error, r2_score

# Mean Absolute Error
mae = mean_absolute_error(actual_price, predicted_price)

# R2 Score
r2 = r2_score(actual_price, predicted_price)

print('Mean Absolute Error: ', round(mae, 2))
print('R2 Score: ', round(r2, 5))
Mean Absolute Error: 126926.09
R2 Score: 0.50338
It can be difficult to gauge the performance of a model solely by looking at the differences in prices; that is where statistical evaluation metrics come in!
Mean Absolute Error (MAE) is extremely intuitive and easy to understand: it indicates the average difference between the predicted price and the actual price. In this case, it means that our predictions were off by RM126,926.09 on average.
The R² Score gives an overall assessment of how well the model approximates the actual data. The R² score of 0.50338 indicates that the model explains only about 50.34% of the variance in the actual prices.
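To demystify the two metrics, here is a minimal sketch (assuming NumPy is available) that reproduces them directly from their definitions:
import numpy as np

actual = np.asarray(actual_price, dtype=float)
predicted = np.asarray(predicted_price, dtype=float)

# MAE: average absolute gap between predicted and actual prices
mae_manual = np.mean(np.abs(predicted - actual))

# R2: 1 minus (unexplained variance / total variance of the actual prices)
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(round(mae_manual, 2), round(r2_manual, 5))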