Machine Learning Price Prediction with Linear Regression | by Gabriel Thomsen | Jun, 2024

The Boston Housing dataset is a classic dataset used in the field of machine learning and statistics. It contains various features about houses in Boston, such as the number of rooms, property tax rate, and proximity to the Charles River. The goal of this project is to build a linear regression model to predict the median value of owner-occupied homes (MEDV) based on these features. By doing so, I aim to understand the relationships between different factors and house prices, and to evaluate the model’s performance in making accurate predictions.

# Load libraries
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
df = pd.read_csv("HousingData.csv")
# Print first 5 rows
df.head()
# Print basic statistics
df.describe()


For reference, this is what each column means

Looking at the correlation matrix above, we can identify some variables that are correlated with median value. For this analysis we will stick to variables with an absolute correlation of 0.4 or above, which are INDUS, NOX, RM, TAX, PTRATIO and LSTAT.
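The 0.4 cutoff can be applied programmatically. The following is a minimal sketch, assuming Pearson correlation (the pandas default) and a hypothetical helper named select_correlated_features; the demo frame contains made-up values, not the real dataset.

```python
import numpy as np
import pandas as pd


def select_correlated_features(df, target="MEDV", threshold=0.4):
    """Return columns whose absolute correlation with `target` is >= threshold."""
    # Correlation of every numeric column with the target, excluding the target itself
    corr_with_target = df.select_dtypes("number").corr()[target].drop(target)
    selected = corr_with_target[corr_with_target.abs() >= threshold]
    # Order by strength of the relationship, strongest first
    return selected.reindex(selected.abs().sort_values(ascending=False).index)


# Synthetic stand-in data (assumption: not the Boston values)
rng = np.random.default_rng(0)
rm = rng.normal(6.0, 0.7, 200)
demo = pd.DataFrame({
    "RM": rm,
    "NOISE": rng.normal(0.0, 1.0, 200),
    "MEDV": 9.0 * rm - 30.0 + rng.normal(0.0, 2.0, 200),
})
selected = select_correlated_features(demo)
print(selected)
```

On the real data, the call would be select_correlated_features(df), and the returned index would give the feature names to keep.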

The INDUS (proportion of industrial-use land) and LSTAT (proportion of lower-status population) columns contain some null values, which are not supported in linear regression. Since neither of them accounts for more than 4% of the data, we will opt to drop the affected rows, and then check for extreme outliers.
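A small sketch of this step, assuming rows (not columns) are dropped and using a hypothetical helper drop_sparse_nulls that refuses to drop when a column exceeds the 4% cap. The demo values are invented for illustration.

```python
import numpy as np
import pandas as pd


def drop_sparse_nulls(df, columns, max_null_fraction=0.04):
    """Drop rows with nulls in `columns`, but only if each column is under the cap."""
    null_fraction = df[columns].isna().mean()  # share of missing values per column
    over_cap = null_fraction[null_fraction > max_null_fraction]
    if not over_cap.empty:
        raise ValueError(f"Too many nulls to drop safely: {over_cap.to_dict()}")
    return df.dropna(subset=columns)


# Made-up frame: one missing value in each of two columns (2% each)
demo = pd.DataFrame({
    "INDUS": [2.3, np.nan] + [7.1] * 48,
    "LSTAT": [4.9] * 49 + [np.nan],
    "MEDV": [24.0] * 50,
})
clean = drop_sparse_nulls(demo, ["INDUS", "LSTAT"])
print(len(demo), "->", len(clean))
```

Checking the null fraction before dropping makes the "less than 4%" reasoning explicit instead of leaving it as a manual inspection.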

RM, LSTAT and MEDV contain some outlier values, so we will first train the model including the outliers, and then try again without them.
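Training and scoring can be wrapped in one function so that the same procedure is reused for both runs. This is a sketch under assumptions: the 80/20 train/test split, the seed, and the helper name fit_and_score are not taken from the original notebook, and the data below is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, test_size=0.2, seed=42):
    """Fit ordinary least squares and return (model, RMSE on the held-out split)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    return model, rmse


# Synthetic stand-in data (assumption): two features plus unit-variance noise
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 22.0 + 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0.0, 1.0, 300)

model, rmse = fit_and_score(X, y)
relative_rmse = rmse / np.median(y)  # RMSE expressed as a share of the median target
print(round(rmse, 3), round(relative_rmse, 3))
```

With the real data, X would be the selected feature columns and y the MEDV column.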

The root mean squared error is 4, which is around 18% of the median house value of 22 (both in thousands of USD). At face value, this is a satisfactory number, but looking at the plot, there is a consistent trend to predict lower values than the actual ones. This may be due to the outliers we included, so we will now train and evaluate a new model without the outliers.
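The text does not say which rule was used to identify outliers. One common choice is the 1.5 × IQR rule; the sketch below assumes that rule and uses a hypothetical helper remove_iqr_outliers, with invented demo values.

```python
import pandas as pd


def remove_iqr_outliers(df, columns, k=1.5):
    """Keep rows where every listed column lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        # Quartiles are computed on the full column, independently per column
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Made-up frame with one extreme MEDV value and one extreme RM value
demo = pd.DataFrame({
    "RM": [5.9, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 3.0],
    "MEDV": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 50.0, 22.5],
})
trimmed = remove_iqr_outliers(demo, ["RM", "MEDV"])
print(len(demo), "->", len(trimmed))
```

Here the rows with MEDV = 50.0 and RM = 3.0 fall outside their respective fences and are removed, leaving six rows.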

We got a negligible improvement in the RMSE (Root Mean Squared Error), but looking at the scatter plot, it may be that the bias toward predicting lower prices has been mitigated. To test this, we’ll calculate the bias for both models and compare.
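Bias here means the mean error. A minimal sketch of the calculation follows, assuming the sign convention actual minus predicted, so that a positive bias corresponds to predictions that are too low on average. The helper name error_summary and the numbers are illustrative only.

```python
import numpy as np


def error_summary(y_true, y_pred):
    """Return bias (mean error), MAE and RMSE for a set of predictions."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return {
        "bias": residuals.mean(),                  # > 0: predictions too low on average
        "mae": np.abs(residuals).mean(),           # average size of the errors
        "rmse": np.sqrt((residuals ** 2).mean()),  # penalises large errors more
    }


# Made-up numbers for illustration only
actual = [24.0, 21.0, 30.0, 18.0]
predicted = [22.0, 21.0, 27.0, 19.0]
summary = error_summary(actual, predicted)
print(summary)
```

For these values the residuals are 2, 0, 3 and -1, giving a bias of 1.0 and an MAE of 1.5. Unlike MAE and RMSE, the bias lets positive and negative errors cancel, which is why it can change noticeably even when the other two barely move.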

The improvement in overall error is negligible, but the bias has been significantly reduced, from 1.09 to 0.46. This means the second model is less systematically skewed toward low predictions and is therefore more reliable for prediction.

Through this project, I was able to apply linear regression techniques to predict house prices using the Boston Housing dataset. By carefully selecting relevant features, handling outliers, and evaluating the model’s performance, I gained valuable insights into the factors that influence house prices.

The initial model, which included outliers, had a Root Mean Squared Error (RMSE) of 4.078. After removing outliers, the RMSE improved slightly to 3.963. Additionally, the Mean Absolute Error (MAE) and bias (mean error) showed improvements, indicating a more balanced and accurate model.

While the improvements were marginal, this exercise highlighted the importance of data preprocessing and the impact of outliers on model performance. It also reinforced the need for continuous evaluation and refinement of models to achieve better accuracy.
