Implementing Linear Regression and Decision Tree Classifier Using Scikit-Learn | by Saqib Hussain | Jul, 2024

In this article, we’ll explore how to implement two fundamental machine learning algorithms: Linear Regression and Decision Tree Classifier. We’ll use the Boston Housing dataset to predict housing prices and the Iris dataset to classify iris flower species. Additionally, we’ll cover basic exercises to help you get started with data analysis and machine learning using Python and Scikit-Learn.

Load and Explore the Boston Housing Dataset

The Boston Housing dataset contains information about various features of houses in Boston and their corresponding prices. To start, we load the dataset and examine the first few rows to understand the data.

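from sklearn.datasets import load_boston
import pandas as pd

# Note: load_boston was removed in scikit-learn 1.2, so this requires an older release
boston = load_boston()
boston_df = pd.DataFrame(data=boston.data, columns=boston.feature_names)
boston_df['MEDV'] = boston.target  # MEDV: median home value, the prediction target
print(boston_df.head())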
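
Because load_boston is not available in scikit-learn 1.2 and later, the same load-and-inspect step can be tried on a dataset that still ships with the library. The sketch below swaps in the bundled diabetes regression dataset purely as a stand-in; it is not the Boston data, and the variable names are illustrative.

```python
from sklearn.datasets import load_diabetes

# Assumption: the diabetes dataset stands in for Boston Housing here
diabetes = load_diabetes(as_frame=True)
diabetes_df = diabetes.frame  # feature columns plus a 'target' column
print(diabetes_df.head())
print(diabetes_df.shape)
```

The later regression steps would work the same way on this frame, with the 'target' column playing the role of MEDV.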

Train a Linear Regression Model

After loading and exploring the dataset, we split the data into training and testing sets. We then train a linear regression model on the training data and print its coefficients and intercept, which describe the fitted relationship between the features and the target variable (housing prices).

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X = boston.data
y = boston.target
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
print(f"Coefficients: {lr_model.coef_}")
print(f"Intercept: {lr_model.intercept_}")
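
To make the coefficients and intercept concrete, here is a minimal sketch of an ordinary least squares fit done directly with NumPy on a small synthetic dataset. The data, the "true" weights, and the variable names are invented for illustration; on a problem like this, LinearRegression should arrive at the same values up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # 100 samples, 2 synthetic features
true_coef = np.array([3.0, -2.0])    # made-up weights
true_intercept = 5.0
y = X @ true_coef + true_intercept   # noise-free target

# Append a column of ones so the last solved weight acts as the intercept
X_design = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(X_design, y, rcond=None)
coef, intercept = solution[:-1], solution[-1]

print("Coefficients:", coef)
print("Intercept:", intercept)
```

Each coefficient is the change in the prediction for a one-unit change in that feature with the others held fixed, and the intercept is the prediction when every feature is zero.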

Predict and Evaluate

We use the trained model to predict housing prices on the test set. The model’s performance is evaluated using the mean squared error, a common metric for regression tasks; lower values mean the predictions are closer to the actual prices.

y_pred = lr_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
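
As a small worked example of what the metric computes, the sketch below calculates the mean squared error by hand for three made-up values. The numbers are illustrative and unrelated to the housing data.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 9.0])

errors = y_true - y_hat        # [1, 0, -2]
mse = np.mean(errors ** 2)     # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)            # square root puts the error back in the target's units

print(f"MSE: {mse:.4f}")    # MSE: 1.6667
print(f"RMSE: {rmse:.4f}")  # RMSE: 1.2910
```

Because the errors are squared, one large miss (the 2 here) contributes more than several small ones.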

Visualize the Results

To visualize the model’s performance, we plot the actual vs. predicted housing prices. Points close to the diagonal line are accurate predictions, which helps us see how well the model is performing and identify any patterns or discrepancies.

import matplotlib.pyplot as plt

plt.scatter(y_test, y_pred, edgecolor='k')
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual vs Predicted Prices')
# Red diagonal marks perfect predictions (predicted == actual)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', linewidth=2)
plt.show()

Load and Explore the Iris Dataset

The Iris dataset is a classic dataset in machine learning, containing information about different species of iris flowers. We load the dataset and split it into training and testing sets to prepare for model training.

from sklearn.datasets import load_iris

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
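
One optional refinement, not used in the article's code: passing stratify to train_test_split keeps the class proportions the same in both splits. The sketch below uses its own variable names so it does not overwrite the split above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
# stratify=iris.target asks for the same species proportions in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
test_counts = np.bincount(y_te)  # number of test samples per species
print(test_counts)
```

Iris has 50 samples per species, so a stratified 80/20 split puts 10 of each species in the test set.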

Train a Decision Tree Classifier

We train a decision tree classifier on the training data. After training, we evaluate the model’s performance using a classification report and confusion matrix. These metrics provide insights into the model’s accuracy and ability to correctly classify each iris species.

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

dt_model = DecisionTreeClassifier()
dt_model.fit(X_train, y_train)
y_pred = dt_model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
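
To show how a confusion matrix is read, the sketch below builds one by hand from a few made-up labels, following the scikit-learn convention of true classes in rows and predicted classes in columns. The labels are illustrative, not model output.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_hat = np.array([0, 1, 1, 1, 2, 2, 0])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
for actual, predicted in zip(y_true, y_hat):
    cm[actual, predicted] += 1  # rows: true class, columns: predicted class

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(f"Accuracy: {accuracy:.3f}")  # Accuracy: 0.714
```

Off-diagonal entries show which classes get confused with each other; here one class-0 sample was predicted as class 1 and one class-2 sample as class 0.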

Visualize the Decision Tree

Visualizing the decision tree helps us understand how the model makes decisions. The visualization shows the features and thresholds used at each node of the tree, providing a clear picture of the model’s decision-making process.

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(20,10))
# filled=True colors each node by its majority class
plot_tree(dt_model, feature_names=iris.feature_names, class_names=list(iris.target_names), filled=True)
plt.show()
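
When a plot is inconvenient, for example in a terminal session, the same splits can be printed as text with export_text from sklearn.tree. This is a minimal sketch that trains its own small tree; max_depth=2 and random_state=0 are illustrative choices to keep the output short, not settings from the article.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# A shallow tree so the printed rules stay readable
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one threshold test on a feature, and the leaf lines give the predicted class.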

Load and Explore a Dataset

Loading and exploring a dataset is the first step in any data analysis task. For example, loading the Iris dataset and printing the first few rows helps us understand the structure and contents of the data.

import pandas as pd

iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
print(iris_df.head())

Basic Statistics and Visualization

Exploring basic statistics of a dataset, such as mean, median, and standard deviation, provides valuable insights into the data’s distribution and central tendencies. Visualizing the distribution of features using histograms further aids in understanding the data.

print(iris_df.describe())

# Histogram of the first feature (sepal length)
plt.hist(iris_df.iloc[:, 0], bins=20, edgecolor='k')
plt.xlabel(iris.feature_names[0])
plt.ylabel('Frequency')
plt.title('Distribution of ' + iris.feature_names[0])
plt.show()

Generate and Analyze Random Numbers

Generating a matrix of random numbers and calculating basic statistics for a list of numbers are fundamental exercises that help in understanding data manipulation and statistical analysis.

import numpy as np

# 5x5 matrix of uniform random values in [0, 1)
matrix = np.random.rand(5, 5)
print(matrix)
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
stats = {
    'count': len(numbers),
    'mean': np.mean(numbers),
    'median': np.median(numbers),
    'std_dev': np.std(numbers)
}
print(stats)
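
Two details are worth knowing when repeating this exercise. np.std computes the population standard deviation by default (ddof=0), while pandas describe() reports the sample standard deviation (ddof=1), so the two can disagree on the same data. Seeding a random generator also makes the matrix reproducible. The sketch below shows both; the seed value 42 is an arbitrary choice.

```python
import numpy as np

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

population_std = np.std(numbers)       # ddof=0: divide by n
sample_std = np.std(numbers, ddof=1)   # ddof=1: divide by n - 1

print(f"{population_std:.4f}")  # 2.8723
print(f"{sample_std:.4f}")      # 3.0277

# A seeded Generator returns the same random matrix on every run
rng = np.random.default_rng(42)
matrix = rng.random((5, 5))
print(matrix)
```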

In this article, we explored the implementation of linear regression and a decision tree classifier using the Boston Housing and Iris datasets, respectively. We also covered basic data analysis tasks and exercises in Python. These examples provide a solid foundation for further exploration and learning in machine learning.

This article was written by Saqib Hussain, a passionate learner and aspiring machine learning engineer, currently enrolled in the Bytewise Fellowship Program.

#100Daysofbytewisefellowship
