Working with missing data in SciKit Learn | Data Preprocessing and Modeling in Machine Learning | by Rohit Patil | May, 2024

In the field of machine learning, one of the crucial steps in building a predictive model is preprocessing the data. This process involves handling missing values, encoding categorical variables, and splitting the data into training and testing sets. In this article, we will explore how to preprocess a dataset containing information about car sales using Python’s Pandas and Scikit-learn libraries.

We begin by importing the required libraries and loading the dataset into a Pandas DataFrame.

import pandas as pd
# Import the dataset
file = pd.read_csv("../datasets/Car_sales_missing.csv")
file = file.drop("Latest_Launch", axis=1)
file

Next, we check for missing values in the dataset and fill them using appropriate strategies. We use the mean value for numerical columns and a placeholder for categorical columns.

# Check for missing values
file.isna().sum()

# Fill missing categorical values with a placeholder
file["Vehicle_type"] = file["Vehicle_type"].fillna("missing")

# Fill missing numerical values with the column mean
numeric_columns = [
    "__year_resale_value", "Sales_in_thousands", "Price_in_thousands",
    "Engine_size", "Horsepower", "Wheelbase", "Width", "Length",
    "Curb_weight", "Fuel_capacity", "Fuel_efficiency", "Power_perf_factor",
]
for column in numeric_columns:
    file[column] = file[column].fillna(file[column].mean())

# Re-check the missing-value counts
file.isna().sum()
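
The same fill strategy can also be expressed with Scikit-learn's `SimpleImputer`. This is only a minimal sketch of an alternative, not the approach used in this article, and the `toy` DataFrame with its values is a hypothetical stand-in for the car sales file.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical toy data; column names borrowed from the car sales dataset
toy = pd.DataFrame({
    "Vehicle_type": pd.Series(["Passenger", np.nan, "Car"], dtype=object),
    "Horsepower": pd.Series([140.0, np.nan, 200.0]),
})

# Placeholder for the categorical column, column mean for the numerical one
imputer = ColumnTransformer([
    ("cat", SimpleImputer(strategy="constant", fill_value="missing"), ["Vehicle_type"]),
    ("num", SimpleImputer(strategy="mean"), ["Horsepower"]),
])

# Result is a NumPy array with the columns in the order listed above
filled = imputer.fit_transform(toy)
print(filled)
```

One reason to prefer an imputer object is that the means are learned in `fit` and can later be reused on new data with `transform`.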

After handling missing values, we transform the categorical variables into numerical representations using one-hot encoding.

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

# Define categorical features for one-hot encoding
categorical_features = ["Manufacturer", "Vehicle_type"]
one_hot = OneHotEncoder()
transformer = ColumnTransformer([("one_hot", one_hot, categorical_features)],
                                remainder="passthrough")

# Transform the data
# Note: x (the feature columns) and y (the target column) must already be
# split out of `file`; that step is not shown in this article.
transformed_x = transformer.fit_transform(x)
transformed_x
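
The encoding step relies on `x`, and the later split relies on `y`, but the article does not show how either is created. The sketch below fills that gap on a small hypothetical DataFrame and assumes `Price_in_thousands` is the target; that choice is an assumption for illustration, not something the article states.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows standing in for the cleaned car sales data
toy = pd.DataFrame({
    "Manufacturer": ["Acura", "Audi", "Acura"],
    "Vehicle_type": ["Passenger", "Passenger", "Car"],
    "Horsepower": [140.0, 150.0, 225.0],
    "Price_in_thousands": [21.5, 23.99, 42.0],
})

# Assumed target column; everything else is treated as a feature
x = toy.drop("Price_in_thousands", axis=1)
y = toy["Price_in_thousands"]

transformer = ColumnTransformer(
    [("one_hot", OneHotEncoder(), ["Manufacturer", "Vehicle_type"])],
    remainder="passthrough",  # keep the numerical columns unchanged
)
transformed_x = transformer.fit_transform(x)

# Two manufacturers + two vehicle types + one passthrough column
print(transformed_x.shape)
```

Each category becomes its own 0/1 column, so the width of the result grows with the number of distinct values in the encoded columns.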

Finally, we split the transformed data into training and testing sets to prepare for model training.

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(transformed_x, y, test_size=0.2)
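
Because `train_test_split` shuffles the rows randomly, the split above changes on every run. A small sketch, using hypothetical NumPy arrays rather than the car sales data, shows how the optional `random_state` parameter makes the split repeatable and how `test_size=0.2` divides ten rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each
features = np.arange(20).reshape(10, 2)
target = np.arange(10)

# random_state fixes the shuffle so the same rows land in each set every run
x_train, x_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

print(len(x_train), len(x_test))
```

With ten rows and `test_size=0.2`, eight rows go to training and two to testing.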

We then use a random forest regressor to build a predictive model and evaluate its performance.

from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
model.fit(x_train, y_train)
model.score(x_test, y_test)
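
For a regressor, the `score` method returns the coefficient of determination (R²) on the data it is given, where 1.0 is a perfect fit. The sketch below illustrates this on synthetic data invented for the example (not the car sales dataset) by comparing `score` with `r2_score` computed from the predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data, made up for illustration only
rng = np.random.default_rng(0)
features = rng.uniform(0, 10, size=(200, 3))
target = 2.0 * features[:, 0] + features[:, 1] + rng.normal(0, 0.5, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(x_train, y_train)

# score() on a regressor is R^2, the same value r2_score gives for the predictions
r2_from_score = model.score(x_test, y_test)
r2_from_metric = r2_score(y_test, model.predict(x_test))
print(r2_from_score, r2_from_metric)
```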

In this article, we have discussed the importance of data preprocessing in machine learning and demonstrated how to handle missing values, encode categorical variables, and split the data for training and testing. These preprocessing steps are essential for building accurate and reliable machine learning models.
