In this article, we'll walk through the process of building and deploying machine learning pipelines using the Pipeline
class from scikit-learn. We will use a dataset from the Kaggle Titanic competition to illustrate the process.
A machine learning pipeline in scikit-learn is a way to streamline a series of data processing and modeling steps. Pipelines help ensure that the same transformations are applied during both training and testing, preventing data leakage and making your workflow cleaner and more reproducible.
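To make the idea concrete, here is a minimal sketch of a two-step pipeline on scikit-learn's built-in Iris data (not part of the Titanic workflow below): the scaler is fitted only on the training split, and the same fitted transformation is reused automatically at prediction time.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fitting the pipeline fits the scaler and the classifier in sequence;
# predict/score reuse the scaler fitted on the training data only
toy_pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000))
])
toy_pipe.fit(X_tr, y_tr)
print(toy_pipe.score(X_te, y_te))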
We will use the Titanic dataset, which contains information about passengers and whether they survived the Titanic disaster. The goal is to build a model that predicts survival based on passenger attributes.
import pandas as pd

# Load the dataset
df = pd.read_csv('train.csv')
print(df.head())
We drop columns that won't be useful for prediction.
df.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin'], inplace=True)
Split the data into training and testing sets.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
df.drop(columns=['Survived']),
df['Survived'],
test_size=0.2,
random_state=42
)
Imputation Transformer
Handle missing values in the Age and Embarked columns.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

trf1 = ColumnTransformer([
    ('impute_age', SimpleImputer(), [2]),                               # Impute Age with the mean
    ('impute_embarked', SimpleImputer(strategy='most_frequent'), [6])   # Impute Embarked with the most frequent value
], remainder='passthrough')
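As an optional sanity check (not part of the original workflow), you can fit this transformer on its own and confirm that no missing values remain in its output:
checked = pd.DataFrame(trf1.fit_transform(X_train))
print(checked.isna().sum())  # every column should report 0 missing values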
One-Hot Encoding
Convert the categorical variables (Sex and Embarked) into numeric columns.
from sklearn.preprocessing import OneHotEncoder

# After trf1, the column order is [Age, Embarked, Pclass, Sex, SibSp, Parch, Fare],
# so Embarked is now at index 1 and Sex at index 3
trf2 = ColumnTransformer([
    ('ohe_sex_embarked', OneHotEncoder(sparse_output=False, handle_unknown='ignore'), [1, 3])  # One-Hot Encode Embarked and Sex (use sparse=False on scikit-learn < 1.2)
], remainder='passthrough')
Scaling
Scale the features to a common range (MinMaxScaler maps each feature to [0, 1] by default).
from sklearn.preprocessing import MinMaxScaler

trf3 = ColumnTransformer([
    ('scale', MinMaxScaler(), slice(0, 10))  # Scale all 10 columns left after encoding
])
Feature Selection
Select the most important features.
from sklearn.feature_selection import SelectKBest, chi2

trf4 = SelectKBest(score_func=chi2, k=8)  # Keep the 8 features with the highest chi-squared scores
Use a decision tree classifier as the final estimator.
from sklearn.tree import DecisionTreeClassifier

trf5 = DecisionTreeClassifier()
Combine all the transformers and the model into a single pipeline.
from sklearn.pipeline import Pipeline

pipe = Pipeline([
('trf1', trf1),
('trf2', trf2),
('trf3', trf3),
('trf4', trf4),
('trf5', trf5)
])
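Optionally, scikit-learn can render the assembled pipeline as a diagram, which makes the nested ColumnTransformers easier to inspect in a notebook:
from sklearn import set_config

set_config(display='diagram')  # display estimators as an HTML diagram in notebooks
pipe                           # shows trf1 through trf5 and the final classifier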
# Train the pipeline
pipe.fit(X_train, y_train)
Evaluate the model on the test data.
from sklearn.metrics import accuracy_score

y_pred = pipe.predict(X_test)
print(accuracy_score(y_test, y_pred))
Use cross-validation to check the model's robustness.
from sklearn.model_selection import cross_val_score

print(cross_val_score(pipe, X_train, y_train, cv=5, scoring='accuracy').mean())
Use grid search to find the best hyperparameters.
from sklearn.model_selection import GridSearchCV

params = {
'trf5__max_depth': [1, 2, 3, 4, 5, None]
}
grid = GridSearchCV(pipe, params, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print(grid.best_score_)
print(grid.best_params_)
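Because GridSearchCV refits the best configuration on the full training set by default (refit=True), you can also keep the tuned pipeline itself rather than the `pipe` fitted earlier with default hyperparameters:
best_pipe = grid.best_estimator_  # pipeline refit with the best max_depth
print(accuracy_score(y_test, best_pipe.predict(X_test)))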
Export the trained pipeline to a file for later use.
import pickle

pickle.dump(pipe, open('pipe.pkl', 'wb'))
Load the pipeline and use it for predictions.
import numpy as np

pipe = pickle.load(open('pipe.pkl', 'rb'))

# Example user input: Pclass, Sex, Age, SibSp, Parch, Fare, Embarked
test_input = np.array([2, 'male', 31.0, 0, 0, 10.5, 'S'], dtype=object).reshape(1, 7)
print(pipe.predict(test_input))
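Since the pipeline was fitted on a pandas DataFrame, you may see feature-name warnings when passing a raw NumPy array; an equivalent call using a one-row DataFrame with the original column names (as they appear after the drops above) avoids that:
single_passenger = pd.DataFrame(
    [[2, 'male', 31.0, 0, 0, 10.5, 'S']],
    columns=['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
)
print(pipe.predict(single_passenger))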
Pipelines in scikit-learn provide a powerful way to manage the entire machine learning workflow, from preprocessing to model training and evaluation. By following this guide, you can build robust and reproducible pipelines for your own machine learning projects.