Machine Learning (ML) is a buzzword that has taken the tech industry by storm. But what exactly is it? Let’s dive in.
Machine Learning is the process of learning complex patterns in existing data and then applying those learned patterns and relationships to make predictions on new, unseen data. It’s a subset of Artificial Intelligence (AI) that enables systems to learn and improve from experience without being explicitly programmed.
For a Machine Learning system to learn, it must have some data. The data stored in a database cannot establish the connections between the input and output columns by itself. That’s where ML systems come in. They have the capacity to learn the relationships between the input and output attributes and to predict the outcome based on the complex patterns the system has learned from the training data.
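To make the idea of learning a relationship from data concrete, here is a minimal sketch. The toy data and the names `X_known`, `y_known`, and `toy_model` are made up for illustration: the outputs follow the rule y = 2x + 1, but the model is only shown examples, never the rule itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: the output follows y = 2x + 1,
# but the model is never told that rule.
X_known = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_known = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Learn the input-to-output relationship from the examples
toy_model = LinearRegression()
toy_model.fit(X_known, y_known)

# Predict the output for an input the model has not seen
X_unseen = np.array([[10.0]])
toy_prediction = toy_model.predict(X_unseen)[0]

print(f"Learned slope: {toy_model.coef_[0]:.2f}")
print(f"Learned intercept: {toy_model.intercept_:.2f}")
print(f"Prediction for x = 10: {toy_prediction:.2f}")
```

Because this toy data is perfectly linear, the fitted slope and intercept should come out very close to 2 and 1. Real data is noisier, so the learned relationship is only an approximation.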
Let’s consider a Supervised Learning problem where you need to find the sale price of a particular product, say a soap. You would need to provide the ML algorithm with data related to:
- Size of the soap
- The raw material used to prepare the soap (special ingredients used)
- The process of making the soap (handmade, organic, any special process)
- Cost of the raw material
- Cost of the labor and machinery used to prepare it
- Packaging and shipping cost
- Brand of the soap
- Market demand
- Competitor prices
- Location of the sales
- Market segment to which the soap is sold, etc.
You would also need to provide the associated output price of each soap. The ML algorithm will then learn the relationship between all these input and output attributes and predict the price of a new soap when its input attributes are given.
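Several of the attributes listed above (process, brand, location, and so on) are text categories, while a regression model works with numbers. One common way to bridge that gap is one-hot encoding. The sketch below uses a made-up three-row table (the `soaps` DataFrame is hypothetical) to show what pandas’ `get_dummies` produces.

```python
import pandas as pd

# Hypothetical mini-dataset with one numeric and one categorical input
soaps = pd.DataFrame({
    "Size (g)": [120, 80, 150],
    "Process": ["Handmade", "Organic", "Specialty"],
})

# One-hot encode the categorical column:
# each category becomes its own 0/1 indicator column
encoded = pd.get_dummies(soaps, columns=["Process"], dtype=int)
print(encoded)
```

Each row has a 1 in the column matching its process and 0 in the others, so the model can assign a separate weight to each category.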
You can check out this notebook to find the price of a soap by training a linear regression model on this sample data, as shown in the figure.
Python
# Import the necessary libraries
import pandas as pd
import numpy as np
from io import StringIO
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Sample CSV data for predicting the price of a soap
csv_data = """
Size (g),Raw Material (Special Ingredients),Process,Raw Material Cost,Labor and Machinery Cost,Packaging and Shipping Cost,Brand,Market Demand,Competitor Price,Location,Market Segment,Price
120,"Lavender, Coconut Oil",Handmade,5.5,3.2,1.8,OrganicSoap,High,15,Urban,Luxury,20
80,"Aloe Vera, Olive Oil",Organic,4.2,2.8,1.5,NatureFresh,Medium,12,Suburban,Everyday,12
150,"Charcoal, Tea Tree Oil",Specialty,6.8,4.5,2.3,CharcoPure,Low,18,Rural,Spa,25
100,"Shea Butter, Almond Oil",Handmade,5,3,1.2,HandCrafted,High,14,Urban,Organic,18
130,"Green Tea Extract, Jojoba Oil",Organic,5.8,3.5,1.6,GreenEleg,Medium,13,Suburban,Everyday,15
90,"Chamomile, Sunflower Oil",Handmade,4.5,2.7,1,PureBliss,High,16,Urban,Luxury,22
110,"Rosehip Oil, Oatmeal",Specialty,6,3.8,2,RoseSilk,Medium,17,Suburban,Spa,19
75,"Cucumber Extract, Coconut Oil",Handmade,4,2.5,1.1,FreshGlow,High,15,Urban,Organic,16
140,"Peppermint, Almond Oil",Organic,6.5,4,2.2,MintyFresh,Medium,14,Suburban,Everyday,17
95,"Mango Butter, Avocado Oil",Handmade,4.8,3.1,1.3,ExoticMango,High,16,Urban,Luxury,21
120,"Lemongrass, Argan Oil",Organic,5.7,3.3,1.4,CitrusBurst,Medium,13,Suburban,Spa,14
85,"Lavender, Coconut Oil",Handmade,4.2,2.6,1,LavishScent,High,15,Urban,Organic,18
130,"Charcoal, Tea Tree Oil",Specialty,6.5,4.2,2.1,PureChar,Low,17,Rural,Spa,23
100,"Aloe Vera, Olive Oil",Organic,4.8,3,1.5,AlohaFresh,Medium,13,Suburban,Everyday,15
150,"Shea Butter, Almond Oil",Handmade,7,4.8,2.5,SilkTouch,High,18,Urban,Luxury,25
80,"Green Tea Extract, Jojoba Oil",Organic,4,2.3,1.2,GreenTease,Medium,12,Suburban,Everyday,11
110,"Chamomile, Sunflower Oil",Handmade,5.3,3.5,1.7,CalmEssence,High,16,Urban,Organic,19
120,"Rosehip Oil, Oatmeal",Specialty,6.2,4,2.2,RoseSilk,Medium,17,Suburban,Spa,20
90,"Cucumber Extract, Coconut Oil",Handmade,4.2,2.4,1.1,FreshGlow,High,15,Urban,Organic,16
140,"Peppermint, Almond Oil",Organic,6.8,4.2,2.3,MintyFresh,Medium,14,Suburban,Everyday,18
"""
# Create the DataFrame (read_csv skips the leading blank line by default)
df = pd.read_csv(StringIO(csv_data))
# Extract the features and the target variable
X = df.drop(['Price'], axis=1)  # Features
y = df['Price']  # Target
# Convert categorical variables into dummy variables
X = pd.get_dummies(X, columns=['Raw Material (Special Ingredients)', 'Process', 'Brand', 'Market Demand', 'Location', 'Market Segment'])
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the Linear Regression model
model = LinearRegression()
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f'Mean Squared Error: {mse}')
print(f'Root Mean Squared Error: {rmse}')
# Example data for a new soap
new_soap_data = {
    'Size (g)': [110],
    'Raw Material (Special Ingredients)': ['Green Tea Extract, Olive Oil'],
    'Process': ['Organic'],
    'Raw Material Cost': [5.2],
    'Labor and Machinery Cost': [3.0],
    'Packaging and Shipping Cost': [1.2],
    'Brand': ['AlohaFresh'],
    'Market Demand': ['Medium'],
    'Competitor Price': [14],
    'Location': ['Urban'],
    'Market Segment': ['Everyday']
}
# Create a DataFrame for the new soap
new_soap_df = pd.DataFrame(new_soap_data)
# Convert categorical variables into dummy/indicator variables
new_soap_df = pd.get_dummies(new_soap_df, columns=['Raw Material (Special Ingredients)', 'Process', 'Brand', 'Market Demand', 'Location', 'Market Segment'])
# Ensure that the new DataFrame has the same columns as the training data
missing_cols = set(X.columns) - set(new_soap_df.columns)
for col in missing_cols:
    new_soap_df[col] = 0
# Reorder columns to match the order used during training
# (this also drops dummy columns for categories the model never saw)
new_soap_df = new_soap_df[X.columns]
# Make predictions using the trained model
predicted_price = model.predict(new_soap_df)
print(f'Predicted Price: {predicted_price[0]}')
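The listing above aligns the new soap’s dummy columns with the training columns by hand. As an alternative (not what the notebook does), scikit-learn’s `OneHotEncoder` can remember the training categories and handle unseen ones for you. The sketch below is a minimal illustration under stated assumptions: the six-row `train` table, the `"Artisan"` category, and the names `preprocess` and `soap_pipeline` are all hypothetical and are not taken from the article’s sample data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical mini-dataset (not the article's full sample data)
train = pd.DataFrame({
    "Size (g)": [120, 80, 150, 100, 130, 90],
    "Process": ["Handmade", "Organic", "Specialty", "Handmade", "Organic", "Handmade"],
    "Location": ["Urban", "Suburban", "Rural", "Urban", "Suburban", "Urban"],
    "Price": [20, 12, 25, 18, 15, 22],
})
categorical_cols = ["Process", "Location"]

# One-hot encode the categorical columns; pass numeric columns through unchanged.
# handle_unknown="ignore" encodes an unseen category as all zeros instead of raising.
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
    remainder="passthrough",
)

# Chain preprocessing and the regressor so both are fitted and applied together
soap_pipeline = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])
soap_pipeline.fit(train.drop(columns=["Price"]), train["Price"])

# "Artisan" never appears in the training data, yet prediction still works
new_soap = pd.DataFrame({"Size (g)": [110], "Process": ["Artisan"], "Location": ["Urban"]})
pipeline_price = soap_pipeline.predict(new_soap)
print(f"Predicted price: {pipeline_price[0]:.2f}")
```

The trade-off is a little more setup in exchange for not having to add missing columns and reorder them manually every time you score a new row. With so few rows the predicted number itself is not meaningful; the point is only the mechanics.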
Check out this notebook on GitHub:
https://github.com/SpandanaKalakonda/linkedinPosts/blob/main/Predicting_the_price_of_a_soap.ipynb