Segmentation of customers based on their traits is a crucial business strategy for personalized promotions and offers. We attempted to incorporate segmentation via the K-Means clustering method, using the optimal number of clusters as derived from the PCA analysis and t-SNE. Figure 8 conveys that segmentation using K-Means clustering is not a suitable approach for this dataset. Visit the project repository for the code.
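For reference, a minimal sketch of the attempted approach, assuming df_num holds the numeric customer traits used elsewhere in the project (the cluster counts, component number, and silhouette scoring are illustrative, not the exact code from the repository):
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Scale the numeric traits, then project onto two principal components
X_scaled = StandardScaler().fit_transform(df_num)
X_pca = PCA(n_components=2, random_state=1).fit_transform(X_scaled)

# Score candidate cluster counts; uniformly low silhouette values would
# support the conclusion that K-Means segmentation does not fit this data
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X_pca)
    print(f'k={k}: silhouette={silhouette_score(X_pca, labels):.3f}')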
We have completed the EDA portion of the project, and we now move on to the machine learning part of it.
Machine Learning Models and Optimization
Data Preparation
The core of this project is to develop a machine-learning model that best predicts the target variable for a new customer. It is essential to prepare the data for modeling, and the main preparation steps were:
- Standardizing identifiers: To avoid errors or warnings while modeling, we standardized the column names of the dataset by replacing spaces with underscores.
- Encoding the categorical variables: All the ML models require numerical data to process; therefore, we encoded all the object-type variables using the LabelEncoder() class.
- Splitting the dataset: After removing the target variable from the dataset, the data frame was split into train and test sets with a test size of 30%.
# Standardizing column names
df.columns = [x.lower() for x in df.columns]
df.columns = df.columns.str.replace(' ', '_')

# Categorical variable encoding
from sklearn.preprocessing import LabelEncoder
for col in df_cat:  # df_cat is the data frame of categorical variables
    le = LabelEncoder()
    le.fit(df_cat[col])
    df_cat[col] = le.transform(df_cat[col])

# Final dataset before splitting, combining categorical and numerical variables
df_final = pd.concat([df_num, df_cat], axis=1)

# Splitting the dataset
from sklearn.model_selection import train_test_split
X = df_final.drop(['customer_lifetime_value', 'policy_type', 'policy'], axis=1)
y = df_final['customer_lifetime_value']
y = np.log(y)  # log-transform the target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
The data is ready for modeling!
Machine Learning Modeling
1. Linear Regression with Lasso (L1) and Ridge (L2) Regularizations
For Linear Regression, we applied the Lasso (L1) and Ridge (L2) regularizations to optimize the performance metrics of the model by avoiding over-fitting. Read more about regularization here.
# import libraries for regression and performance evaluation
from sklearn.linear_model import Ridge, Lasso
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Lasso regression fitting and prediction
lasso = Lasso(random_state=1)
lasso.fit(X_train, y_train)
y_pred = lasso.predict(X_test)

# Performance metrics for Lasso
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_pred)))
print('R-Squared:', r2_score(y_test, y_pred))
print(f'R^2 score for train: {lasso.score(X_train, y_train)}')
print(f'R^2 score for test: {lasso.score(X_test, y_test)}')

# Ridge regression fitting and prediction
ridge = Ridge(alpha=0.1)
ridge.fit(X_train, y_train)
y_pred = ridge.predict(X_test)

# Performance metrics for Ridge
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_pred)))
print('R-Squared:', r2_score(y_test, y_pred))
print(f'R^2 score for train: {ridge.score(X_train, y_train)}')
print(f'R^2 score for test: {ridge.score(X_test, y_test)}')
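The regularization strength alpha controls how hard each penalty shrinks the coefficients. The post does not show how alpha was chosen; a quick sweep such as the sketch below is one common way to pick it (the candidate values are illustrative):
# Sweep candidate regularization strengths for both penalties
for alpha in [0.001, 0.01, 0.1, 1.0]:
    lasso_a = Lasso(alpha=alpha, random_state=1).fit(X_train, y_train)
    ridge_a = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f'alpha={alpha}: Lasso test R^2={lasso_a.score(X_test, y_test):.3f}, '
          f'Ridge test R^2={ridge_a.score(X_test, y_test):.3f}')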
Figure 9 gives the performance metrics of the Linear Regression model regularized by Lasso and Ridge. It has a low R-squared on both the train and test data sets.
2. Decision Tree Regressor
On modeling the data with the Decision Tree Regressor, we encounter an R-squared of 1.0 on the train data and only 0.84 on the test data, indicating over-fitting of the model on the train dataset.
# import the library from scikit-learn
from sklearn.tree import DecisionTreeRegressor

# Fitting and predicting with Decision Tree
dt = DecisionTreeRegressor(random_state=1)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)

# Performance metrics for Decision Tree Regressor
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_pred)))
print('MAE:', mean_absolute_error(y_test, y_pred))
print('R-Squared:', r2_score(y_test, y_pred))
print(f'R^2 score for train: {dt.score(X_train, y_train)}')
print(f'R^2 score for test: {dt.score(X_test, y_test)}')
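A perfect train score alongside a much lower test score is the classic over-fitting signature. One common remedy, not applied in the original scripts, is to constrain the tree, for example by capping its depth (the value below is illustrative):
# Hypothetical depth cap: trades a little train accuracy for less variance
dt_pruned = DecisionTreeRegressor(max_depth=8, random_state=1)
dt_pruned.fit(X_train, y_train)
print(f'R^2 score for train: {dt_pruned.score(X_train, y_train)}')
print(f'R^2 score for test: {dt_pruned.score(X_test, y_test)}')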
3. Random Forest Regressor
With the Random Forest Regressor, we obtain a train data accuracy (R²) of 98% and a test data accuracy of 90%. Among all the models up to this point, the Random Forest Regressor proves to be the most reliable prediction model.
# import the library
from sklearn.ensemble import RandomForestRegressor

# Fitting and predicting using Random Forest Regressor
rf = RandomForestRegressor(n_estimators=10, random_state=1)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Performance metrics for Random Forest Regressor
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_pred)))
print('MAE:', mean_absolute_error(y_test, y_pred))
print('R-Squared:', r2_score(y_test, y_pred))
print(f'R^2 score for train: {rf.score(X_train, y_train)}')
print(f'R^2 score for test: {rf.score(X_test, y_test)}')
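Since the target was log-transformed before splitting, the RMSE above is in log units. To express errors on the original CLV scale, the log-space predictions can be exponentiated back (a small sketch, not part of the original scripts):
# Convert log-space predictions back to the original CLV scale
y_pred_clv = np.exp(y_pred)
y_test_clv = np.exp(y_test)
print('RMSE (original scale):', np.sqrt(mean_squared_error(y_test_clv, y_pred_clv)))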
In addition to the above models, we implemented hyper-tuned Random Forest, AdaBoost, and Neural Network models. However, their test accuracies were lower than that of the Random Forest Regressor. Even though hyper-parameter tuning improved the accuracy of the Random Forest model, it demanded far more computation and was discarded. Figure 12 shows the summary of the performance metrics of all the models tested on the dataset.
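The exact search space used for the hyper-tuned Random Forest is not given in the post; a typical grid search over a few key parameters would look like the sketch below (parameter values are illustrative):
from sklearn.model_selection import GridSearchCV

# Illustrative parameter grid; the original search space is not shown
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_leaf': [1, 2, 5],
}
grid = GridSearchCV(RandomForestRegressor(random_state=1),
                    param_grid, cv=5, scoring='r2', n_jobs=-1)
grid.fit(X_train, y_train)
print('Best parameters:', grid.best_params_)
print('Best CV R^2:', grid.best_score_)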
To conclude, we chose the Random Forest Regressor for predicting the Customer Lifetime Value of new customers.
User Interface for CLV Prediction
For Company X to understand customer behavior before issuing an insurance policy, we developed a website scripted in HTML and styled with CSS and JavaScript that gives an instant prediction of the Customer Lifetime Value of any potential client, taking into account the traits we used for the prediction model.
We converted the Random Forest prediction model into a .sav file and connected it to an HTML file using the Python libraries pickle (for loading and executing the .sav file) and flask (for rendering the webpage when running the model).
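A minimal sketch of how those pieces fit together (the file name, route, and form handling are illustrative assumptions; the actual scripts are in the repository):
import pickle
import numpy as np
from flask import Flask, render_template, request

app = Flask(__name__)

# Load the serialized Random Forest model (file name is hypothetical)
with open('clv_model.sav', 'rb') as f:
    model = pickle.load(f)

@app.route('/', methods=['GET', 'POST'])
def predict():
    prediction = None
    if request.method == 'POST':
        # Read the 19 form fields in the same order as the training columns
        features = [float(value) for value in request.form.values()]
        # The model was trained on log(CLV), so exponentiate the prediction
        prediction = np.exp(model.predict([features])[0])
    return render_template('index.html', prediction=prediction)

if __name__ == '__main__':
    app.run(debug=True)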
Figure 13 is the website's landing page, which predicts the Customer Lifetime Value for any individual customer given all 19 attributes required to predict it. This gives Company X an upper hand in assessing the risk involved with each of their incoming clients, thus lowering the chances of loss. The website is not rendered publicly because of the charges involved in securing a deal with a web server. The scripts and files can be viewed in the GitHub Repository.
Q&A Interface for Data Retrieval via LLM
Further, we extended this project to implement a Large Language Model for extracting data from the database. We obtained the Python API keys for applying Google's Gemini LLM using the GooglePalm class from the langchain library. The LLM class object is used to convert human prompts into SQL queries.
The Python interface reads a data-related question in human language and passes it to the langchain object, which converts it into machine-readable form; the SQLDatabaseChain object then converts it into a SQL query, and the SQLDatabase framework connects the MySQL server to the Python application and executes the query, retrieving the desired data.
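Put together, the chain looks roughly like the sketch below (the connection string and API key are placeholders, and the package layout differs across langchain versions, so treat this as an assumption-laden outline rather than the exact project code):
from langchain.llms import GooglePalm
from langchain.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain

# Connect to the MySQL database (credentials are placeholders)
db = SQLDatabase.from_uri('mysql+pymysql://user:password@localhost/clv_db')

# LLM object that translates natural-language prompts into SQL
llm = GooglePalm(google_api_key='YOUR_API_KEY', temperature=0)

# Chain: question -> SQL query -> execution -> answer
chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
print(chain.run('How many customers hold a premium auto policy?'))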
Thus, we can get any information (Figure 14) from the database based on the data attributes listed on the page for reference. Any question beyond the scope of the data will throw an error. All code and files are on GitHub.
Conclusion
This data science project, leveraged by Company X for the prediction of the Customer Lifetime Value of new customers, can achieve the following outcomes:
- Improved Customer Retention: Identify high-value customers for targeted promotional offers and loyalty programs, leading to increased customer retention and reduced churn rates.
- Enhanced Marketing Effectiveness: Enable data-driven allocation of marketing resources toward high-value customer segments, maximizing return on investment.
- Data-Driven Decision-Making: Empower stakeholders with CLV insights and interactive visualizations, facilitating informed decisions regarding customer acquisition, retention, and overall business strategy.
- Enhanced User Experience: Provide a user-friendly Q&A interface for easy access to information, promoting data democratization and knowledge sharing across the organization while reducing the time required to craft efficient SQL queries.
References
- Danao, M. (2023). What Is Customer Lifetime Value (CLV)? Forbes Advisor.
- LangChain, Inc. (2024a). LLMs.
- LangChain, Inc. (2024b). SQL Database.
- Bhattacharyya, S. (2018). Ridge and Lasso Regression: L1 and L2 Regularization. Towards Data Science.