Overfitting and underfitting are two of the most essential yet misunderstood topics for beginners in machine learning. While beginners may understand these concepts theoretically, applying them in practice often proves challenging. This article aims to address these issues and make the practical implications easier to understand.
Bias:
Bias refers to the assumptions a model makes to simplify learning the target function. It can be seen as an inherent error that persists even with infinite training data. This happens because the model is biased toward a particular kind of solution. If a model makes the same mistake repeatedly, it is considered biased.
- Low bias: Indicates fewer assumptions (e.g., KNN, decision tree)
- High bias: Indicates more assumptions (e.g., linear regression, logistic regression)
A model with high bias pays little attention to the training data, leading to underfitting.
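As a minimal sketch of what that looks like (assuming scikit-learn and NumPy; the data is synthetic and purely illustrative), a linear model fit to clearly non-linear data keeps making the same kind of error no matter how much data it sees:

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, clearly non-linear data: y depends on x squared
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(500, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.1, size=500)

# Linear regression assumes a straight-line relationship (a strong, "biased" assumption)
lin_model = LinearRegression().fit(X, y)
print("Train R-squared:", lin_model.score(X, y))  # stays low even on training data -> underfitting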
Variance:
Variance measures the model’s sensitivity to changes in the training data. It indicates how much the model’s predictions would change if it were trained on different datasets. High variance means the model pays too much attention to the training data, leading to overfitting and difficulty generalizing to new data.
- Low variance: Small changes in predictions across different training datasets
- High variance: Large changes in predictions across different training datasets
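To make this concrete, here is a minimal sketch (synthetic data, illustrative names): the same flexible model is trained on several fresh samples from the same process, and its prediction at a fixed query point shifts noticeably from sample to sample.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def sample_training_set(n=100):
    # Draw a fresh training set from the same underlying process
    X = rng.uniform(-3, 3, size=(n, 1))
    y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=n)
    return X, y

x_query = np.array([[1.0]])

# An unconstrained decision tree (low bias, high variance) predicts noticeably
# different values for the same query point depending on the training sample
for i in range(3):
    X, y = sample_training_set()
    tree = DecisionTreeRegressor(random_state=i).fit(X, y)
    print(f"Prediction from training sample {i}:", tree.predict(x_query)[0])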
Case 1:
Consider the R-squared values of a model:
- R-squared on train data: 0.29
- R-squared on test data: 0.02
Since the difference between the R-squared values on the train and test data is significant, the model exhibits high variance.
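The sketch below shows one way such train/test R-squared values can be obtained with scikit-learn (the regressor and data here are placeholders, not the ones behind the numbers above):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Placeholder data; in practice use your own features and target
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print("R-squared on train data:", model.score(X_train, y_train))
print("R-squared on test data:", model.score(X_test, y_test))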
Case 2:
Consider polynomial regression models of degree 1, 2, and 3, and compare their R-squared values on the train and test data.
The degree 3 model shows the largest change in R-squared (from 1 on the train data to 0.07 on the test data), indicating the highest variance.
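A minimal sketch of how such a comparison could be run (the data and split here are illustrative assumptions, so the exact numbers will differ):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Small, noisy, roughly linear dataset so that higher degrees mostly fit noise
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(30, 1))
y = 0.5 * X[:, 0] + rng.normal(0, 0.2, size=30)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

for degree in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree, model.score(X_train, y_train), model.score(X_test, y_test))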
Case 3: Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier

# Create a Random Forest classifier
rf_model = RandomForestClassifier(n_estimators=200, max_depth=10,
                                  random_state=42)
- Decreasing the number of trees reduces model complexity, which can lead to underfitting.
- Increasing the number of trees improves performance and reduces variance without causing overfitting.
- Decreasing the depth of the trees simplifies the model, leading to underfitting.
- Increasing the depth of the trees makes the model more complex, leading to overfitting (see the sketch after this list).
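One way to check these effects empirically, as a sketch with synthetic data standing in for the real dataset, is to vary max_depth and compare train and test accuracy:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for depth in (2, 5, 10, None):
    rf = RandomForestClassifier(n_estimators=200, max_depth=depth, random_state=42)
    rf.fit(X_train, y_train)
    # A large gap between train and test accuracy signals overfitting
    print(depth, rf.score(X_train, y_train), rf.score(X_test, y_test))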
Case 4: K-Nearest Neighbors (KNN)
from sklearn.neighbors import KNeighborsClassifier

# Define the number of neighbours for KNN
k = 2
# Initialize the KNN classifier
knn_classifier = KNeighborsClassifier(n_neighbors=k, metric='euclidean')
Increasing the value of ‘k’ increases bias and makes the decision boundary smoother, while decreasing ‘k’ increases variance. An optimal value of ‘k’ can be chosen through cross-validation.
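A sketch of choosing ‘k’ by cross-validation (the dataset here is synthetic and purely illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, illustrative dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Search candidate values of k with 5-fold cross-validation
param_grid = {"n_neighbors": list(range(1, 21))}
search = GridSearchCV(KNeighborsClassifier(metric='euclidean'), param_grid, cv=5)
search.fit(X, y)
print("Best k:", search.best_params_["n_neighbors"])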
Case 5: Naive Bayes Classifier
from sklearn.naive_bayes import GaussianNB

# Initialize the Naive Bayes classifier
nb_model = GaussianNB(var_smoothing=1e-9)
var_smoothing is a hyperparameter that adds a small value to the variance of each feature to avoid division by zero and improve numerical stability. Increasing the var_smoothing value makes the model less sensitive to small variations in the data, which can lead to underfitting.
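A small sketch (synthetic data, illustrative values) of how increasing var_smoothing smooths the model and can push it toward underfitting:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, illustrative dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Larger var_smoothing flattens the per-feature variances, smoothing the model
for vs in (1e-9, 1e-2, 1.0):
    nb = GaussianNB(var_smoothing=vs).fit(X_train, y_train)
    print(vs, nb.score(X_train, y_train), nb.score(X_test, y_test))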