
204.4.8 Problem of Under-fitting

What happens if the model is under-fitted? Does it have huge bias?
Link to the previous post: https://course.dvanalyticsmds.com/204-4-7-problem-of-overfitting/

The Problem of Under-fitting

  • Simple models are often better, but not always.
  • We might have stopped too early. Did we really capture all the information in the data?
  • Did we do enough research and feature engineering to fit the best model? Is this the best model that can be fit on this data?
  • By being overcautious about variance in the parameters, we might miss some patterns in the data.
  • The model needs to be complex enough to capture all the information present.
  • If the training error itself is high, how can we be sure about the model's performance on unknown data?
  • Most accuracy and error statistics give a clear picture of the training error. This is one advantage of under-fitting: we can identify it confidently.
  • Under-fitting
    • A model that is too simple
    • A model with scope for improvement
    • A model with a lot of bias
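
The points above can be illustrated with a small, self-contained sketch. The toy data, the threshold search, and all names here (`xs`, `labels`, `best_stump_accuracy`, and so on) are assumptions made for illustration and are not part of the course dataset. The label is 1 only inside an interval, so a rule with a single threshold (the equivalent of a depth-1 tree) is too simple to capture the pattern, and this already shows up as a low training accuracy.

```python
# Toy training data (assumed for illustration): label is 1 only when 3 <= x <= 6.
xs = list(range(10))
labels = [1 if 3 <= x <= 6 else 0 for x in xs]

def accuracy(predict):
    """Fraction of training points that the rule `predict` classifies correctly."""
    hits = sum(1 for x, y in zip(xs, labels) if predict(x) == y)
    return hits / len(xs)

def best_stump_accuracy():
    """Try every single-threshold rule and return the best training accuracy."""
    best = 0.0
    for t in range(0, 11):
        for above in (0, 1):  # label predicted when x >= t
            acc = accuracy(lambda x: above if x >= t else 1 - above)
            best = max(best, acc)
    return best

# Too-simple model: one threshold cannot describe an interval.
stump_accuracy = best_stump_accuracy()

# Model that is complex enough: two thresholds describe the interval.
interval_accuracy = accuracy(lambda x: 1 if 3 <= x <= 6 else 0)

print("single-threshold training accuracy:", stump_accuracy)
print("two-threshold training accuracy:", interval_accuracy)
```

No validation set is needed to spot the problem here: the single-threshold rule is already wrong on part of the training data, however its threshold is chosen.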

Practice : Model with huge Bias

  • Let's simplify the model.
  • Take the high-variance model and prune it.
  • Make it as simple as possible.
  • Find the training error and validation error.

Solution

In [22]:
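#We can prune the tree by changing its parameters.
#X_train and y_train are the training data prepared in the previous post.
from sklearn import tree
tree_bias = tree.DecisionTreeClassifier(criterion='gini',
                                        splitter='best',
                                        max_depth=10,
                                        min_samples_split=30,
                                        min_samples_leaf=30,
                                        max_leaf_nodes=20)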
tree_bias.fit(X_train,y_train)
Out[22]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
            max_features=None, max_leaf_nodes=20, min_samples_leaf=30,
            min_samples_split=30, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best')
In [23]:
#Training accuracy
tree_bias.score(X_train,y_train)
Out[23]:
0.85344444444444445
In [24]:
#Let's prune the tree further and oversimplify the model.
tree_bias1 = tree.DecisionTreeClassifier(criterion='gini', 
                                              splitter='random', 
                                              max_depth=1, 
                                              min_samples_split=100, 
                                              min_samples_leaf=100, 
                                              max_leaf_nodes=2)
tree_bias1.fit(X_train,y_train)
Out[24]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=1,
            max_features=None, max_leaf_nodes=2, min_samples_leaf=100,
            min_samples_split=100, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='random')
In [25]:
#Training Accuracy of new model
tree_bias1.score(X_train,y_train)
Out[25]:
0.68231111111111109
In [26]:
#Validation accuracy on test data
tree_bias1.score(X_test,y_test)
Out[26]:
0.68910000000000005
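
The cells above depend on `X_train`, `y_train`, `X_test` and `y_test` from the previous post. As a self-contained sketch of the same exercise, the code below uses synthetic data from `make_classification` as a stand-in for the course data (an assumption, so the numbers will not match the outputs above). The helper `train_and_score` and the model labels are hypothetical names introduced for illustration. It reports the training and validation accuracy for an unrestricted tree, a pruned tree, and an oversimplified depth-1 tree.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the course data (an assumption, not the original dataset).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

def train_and_score(**params):
    """Fit a tree with the given parameters; return (training, validation) accuracy."""
    model = DecisionTreeClassifier(random_state=0, **params)
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)

scores = {
    # No limits: the high-variance starting point.
    "unrestricted tree": train_and_score(),
    # Same limits as tree_bias above.
    "pruned tree": train_and_score(max_depth=10, min_samples_split=30,
                                   min_samples_leaf=30, max_leaf_nodes=20),
    # Same size limits as tree_bias1 above, with the default splitter.
    "depth-1 tree": train_and_score(max_depth=1, min_samples_split=100,
                                    min_samples_leaf=100, max_leaf_nodes=2),
}

for name, (train_acc, valid_acc) in scores.items():
    print(f"{name:18s} train={train_acc:.3f} validation={valid_acc:.3f}")
```

Fixing `random_state` keeps the run repeatable. The original cell uses `splitter='random'` without a seed, so its accuracy can change from run to run.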

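After oversimplifying, the training accuracy (about 0.68) and the validation accuracy (about 0.69) are both low and close to each other. This is the signature of an under-fitted model with high bias.

In the next post we will discuss how to choose the optimal model using the bias-variance trade-off.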

Link to the next post : https://course.dvanalyticsmds.com/204-4-9-model-bias-variance-tradeoff/
