• No products in the cart.

204.7.8 Practice : Boosting

Building Boosting models : Gradient Boosting and ADA Boosting

Link to the previous post : https://course.dvanalyticsmds.com/204-7-7-boosting/

In the last post we covered the concepts and theory behind Boosting Algorithms.

In this post we will put the concepts into practice and build Boosting models using Scikit Learn in python.

 

Boosting

  • Rightly, categorizing the items based on their detailed feature specifications. More than 100 specifications have been collected.
  • Data: Ecom_Products_Menu/train.csv
  • Build a decision tree model and check the training and testing accuracy.
  • Build a boosted decision tree.
  • Is there any improvement from the earlier decision tree.

 

In [20]:
#importing the datasets
menu_train=pd.read_csv("datasets\\Ecom_Products_Menu\\train.csv")
menu_test=pd.read_csv("datasets\\Ecom_Products_Menu\\test.csv")
In [21]:
lab=list(menu_train.columns[1:101])
g=menu_train[lab]
h=menu_train['Category']
In [22]:
###buildng Decision tree on the training data ####
from sklearn import tree
tree = tree.DecisionTreeClassifier()
tree.fit(g,h)
Out[22]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best')
In [23]:
#####predicting the tree  on test data ####
tree_predict=tree.predict(menu_test[lab])
from sklearn.metrics import f1_score
f1_score(menu_test['Category'], tree_predict, average='micro')
Out[23]:

0.70993535216059889

Gradient Boosting

In [24]:
###Building a gradient boosting clssifier ###
from sklearn import ensemble
from sklearn.ensemble import GradientBoostingClassifier
boost=GradientBoostingClassifier(loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto') 
In [25]:
##calculating the time while fitting the Gradient boosting classifier
import datetime
start_time = datetime.datetime.now()
##fitting the gradient boost classifier
boost.fit(g,h)
end_time = datetime.datetime.now()
print(end_time-start_time)
0:03:15.513182
In [26]:
###predicting Gradient boosting model on the test Data
boost_predict=boost.predict(menu_test[lab])
from sklearn.metrics import f1_score
f1_score(menu_test['Category'], boost_predict, average='micro') 
Out[26]:
0.78717250765566504

We see an accuracy of 78% after Gradient boosting model,where as it is 70% in decison tree building. Our accuracy has improved by 8%.

ADA Boosting

In [27]:
##building an AdaBoosting Classifier #### 
from sklearn import ensemble
from sklearn.ensemble import AdaBoostClassifier
ada=AdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)
ada.fit(g,h)
Out[27]:
AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,
          learning_rate=1.0, n_estimators=50, random_state=None)
In [28]:
### Predicting the AdaBoost clssifier on Test Data
ada_predict=ada.predict(menu_test[lab])
from sklearn.metrics import f1_score
f1_score(menu_test['Category'], ada_predict, average='micro')
Out[28]:

0.69555971418849949

Well, ada Boosting didn’t give us improved results as we expected.

The next post is about boosting conclusion.

Link to the next post :  https://course.dvanalyticsmds.com/boosting-conclusion/

DV Analytics

DV Data & Analytics is a leading data science,  Cyber Security training and consulting firm, led by industry experts. We are aiming to train and prepare resources to acquire the most in-demand data science job opportunities in India and abroad.

Bangalore Center

DV Data & Analytics Bangalore Private Limited
#52, 2nd Floor:
Malleshpalya Maruthinagar Bengaluru.
Bangalore 560075
India
(+91) 9019 030 033 (+91) 8095 881 188
Email: info@dvanalyticsmds.com

Bhubneshwar Center

DV Data & Analytics Private Limited Bhubaneswar
Plot No A/7 :
Adjacent to Maharaja Cine Complex, Bhoinagar, Acharya Vihar
Bhubaneswar 751022
(+91) 8095 881 188 (+91) 8249 430 414
Email: info@dvanalyticsmds.com

top
© 2020. All Rights Reserved.