204.4.6 Types of Datasets, Types of Errors and Problem of Overfitting

Things to know before proceeding further.
Link to the previous post : https://course.dvanalyticsmds.com/204-4-5-what-is-a-best-model/

Different Types of Datasets and Errors

The Training Error

  • Suppose the accuracy of our best model is 95%. Is a model with 5% error really good?
  • The error on the training data is known as the training error.
  • A low error rate on the training data does not always mean the model is good.
  • What really matters is how the model performs on unknown data, that is, the test data.
  • We need a way to estimate the error rate on the test data.
  • To do this, we may have to keep aside a part of the data and use it for validation.
  • There are two types of datasets and two types of errors.
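
As a rough illustration of why a low training error can mislead, here is a minimal Python sketch. The `error_rate` helper, the `MemorizingModel` class, and the toy odd/even data are all hypothetical and are not part of the course material. The model simply remembers every training row, so it looks perfect on known data but does no better than guessing on rows it has never seen.

```python
def error_rate(y_true, y_pred):
    """Return the fraction of predictions that do not match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


class MemorizingModel:
    """Hypothetical model: remembers every training row, guesses 0 for unseen rows."""

    def fit(self, X, y):
        self.lookup = dict(zip(X, y))
        return self

    def predict(self, X):
        return [self.lookup.get(x, 0) for x in X]


# Toy data (an assumption for illustration): the label is 1 for odd numbers.
X_train = list(range(0, 80))
y_train = [x % 2 for x in X_train]
X_test = list(range(80, 100))
y_test = [x % 2 for x in X_test]

model = MemorizingModel().fit(X_train, y_train)
train_error = error_rate(y_train, model.predict(X_train))
test_error = error_rate(y_test, model.predict(X_test))

print(f"training error: {train_error:.2f}")  # 0.00
print(f"test error:     {test_error:.2f}")   # 0.50
```

The training error alone would suggest a perfect model here. Only the error on the held-out rows reveals that it has learned nothing general.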

Two Types of Datasets

  • There are two types of datasets.
  • Training set: The input data, used in model building.
  • Test set: The unknown dataset. This dataset gives the accuracy of the final model.
  • We may not have access to both of these datasets for every machine learning problem. In some cases, we can take 90% of the available data as training data and treat the remaining 10% as validation data.
  • Validation set: This dataset is kept aside for model validation and selection. It is a temporary substitute for the test dataset, not a third type of data.
  • We create the validation data in the hope that the error rate on it will give us a basic idea of the test error.
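
The 90%/10% split described above could be sketched as follows. This is only one possible approach using Python's standard library. The function name `train_validation_split`, the `validation_fraction` and `seed` parameters, and the placeholder rows are assumptions made for illustration.

```python
import random


def train_validation_split(rows, validation_fraction=0.1, seed=42):
    """Shuffle a copy of the rows and hold out a fraction of them for validation."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)                      # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split repeatable
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    validation_rows = shuffled[:n_validation]
    train_rows = shuffled[n_validation:]
    return train_rows, validation_rows


# Placeholder data: 100 row identifiers standing in for the available dataset.
rows = list(range(100))
train_rows, validation_rows = train_validation_split(rows)
print(len(train_rows), len(validation_rows))  # 90 10
```

Shuffling before the split is a design choice: it helps avoid a validation set that only contains, say, the most recent rows. When the data is ordered in time, a chronological split may be more appropriate.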

Types of Errors

  • The training error
      • The error on the training dataset
      • In-time error
      • Error on the known data
      • Can be reduced while building the model
  • The test error
      • The error that matters
      • Out-of-time error
      • The error on the unknown/new dataset

“A good model will have both training and test errors very near to each other and close to zero.”
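
The rule of thumb in the quote can be turned into a small check. The sketch below is an assumption-laden illustration: the function name `looks_like_good_model` and the thresholds `max_error` and `max_gap` are hypothetical, and sensible values depend on the problem at hand.

```python
def looks_like_good_model(train_error, test_error, max_error=0.10, max_gap=0.05):
    """Check the rule of thumb: both errors small and close to each other.

    max_error and max_gap are hypothetical thresholds chosen for illustration.
    """
    gap = abs(test_error - train_error)
    both_small = train_error <= max_error and test_error <= max_error
    return both_small and gap <= max_gap


print(looks_like_good_model(0.04, 0.05))  # True: both low and close
print(looks_like_good_model(0.01, 0.30))  # False: test error far above training error
print(looks_like_good_model(0.35, 0.36))  # False: close, but neither is near zero
```

The second case, a very low training error with a much higher test error, is the pattern that the next post discusses as overfitting.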

Yes, this post is quite short, but we need to know the types of errors and the types of datasets in order to validate a model.

The next post is about the problem of overfitting.

Link to the next post : https://course.dvanalyticsmds.com/204-4-7-problem-of-overfitting/
