Decision Trees
Introduction
A decision tree is a supervised learning algorithm that is mostly used for classification problems. In this technique, we split the population or sample into two or more homogeneous sets based on the most significant differentiator among the input variables.
Contents
1. What is Segmentation?
2. Segmentation Business Problem
3. The Decision Tree Approach
4. The Splitting Criterion
5. Impurity or Diversity Measures
6. Entropy
7. Information Gain
8. Purity Measures
9. The Decision Tree Algorithm
10. Multiple Splits for a Single Variable
11. Modified Decision Tree Algorithm
12. The Problem of Overfitting
13. Pruning
14. Conclusion
What is Segmentation?
To understand segmentation, let us imagine a scenario where we want to run an SMS marketing campaign to attract more customers. We have a list of customers and we need to send them SMS messages, maybe with coupons, discounts, etc. We do not send a single generic SMS to all the customers, because customers differ:
- Some customers like to see high discount
- Some customers want to see a large collection of items
- Some customers are fans of particular brands
- Some customers are Male and some are Female
So, instead of sending one SMS to all the customers, if we send a customized SMS to each segment of customers, the marketing campaign will be more effective. When customers feel connected to the campaign or SMS, they become interested in buying the product, and then we can say the campaign went well. So we want to divide the customers based on their past purchases.
We divide the customers in such a way that customers inside a group are homogeneous, whereas customers across groups are heterogeneous; that is, customers in two different groups behave differently, so we might have to send them two different messages. To divide the customers into groups that satisfy this condition, we use an algorithm called a decision tree.
Segmentation Business Problem
The Data
Observations
The problem involves three fields: Gender, Marital Status and whether or not the product was ordered by the customer. Some customers are male and some are female; some are married and some are unmarried. We now have to decide: if a customer is male and married, will he order the product or not? Likewise, if a customer is female and unmarried, will she order the product or not?
Using the historical data, if someone has a higher probability of ordering the product then we might send them a different message, and if a customer has a very low probability of ordering then we send a message with a discount to attract them to buy the product.
If a customer already has a high probability of ordering, then we can try upselling by sending other offers, or cross-selling by showing them a different product so that they buy two products. In this way we can improve our business.
To solve the problem, we first have to rearrange the data.
Re-Arranging the data
Observations
From the above result we notice that there are 14 customers in total. Among these 14 customers, 10 did not order the product and 4 did. There are 8 males and 6 females. Among the 8 males, 6 are married and 2 are unmarried; one married male and one unmarried male ordered the product. Among the 6 females, 3 are married and 3 are unmarried. All the unmarried females ordered the product, whereas the married females did not.
This analysis clearly shows that unmarried females have a high probability of buying the product, whereas married females are not interested. On the other side, married males are not interested in buying the product, whereas 50% of unmarried males are interested.
Therefore,
Married Males won't buy the product whereas
Unmarried Females will buy the product
The Decision Tree Approach
The aim is to divide the dataset into segments. Each segment needs to be useful for business decision making; that is, each segment should be pure. If we segment the whole population into two groups, one group should contain the buyers and the other the non-buyers, so that we can send different messages to buyers and non-buyers.
Example Sales Segmentation Based on Age
Observations
Let us consider 100 customers to start with. If we divide these 100 customers based on Age, we get two segments: Young and Old. From the above picture we notice that the Young segment has 60 people, whereas the Old segment has 40 people. Within the 60 young people, 31 are buying and 29 are not. Within the 40 old customers, 19 are buying and 21 are not. At the overall level, 50% are buying and 50% are not, counting both Young and Old. Even after dividing the customers based on Age, each segment still shows roughly 50% buying and 50% not buying, so dividing the whole population based on Age doesn't really help us. Now let us try dividing the whole population based on Gender and see whether this attribute splits the population in a better way.
Example Sales Segmentation Based on Gender
Observations
Here the whole population is divided into two segments (Male and Female) based on Gender. There are 60 male customers and 40 female customers. Within the 60 male customers, 48 are buying and 12 are not; within the 40 female customers, 2 are buying and 38 are not. Overall, out of the 50 customers who bought the product, 48 are male and only 2 are female. Within males there is an 80% chance of buying and within females there is only a 5% chance of buying. When we divide the whole population based on Gender, we can make really good business inferences: most of the buyers are male customers and most of the non-buyers are female customers. So, dividing the whole population based on Gender actually gives us a better split and a better intuition for a business strategy.
Main Questions
- We are looking for pure segments
- Dataset has many attributes
- Which is the right attribute for pure segmentation?
- Can we start with any attribute?
- Which attribute to start with? – The best separating attribute
- Customer age can impact sales, gender can impact sales, and customer location and demographics can impact sales. How do we identify the best attribute and the best split?
The Splitting Criterion
The best split is the one that does the best job of separating the data into groups where each group is dominated by a single class; that is, if we divide the whole population into groups, one group should contain all the buyers and the other all the non-buyers. Let us see this concept clearly through the sales segmentation based on Age and the sales segmentation based on Gender.
Example Sales Segmentation Based on Age
Observations
At the root level there is a 50% chance of buying and a 50% chance of not buying. After the split, the Young group is 52% positive and 48% negative, whereas the Old group is 52% negative and 48% positive. This is not really giving us pure splits, because we did not gain much information here: at the root level there are 100 customers with 50% buying and 50% not buying, and even at the split level both the Young and Old segments still have roughly a 50% chance of buying and a 50% chance of not buying. So Age is not really a good splitting variable.
Example Sales Segmentation Based on Gender
Observations
At the root level there is again a 50% chance of buying and a 50% chance of not buying, but at the individual splits the male population has an 80% chance of buying whereas the female population has only a 5% chance of buying; that means 95% of the female population is not interested in buying the product. This is a pure segment from the point of view of not buying, and the male population is a fairly pure segment from the point of view of buying. So this split is much better than the earlier split based on Age. We are looking for variables like Gender.
Impurity (Diversity) Measures
- We are looking for an impurity or diversity measure that gives a high score for the Age variable (high impurity while segmenting) and a low score for the Gender variable (low impurity while segmenting).
For calculating this impurity there is a mathematical formula called entropy.
Entropy
Entropy characterizes the impurity/diversity of a segment, so the impurity measure we are looking for is entropy. Entropy is a measure of uncertainty/impurity/diversity; it measures the amount of information in a message. If S is a segment of training examples, then
Entropy(S) = -p+ log2(p+) - p- log2(p-)
where
p+ is the probability of the positive class and
p- is the probability of the negative class.
Entropy is highest when the split has p of 0.5.
Entropy is lowest when the split is pure, i.e., p of 1.
Note
If entropy is low we will get better segmentation
If entropy is high we will not get better segmentation
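To make the definition concrete, here is a minimal Python sketch of the formula above (the helper name entropy and the example probabilities are ours, purely for illustration):
from math import log2

def entropy(p_pos):
    # Entropy of a two-class segment, given the probability of the positive class
    if p_pos in (0, 1):          # a pure segment has zero entropy by convention
        return 0.0
    p_neg = 1 - p_pos
    return -p_pos * log2(p_pos) - p_neg * log2(p_neg)

print(entropy(0.5))   # 1.0 -> maximally impure segment
print(entropy(1.0))   # 0.0 -> pure segment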
Entropy is highest when the split has p of 0.5
- A 50-50 class ratio in a segment is really impure, hence the entropy is high
- Entropy(S) = -(0.5) log2(0.5) - (0.5) log2(0.5) = 0.5 + 0.5 = 1
Entropy is least when the split is pure, i.e., p of 1
- A 100-0 class ratio in a segment is really pure, hence the entropy is low
- Entropy(S) = -(1) log2(1) - (0) log2(0) = 0 - 0 = 0 (taking 0 · log2(0) = 0 by convention)
The less the entropy, the better the split
- Entropy is formulated in such a way that its value is high for impure segments and low for pure ones
Entropy Calculation – Example
- Entropy at root
- Total population at root: 100 [50+, 50-]
- Entropy(S) = -(50/100) log2(50/100) - (50/100) log2(50/100) = 1
- 100% impurity at the root
- Age splits the population into two segments
- Segment-1: Age = "Young"
- Segment-2: Age = "Old"
- Entropy at segment-1
- Age = "Young" segment has 60 records [31+, 29-]
- Entropy(S) = -(31/60) log2(31/60) - (29/60) log2(29/60) = 0.9991984 (99% impurity in this segment)
- Entropy at segment-2
- Age = "Old" segment has 40 records [19+, 21-]
- Entropy(S) = -(19/40) log2(19/40) - (21/40) log2(21/40) = 0.9981959 (99% impurity in this segment too)
LAB: Entropy Calculation – Example
- Calculate entropy at the root for the given population
- Calculate the entropy for the two distinct gender segments
Code - Entropy Calculation
from math import log2
# Entropy at root: [50+, 50-] -> 100% impurity
-(50/100)*log2(50/100) - (50/100)*log2(50/100)   # 1.0
# Male segment: [48+, 12-]
-(48/60)*log2(48/60) - (12/60)*log2(12/60)       # 0.7219281
# Female segment: [2+, 38-]
-(2/40)*log2(2/40) - (38/40)*log2(38/40)         # 0.2863970
Information Gain
Information gain gives us a better picture of a variable's usefulness as a splitting criterion: it compares the impurity before and after the segmentation.
- Information Gain = entropy before split - entropy after split
- An easy way to understand it: Information Gain = (overall entropy at the parent node) - (sum of weighted entropies at the child nodes)
- The attribute with the maximum information gain is the best split attribute (a small sketch of computing this follows below)
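As a quick illustration of this formula, here is a hedged Python sketch that computes the information gain of a split from raw class counts, reusing the entropy() helper sketched earlier (the function name and counts are ours; the numbers are the Gender split from the examples above):
def information_gain(parent_counts, child_counts_list):
    # parent_counts: (pos, neg); child_counts_list: one (pos, neg) pair per child node
    n = sum(parent_counts)
    before = entropy(parent_counts[0] / n)
    after = sum((p + q) / n * entropy(p / (p + q)) for p, q in child_counts_list)
    return before - after

# Root [50+, 50-]; Male child [48+, 12-]; Female child [2+, 38-]
print(information_gain((50, 50), [(48, 12), (2, 38)]))   # ~0.452, i.e., 45.2%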
Information Gain - Calculation
Split based on Age:
- Entropy overall = 100% (impurity)
- Entropy of the Young segment = 99%
- Entropy of the Old segment = 99%
- Information Gain for Age = 100 - (0.6 × 99 + 0.4 × 99) = 1
Split based on Gender:
- Entropy overall = 100% (impurity)
- Entropy of the Male segment = 72%
- Entropy of the Female segment = 29%
- Information Gain for Gender = 100 - (0.6 × 72 + 0.4 × 29) = 45.2
LAB: Information Gain
Calculate the information gain for each variable split.
Output-Information Gain
Split With Respect to 'Owning a car'
- Entropy([28+,39-]) overall = -(28/67) log2(28/67) - (39/67) log2(39/67) = 98% (impurity)
- Entropy([25+,4-]) owns a car = 57%
- Entropy([3+,35-]) no car = 40%
- Information Gain for 'Owning a car' = 98 - ((29/67) × 57 + (38/67) × 40) = 50.6
Split With Respect to ‘Gender’
- Entropy([19+,21-]) Male = 99%
- Entropy([9+,18-]) Female = 91%
- Information Gain for Gender = 98 - ((40/67) × 99 + (27/67) × 91) = 2.2
Other Purity (Diversity) Measures
- Chi-square measure of association
- Gini Index: Gini(T) = 1 - Σ pᵢ², where pᵢ is the proportion of class i in node T (a small sketch of the Gini computation follows this list)
- Information Gain Ratio
- Misclassification error
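For comparison with entropy, here is a minimal sketch of the Gini index computed from raw class counts (the helper name is ours):
def gini(class_counts):
    # Gini impurity of a node, given raw class counts
    n = sum(class_counts)
    return 1 - sum((c / n) ** 2 for c in class_counts)

print(gini([50, 50]))   # 0.5 -> maximally impure for two classes
print(gini([100, 0]))   # 0.0 -> pure segment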
The Decision tree Algorithm
A major step is to identify the best split variable and the best split criterion. Once we have the split, we go to each segment and drill down further. To find the best attribute for segmentation, we calculate the information gain for all the attributes; the one with the highest information gain is selected (a runnable sketch of this procedure appears after the stopping criteria below).
Until stopped:
- Select a leaf node
- Find the best splitting attribute
- Split the node using the attribute
- Go to each child node and repeat steps 2 and 3
Stopping criteria:
- Each leaf-node contains examples of one type
- Algorithm ran out of attributes
- No further significant information gain
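Below is a minimal, hedged Python sketch of this loop in recursive form. All the function names are ours (not from any library), and it assumes a pandas DataFrame with categorical feature columns and a discrete target:
import numpy as np
import pandas as pd

def entropy_of(labels):
    # Entropy of a pandas Series of class labels
    probs = labels.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def gain(df, attribute, target):
    # Information gain of splitting df on the given attribute
    parent = entropy_of(df[target])
    weighted = sum(len(g) / len(df) * entropy_of(g[target])
                   for _, g in df.groupby(attribute))
    return parent - weighted

def build_tree(df, target, attributes):
    # Stopping criteria: pure leaf, or no attributes left
    if df[target].nunique() == 1 or not attributes:
        return df[target].mode()[0]
    gains = {a: gain(df, a, target) for a in attributes}
    best = max(gains, key=gains.get)
    if gains[best] <= 0:                      # no further significant information gain
        return df[target].mode()[0]
    remaining = [a for a in attributes if a != best]
    return {best: {value: build_tree(group, target, remaining)
                   for value, group in df.groupby(best)}}

# Tiny illustration with made-up rows mirroring the Gender/Marital example
data = pd.DataFrame({'Gender':  ['M', 'M', 'F', 'F'],
                     'Marital': ['Married', 'Unmarried', 'Married', 'Unmarried'],
                     'Bought':  ['No', 'Yes', 'No', 'Yes']})
print(build_tree(data, 'Bought', ['Gender', 'Marital']))
# {'Marital': {'Married': 'No', 'Unmarried': 'Yes'}}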
The Decision tree Algorithm – Demo
Entropy([4+,10-]) overall = 86.3% (impurity)
- Entropy([7+,1-]) Male = 54.3%
- Entropy([3+,3-]) Female = 100%
- Information Gain for Gender = 86.3 - ((8/14) × 54.3 + (6/14) × 100) = 12.4
Entropy([4+,10-]) overall = 86.3% (impurity)
- Entropy([0+,9-]) Married = 0%
- Entropy([4+,1-]) Unmarried = 72.1%
- Information Gain for Marital Status = 86.3 - ((9/14) × 0 + (5/14) × 72.1) = 60.5
- The information gain for Marital Status is the highest, so it is the first variable used for segmentation.
- Now we take the segment "Married" and repeat the same process, looking for the best splitting variable for this sub-segment.
Many Splits for a Single Variable
- Sometimes a variable takes multiple distinct values,
- which leads to multiple split options for a single variable,
- and thus multiple information gain values for that single variable.
What is the information gain for income?
- There are multiple options for calculating the information gain
- For income, we consider all possible split scenarios and calculate the information gain for each scenario
- Within income, out of all the options, the split with the highest information gain is the one chosen
- So, node partitioning for multi-valued attributes needs to be included in the decision tree algorithm
- We need to find the best splitting attribute along with the best split rule (a sketch of enumerating the candidate splits follows below)
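As an illustration of the "all possible scenarios" step, here is a small hedged sketch that enumerates every two-way grouping of a categorical variable's values exactly once (the function name and the income levels are just examples):
from itertools import combinations

def candidate_splits(values):
    # Yield each two-way partition of the category values exactly once;
    # pinning the first value to the left side avoids mirror-image duplicates.
    values = list(values)
    first, rest = values[0], values[1:]
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = {first, *extra}
            yield left, set(values) - left

for left, right in candidate_splits(['Low', 'Medium', 'High']):
    print(left, 'vs', right)
# e.g. {'Low'} vs {'Medium', 'High'}, {'Low', 'Medium'} vs {'High'}, ...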
The Decision tree Algorithm- Full version
Until stopped:
- Select a leaf node
- Select an attribute
- Partition the node population and calculate information gain.
- Find the split with maximum information gain for this attribute.
- Repeat this for all attributes
- Find the best splitting attribute along with the best split rule.
- Split the node using that attribute.
- Go to each child node and repeat step 2 to 4.
Stopping criteria:
- Each leaf-node contains examples of one type.
- Algorithm ran out of attributes.
- No further significant information gain.
Explanation of the Algorithm
We select one attribute at a time. For each attribute we consider all the candidate partitions of the node, calculate the information gain of each, and keep the best split criterion. This gives us the information gain for that particular variable along with its best split criterion; we then do the same for all the other variables. In the end, we have the best splitting attribute along with the best split rule.
LAB: Decision Tree Building
- Import data: Ecom_Cust_Relationship_Management/Ecom_Cust_Survey.csv
- How many customers have participated in the survey?
- Are most of the customers satisfied or dissatisfied?
- Can you segment the data and find the concentrated satisfied and dissatisfied customer segments?
- What are the major characteristics of satisfied customers?
- What are the major characteristics of dissatisfied customers?
Solution
#Import Data
import pandas as pd
df = pd.read_csv('Datasets/Ecom_Cust_Relationship_Management/Ecom_Cust_Survey.csv', header=0)
# remove all rows with missing values
df.dropna(inplace=True)  # inplace expects the boolean True, not the string 'True'
#Q 1. How many customers have participated in the survey?
df.shape
#ANS: 11805
#total number of customer rows
df.shape[0]
#Q 2. Are most of the customers satisfied or dissatisfied overall?
df.Overall_Satisfaction.value_counts()
#number of satisfied customers
satisfied = df['Overall_Satisfaction'].map( {'Dis Satisfied': 0, 'Satisfied': 1} ).astype(int).sum()
satisfied
#number of dis satisfied customers
df.shape[0]-satisfied
#ANS: 6411
df.Overall_Satisfaction.value_counts()
- Can you segment the data and find the concentrated satisfied and dissatisfied customer segments?
We will create a tree model in Python using the scikit-learn module.
Before that, we need to convert most of the feature data into numerical or hash values, as scikit-learn only works with numerical data.
# Welcome to variable transformation
df['Region'] = df['Region'].map({'EAST': 1, 'WEST': 2, 'NORTH': 3, 'SOUTH':4}).astype(int)
df['Customer_Type'] = df['Customer_Type'].map({'Prime': 1, 'Non_Prime': 0}).astype(int)
#We also need to change the column names, since '.' and spaces interfere with many basic functions in Python
df.rename(columns={'Order Quantity':'Order_Quantity', 'Improvement Area' :'Improvement_Area'}, inplace=True)
df['Improvement_Area'] = df['Improvement_Area'].map({'Website UI':1, 'Packing Shipping':2, 'Product Quality':3,}).astype(int)
df['Overall_Satisfaction'] = df['Overall_Satisfaction'].map( {'Dis Satisfied': 0, 'Satisfied': 1} ).astype(int)
#Need the library to create the tree
from sklearn import tree
#Defining features and labels
features= list(df.columns[:6])
X = df[features]
y = df['Overall_Satisfaction']
#Building Tree Model
clf = tree.DecisionTreeClassifier(max_depth=2)
clf.fit(X,y)
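Before wrestling with graphviz below, note that newer scikit-learn versions (0.21+) can print a plain-text rendering of the fitted tree, which is often enough for a quick look:
#Quick text view of the fitted tree (requires scikit-learn >= 0.21)
from sklearn.tree import export_text
print(export_text(clf, feature_names=features))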
Plotting the tree
Unfortunately, drawing a beautiful tree is not as easy in Python as it is in R; nonetheless, we need a way out.
- Have a recent version of Jupyter and Python installed: Anaconda with Python 3.5 and Jupyter 4.2.3 is used in this session
- We need to install graphviz tool in our system and set the path in environment variables.
- Visit http://www.graphviz.org/Download..php and find the optimal version for your computer.
- Get the path to gvedit.exe in the installation directory (for me it was "C:\Program Files (x86)\Graphviz2.38\bin")
- Go to Start -> Computer -> System Properties -> Advanced Settings -> Environment Variables and add the path.
- To make graphviz work in your python environment, we need to install graphviz package too.
- use ‘conda install graphviz’ in command prompt.
- We will need the Python package pydotplus (for older Python versions, pydot)
- use this command in your anaconda prompt:
conda install -c conda-forge pydotplus
- If any version-related error occurs while installing the package, go to https://anaconda.org/search?q=pydotplus
- This link will show the channel name of the suitable version,
- and we can then use:
conda install -c <channel name here> pydotplus
Note: older versions of Python use pydot (now deprecated); use pydotplus for newer versions.
Please follow all the steps in the given order.
#What are the major characteristics of satisfied customers?
from IPython.display import Image
from io import StringIO  # sklearn.externals.six was removed in newer scikit-learn versions
import pydotplus
dot_data = StringIO()
tree.export_graphviz(clf,
out_file = dot_data,
feature_names = features,
filled=True, rounded=True,
impurity=False)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
By looking at the plot we can answer two questions:
- What are the major characteristics of satisfied customers? Order_Quantity less than 40.5 with Age less than 29.5, or Order_Quantity greater than or equal to 40.5.
- What are the major characteristics of dissatisfied customers? Order_Quantity less than 40.5 with Age greater than or equal to 29.5.
LAB: Tree Validation
- Find the accuracy of the classification for the tree model
#Tree Validation
predict1 = clf.predict(X)
from sklearn.metrics import confusion_matrix  # for building the confusion matrix
cm = confusion_matrix(y, predict1)
print (cm)
total = sum(sum(cm))
# calculate accuracy from the confusion matrix
accuracy = (cm[0,0]+cm[1,1])/total
accuracy
- We can also use the .score() function from the sklearn library to get the accuracy in Python.
- However, the confusion matrix lets us see the wrong classifications, which gives an intuitive understanding of our predictions.
clf.score(X,y)
LAB: The Problem of Overfitting
- Import Dataset: “Buyers Profiles/Train_data.csv”
- Import both test and training data
- Build a decision tree model on training data
- Find the accuracy on training data
- Find the predictions for test data
- What is the model prediction accuracy on test data?
Solution
- Import Dataset: “Buyers Profiles/Train_data.csv”
- Import both test and training data
import pandas as pd
train = pd.read_csv("Datasets/Buyers Profiles/Train_data.csv", header=0)
test = pd.read_csv("Datasets/Buyers Profiles/Test_data.csv", header=0)
train.shape
test.shape
train.info()
# the data has string values; we need to convert them into numerical values
train['Gender'] = train['Gender'].map( {'Male': 1, 'Female': 0} ).astype(int)
train['Bought'] = train['Bought'].map({'Yes':1, 'No':0}).astype(int)
test['Gender'] = test['Gender'].map( {'Male': 1, 'Female': 0} ).astype(int)
test['Bought'] = test['Bought'].map({'Yes':1, 'No':0}).astype(int)
train.info()
from sklearn import tree
#Defining features and labels
features = list(train.columns[:2])
X_train = train[features]
y_train = train['Bought']
#X_train
X_test = test[features]
y_test = test['Bought']
#training Tree Model
clf = tree.DecisionTreeClassifier()
clf.fit(X_train,y_train)
#Plotting the trees
dot_data = StringIO()
tree.export_graphviz(clf,
out_file = dot_data,
feature_names = features,
filled=True, rounded=True,
impurity=False)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
predict1 = clf.predict(X_train)
print(predict1)
predict2 = clf.predict(X_test)
print(predict2)
####Calculation of Accuracy and Confusion Matrix on the training data
from sklearn.metrics import confusion_matrix  # for building the confusion matrix
cm1 = confusion_matrix(y_train,predict1)
cm1
total1 = sum(sum(cm1))
# calculate accuracy from the confusion matrix
accuracy1 = (cm1[0,0]+cm1[1,1])/total1
accuracy1
#Accuracy On Test Data
cm2 = confusion_matrix(y_test,predict2)
cm2
total2 = sum(sum(cm2))
# calculate accuracy from the confusion matrix
accuracy2 = (cm2[0,0]+cm2[1,1])/total2
accuracy2
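Putting the two numbers side by side makes the gap visible (this simply reuses the accuracies computed above):
#An unpruned tree typically scores near 100% on its own training data
#while dropping noticeably on unseen test data -- the signature of overfitting
print("Training accuracy:", accuracy1)
print("Test accuracy:", accuracy2)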
The Problem of Overfitting
- If we grow the tree further, each row of the input data table may end up as its own final rule
- The model will be really good on the training data, but it will fail to validate on the test data
- Growing the tree beyond a certain level of complexity leads to overfitting
- A really big tree is very likely to suffer from overfitting.
Pruning
- Growing the tree beyond a certain level of complexity leads to overfitting
- In our data, age doesn’t have any impact on the target variable.
- Growing the tree beyond Gender is not going to add any value. Need to cut it at Gender
- This process of trimming trees is called Pruning
Pruning to Avoid Overfitting
- Pruning helps us to avoid overfitting
- Generally a simpler model is preferred, as it avoids the overfitting issue.
- Any additional split that does not add significant value is not worthwhile.
- In R we can use cp, the complexity parameter, to control the tree's growth; scikit-learn offers analogous controls, as sketched below.
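In scikit-learn, an analogous knob is cost-complexity pruning via the ccp_alpha parameter (available from version 0.22; the alpha value below is purely illustrative, not tuned):
#Cost-complexity pruning: larger ccp_alpha -> smaller, more pruned tree
pruned = tree.DecisionTreeClassifier(ccp_alpha=0.01)  # 0.01 is an illustrative value
pruned.fit(X_train, y_train)
print(pruned.score(X_test, y_test))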
Code-Tree Pruning
#We will rebuild a new tree using the above data and see how it works by tweaking the parameters
dtree = tree.DecisionTreeClassifier(criterion = "gini", splitter = 'random', max_leaf_nodes = 10, min_samples_leaf = 5, max_depth= 5)
dtree.fit(X_train,y_train)
predict3 = dtree.predict(X_train)
print(predict3)
predict4 = dtree.predict(X_test)
print(predict4)
#Accuracy of the model that we created with modified model parameters.
score2 = dtree.score(X_test, y_test)
score2
LAB: Tree Building Model Selection
- Import the fiber bits data. This is internet service provider data; the idea is to predict customer attrition based on some independent factors.
- Build a decision tree model for fiber bits data.
- Prune the tree if required.
- Find out the final accuracy.
- Is there any 100% active/inactive customer segment?
Solution
import pandas as pd
import numpy as np
Fiber_df = pd.read_csv("Datasets/Fiberbits/Fiberbits.csv", header=0)
Fiber_df.info()
from sklearn import tree
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in newer versions
#Defining features and labels
features = list(Fiber_df.drop(['active_cust'], axis=1).columns)  # list of column names except 'active_cust'
features
X = np.array(Fiber_df[features])
y = np.array(Fiber_df['active_cust'])
We will split the data into train and test sets to validate the model accuracy.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)
clf1 = tree.DecisionTreeClassifier()
clf1.fit(X_train,y_train)
#Accuracy of the model clf1
clf1.score(X_test,y_test)
We can tweak the parameters and build a new model.
#Let's make a model by changing the parameters.
clf2 = tree.DecisionTreeClassifier(criterion='gini',
splitter='best',
max_depth=10,
min_samples_split=5,
min_samples_leaf=5,
min_weight_fraction_leaf=0.5,
max_leaf_nodes=10)
clf2.fit(X_train,y_train)
clf2.score(X_test,y_test)
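Rather than hand-tuning one parameter combination at a time, we could let a cross-validated grid search pick one; a hedged sketch follows (the grid values below are illustrative, not tuned):
#Cross-validated search over a small illustrative parameter grid
from sklearn.model_selection import GridSearchCV

param_grid = {'max_depth': [5, 10, 15],
              'min_samples_leaf': [5, 20, 50]}
search = GridSearchCV(tree.DecisionTreeClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.score(X_test, y_test))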
Conclusion
- Decision trees are powerful and very simple to represent and understand.
- One needs to be careful with the size of the tree; decision trees are more prone to overfitting than other algorithms.
- Can be applied to any type of data, especially with categorical predictors.
- One can use decision trees to perform a basic customer segmentation and build a different predictive model on the segments.