Face Recognition

Before we start the lesson, please download the datasets.

Problem Statement

The face recognition problem uses a data set of 564 images of 20 people, each covering a range of poses from profile to frontal views. Subjects cover a range of race/sex/appearance. Each subject has their own directory, labelled 1a, 1b, ... 1t, and images are numbered consecutively as they were taken. The original files are all in PGM format, approximately 220 x 220 pixels in 256 shades of grey. The cropped .mat file used in this lesson holds 575 images of 112 x 92 pixels, as we will see during data exploration. We need to train a model using the features extracted from these images. Our model should recognize the faces of these people in any new image.

Data Exploration

All the images are stored in a single .mat file. We read the .mat file using the scipy package.

In [1]:
import warnings
warnings.filterwarnings('ignore')  # to hide warnings

import scipy.io
faces = scipy.io.loadmat('D:intern bangpython case studyFace Detection - UMist Facesumist_cropped.mat')

type(faces)
Out[1]:
dict

The faces variable holds the data for all the images. It is a dict. Let's see its keys.

In [2]:
faces.keys()
Out[2]:
dict_keys(['__header__', '__version__', '__globals__', 'facedat', 'dirnames'])

It has five keys. We will check each key.

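The split between metadata keys and data keys can be tried on a small file first. The sketch below is only an illustration: the array `toy` and the file name `toy.mat` are made up and are not part of the face data set.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Hypothetical toy array standing in for the real face data
toy = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.mat")      # hypothetical file name
    scipy.io.savemat(path, {"toy": toy})     # write a small .mat file
    loaded = scipy.io.loadmat(path)          # read it back as a dict

# Keys starting with "__" are file metadata; the rest are stored variables
data_keys = [k for k in loaded if not k.startswith("__")]
print(data_keys)
```

For our file the same filter would leave only 'facedat' and 'dirnames'.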
In [3]:
type(faces['__header__'])
Out[3]:
bytes
In [4]:
faces['__header__']
Out[4]:
b'MATLAB 5.0 MAT-file, Platform: GLNX86, Created on: Wed Aug 28 11:38:19 2002'

The '__header__' key has information about the .mat file.

version

In [5]:
type(faces['__version__'])
Out[5]:
str
In [6]:
faces['__version__']
Out[6]:
'1.0'

The '__version__' key indicates the version of the file.

globals

In [7]:
type(faces['__globals__'])
Out[7]:
list
In [8]:
faces['__globals__']
Out[8]:
[]

The '__globals__' key holds an empty list, so it carries no information.

facedat

In [9]:
type(faces['facedat'])
Out[9]:
numpy.ndarray
In [10]:
faces['facedat'].shape
Out[10]:
(1, 20)
In [11]:
faces['facedat'][0][0].shape
Out[11]:
(112, 92, 38)
In [12]:
length=[]
total=0   # named 'total' so that the built-in sum() is not shadowed
for i in range(0,20):
    length.append(faces['facedat'][0][i].shape)
    total=total+faces['facedat'][0][i].shape[2]
print(length)
print(total)
[(112, 92, 38), (112, 92, 35), (112, 92, 26), (112, 92, 24), (112, 92, 26), (112, 92, 23), (112, 92, 19), (112, 92, 22), (112, 92, 20), (112, 92, 32), (112, 92, 34), (112, 92, 34), (112, 92, 26), (112, 92, 30), (112, 92, 19), (112, 92, 26), (112, 92, 26), (112, 92, 33), (112, 92, 48), (112, 92, 34)]
575

The 'facedat' key has the pixel values of all the images. It is a multidimensional array holding the images of 20 persons. The number of images differs from person to person. The size of each image is 112×92. The total number of images is 575.

dirnames

In [13]:
type(faces['dirnames'])
Out[13]:
numpy.ndarray
In [14]:
len(faces['dirnames'])
Out[14]:
1
In [15]:
faces['dirnames']
Out[15]:
array([[array(['1a'], 
      dtype='<U2'),
        array(['1b'], 
      dtype='<U2'),
        array(['1c'], 
      dtype='<U2'),
        array(['1d'], 
      dtype='<U2'),
        array(['1e'], 
      dtype='<U2'),
        array(['1f'], 
      dtype='<U2'),
        array(['1g'], 
      dtype='<U2'),
        array(['1h'], 
      dtype='<U2'),
        array(['1i'], 
      dtype='<U2'),
        array(['1j'], 
      dtype='<U2'),
        array(['1k'], 
      dtype='<U2'),
        array(['1l'], 
      dtype='<U2'),
        array(['1m'], 
      dtype='<U2'),
        array(['1n'], 
      dtype='<U2'),
        array(['1o'], 
      dtype='<U2'),
        array(['1p'], 
      dtype='<U2'),
        array(['1q'], 
      dtype='<U2'),
        array(['1r'], 
      dtype='<U2'),
        array(['1s'], 
      dtype='<U2'),
        array(['1t'], 
      dtype='<U2')]], dtype=object)

The 'dirnames' key consists of person ids. As there are faces of twenty people, there are twenty ids, from 1a, 1b to 1t.
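The nested array printed above is awkward to work with. A minimal sketch of unwrapping it into a plain list of strings follows; the array `dirnames` here is a hand-built stand-in with the same layout as the output above, not the loaded data itself.

```python
import numpy as np

# Hypothetical stand-in for faces['dirnames']: a (1, 20) object array
# whose cells are 1-element string arrays, as in the output above
ids = [f"1{chr(ord('a') + i)}" for i in range(20)]
dirnames = np.empty((1, 20), dtype=object)
for i, name in enumerate(ids):
    dirnames[0, i] = np.array([name])

# Unwrap each cell to get a flat list of plain Python strings
person_ids = [str(cell[0]) for cell in dirnames[0]]
print(person_ids[:3], person_ids[-1])
```

With the real data, the same list comprehension would be applied to faces['dirnames'][0].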

Image viewing

Now we will look at some of the images in the data set.

In [10]:
%matplotlib inline
from matplotlib import pyplot as plt
image1=faces['facedat'][0][0][:,:,0]
plt.imshow(image1,cmap='Greys_r')
plt.show()
In [11]:
from matplotlib import pyplot as plt
image1=faces['facedat'][0][1][:,:,0]
plt.imshow(image1,cmap='Greys_r')
plt.show()

Model Building

The image pixel values are stored in a multidimensional array. The models we use train on a 2-D data frame or array, with one sample per row. So we should convert our image pixel values into a 2-D array.

In [2]:
import numpy as np
a=np.concatenate((faces['facedat'][0][0:]),axis=2)  # concatenating all 20 images arrays into a single array(4D to 3D)
a.shape
Out[2]:
(112, 92, 575)
In [5]:
b=a.reshape((112*92,575))                           #changing 3D array 'a' into a 2D array
b.shape
Out[5]:
(10304, 575)
In [6]:
facedat_2d=b.swapaxes(1,0)                         #swapping axes so that each row holds the pixel values of one image
facedat_2d.shape
Out[6]:
(575, 10304)
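It is worth checking that the reshape and the axis swap keep each image intact. The sketch below repeats the same two steps on a small made-up stack (5 images of 4 x 3 pixels, an assumption for illustration) and confirms that row k is image k flattened row by row.

```python
import numpy as np

# Hypothetical small stack: 5 images of 4 x 3 pixels, stored as (rows, cols, image)
rng = np.random.default_rng(0)
stack = rng.integers(0, 256, size=(4, 3, 5))

# Same steps as above: merge the two pixel axes, then put images on rows
flat = stack.reshape((4 * 3, 5)).swapaxes(1, 0)

# Row k should equal image k flattened row by row
for k in range(stack.shape[2]):
    assert np.array_equal(flat[k], stack[:, :, k].ravel())

# Reshaping a row recovers the original image
restored = flat[2].reshape(4, 3)
```

The same check on the real data would compare facedat_2d[k].reshape(112, 92) with a[:, :, k].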

facedat_2d is our required 2D array. Next we should create a label array. There are twenty persons, each with a different number of images. I am assigning labels from 0 to 19 for 1a to 1t respectively.

In [7]:
labels=np.zeros((575,))        #defining labels array as zero matrix
ip=0
out=faces['facedat'][0][0].shape[2]
for i in range(0,20):
    labels[ip:out]=i
    if (i<19):
            out=out+faces['facedat'][0][i+1].shape[2]
    ip=ip+faces['facedat'][0][i].shape[2]
    
labels.shape    
Out[7]:
(575,)
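The loop above can also be written in one step with np.repeat. This is an alternative sketch, not the code used in this lesson; the list `counts` is copied by hand from the shapes printed earlier, and the result holds integers where the loop produces floats.

```python
import numpy as np

# Images per person, copied from the shapes printed earlier
counts = [38, 35, 26, 24, 26, 23, 19, 22, 20, 32,
          34, 34, 26, 30, 19, 26, 26, 33, 48, 34]

# Repeat label i exactly counts[i] times: 0,...,0,1,...,1,...,19
labels_alt = np.repeat(np.arange(len(counts)), counts)
```

Comparing labels_alt with the labels array from the loop is a quick way to confirm that the slice boundaries in the loop are correct.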

The labels array has the labels of the images stored in the rows of the facedat_2d array. Next we convert the facedat_2d and labels arrays into data frames for easy handling of the data.

In [8]:
import pandas as pd

x_train=pd.DataFrame(facedat_2d)   
y_train=pd.DataFrame(labels)

Now our data is ready for training a model. We will build different models and choose the one with the highest accuracy.

Neural Network

For building neural networks we will use the neurolab package. We will directly use the image pixel values for training.

In [ ]:
import neurolab as nl
import pylab as pl

#function to find min,max of each column
def minMax(x):
    return pd.Series(index=['min','max'],data=[x.min(),x.max()])

#storing min,max values into a list
listvalues = x_train.apply(minMax).T.values.tolist()

error = []
# Creating network with 1 hidden layer and random initialized
net = nl.net.newff(listvalues,[20,1],transf=[nl.trans.LogSig()] * 2)
net.trainf = nl.train.train_rprop

# Training network
import time  #to take note of time take to train the network
start_time = time.time()
error.append(net.train(x_train, y_train, show=0, epochs = 250,goal=0.02)) 
print("--- %s seconds ---" % (time.time() - start_time))
In [ ]:
#predicting using trained network
predicted_values = net.sim(x_train)

Even with very few nodes in the hidden layer, the neural network takes a very long time (around 1 hour) to train. So we will build another model.

SVM

We use image pixel values for training the model.

In [12]:
from sklearn import svm
clf = svm.SVC()
model =clf.fit(x_train,y_train)  #training the model

clf
Out[12]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
In [9]:
Predicted=clf.predict(x_train)   #predictions of training data
In [9]:
#confusion matrix
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(y_train,Predicted)
print(ConfusionMatrix)
[[38  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0 35  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0 26  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0 24  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0 26  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0 23  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0 19  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0 22  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 32  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0 34  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0 34  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0 30  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0 19  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 33  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 48  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 34]]
In [10]:
#accuracy
accuracy=np.trace(ConfusionMatrix)/sum(sum(ConfusionMatrix))
print(accuracy)
1.0

The accuracy on the training data is 1.0. This may mean that the model is overfitting the data. Let's check for overfitting by doing cross validation on the model.
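The accuracy figure is the trace of the confusion matrix divided by the sum of all its entries. A minimal sketch on made-up labels (three classes, six samples, all hypothetical) shows the calculation:

```python
import numpy as np

# Hypothetical true and predicted labels for 3 classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

# Confusion matrix: rows are true classes, columns are predicted classes
n_classes = 3
conf = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(conf, (y_true, y_pred), 1)

# Correct predictions sit on the diagonal
accuracy = np.trace(conf) / conf.sum()
print(conf)
print(accuracy)
```

Here five of the six predictions are correct, so the accuracy is 5/6. Because it is computed on the same samples the model was trained on, it cannot by itself reveal overfitting.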

Cross Validation

We will use ShuffleSplit cross validation. Our data set has 575 samples covering the faces of 20 persons, with between 19 and 48 images per person. So we should keep the ratio of testing to training samples in cross validation very low. If the ratio is high, a random split may assign most of the samples of some classes to the testing set, which will result in a low score. To avoid this we will keep the ratio very small.
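Another way to guard against a class landing mostly in the testing set is a stratified split, which keeps the class proportions roughly equal in both parts. The sketch below uses StratifiedShuffleSplit from sklearn.model_selection, which assumes a scikit-learn version newer than the one used in this lesson. It is a swapped-in technique, not what the lesson runs; the placeholder features `X` and the hand-copied `counts` are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Images per person, copied from the shapes printed earlier
counts = [38, 35, 26, 24, 26, 23, 19, 22, 20, 32,
          34, 34, 26, 30, 19, 26, 26, 33, 48, 34]
y = np.repeat(np.arange(20), counts)
X = np.zeros((len(y), 1))   # placeholder features; only y drives the split

# 10 random splits with 5% of the samples held out, stratified by person
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.05, random_state=0)

min_train_per_class = []
for train_idx, test_idx in sss.split(X, y):
    per_class = np.bincount(y[train_idx], minlength=20)
    min_train_per_class.append(per_class.min())

# Smallest number of training images any person has in any split
print(min(min_train_per_class))
```

With stratification every person keeps nearly all of their images in the training part of each split, whatever the random seed.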

In [13]:
from sklearn import cross_validation
cv = cross_validation.ShuffleSplit(575, n_iter=10,test_size=0.05, random_state=None)
scores = cross_validation.cross_val_score(clf,x_train,y_train,cv = cv)
scores_mean = scores.mean()

print('score=',scores_mean)             #mean of all the scores
print('# support vectors of the model are')
len(clf.support_)                       # gives Number of support vectors in the model
score= 0.131034482759
# support vectors of the model are
Out[13]:
575

The score is very low, which indicates that the model is overfitting. The number of support vectors equals the total number of images, so the model is taking every image as a support vector. We have far fewer samples (575) than features (10304 pixels), and this leads to overfitting. So we will extract features from the images for face recognition.
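One possible contributor to every image becoming a support vector is the kernel scale. The SVC printed above uses an RBF kernel with gamma='auto', which in that scikit-learn version means 1/n_features. On unscaled pixel values, distances between images are so large that the kernel value between two different images is practically zero. The sketch below illustrates this on random grey levels, which are only a stand-in for real face images.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 6, 112 * 92

# Hypothetical stand-in for raw images: random grey levels in 0..255
X = rng.integers(0, 256, size=(n_samples, n_features)).astype(float)

gamma = 1.0 / n_features   # what gamma='auto' means in the SVC shown above

# Pairwise squared Euclidean distances, then the RBF kernel exp(-gamma * d^2)
sq_norms = (X ** 2).sum(axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
sq_dists = np.maximum(sq_dists, 0)   # guard against tiny negative round-off
K = np.exp(-gamma * sq_dists)

# Similarity between different images (off-diagonal entries of K)
off_diag = K[~np.eye(n_samples, dtype=bool)]
print(off_diag.max())
```

When the kernel matrix is close to the identity, each sample looks similar only to itself, so the model can fit the training set perfectly without learning anything that transfers to new images. Real faces are more alike than random noise, so the effect on the actual data may be weaker than in this sketch.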

PCA

PCA (principal component analysis) is a popular technique for face recognition problems. We will do PCA on our images.

In [9]:
from sklearn.decomposition import RandomizedPCA
#Number of eigen faces to be used
n_components = 50    
#pca implementation
pca = RandomizedPCA(n_components=n_components, whiten=True).fit(x_train)
#Projecting the input data on the eigenfaces orthonormal basis
x_train_pca = pca.transform(x_train)
x_train_pca.shape
Out[9]:
(575, 50)

x_train_pca has the weights of the images, that is, the projection of each image onto the 50 eigenfaces. We use this data for training our SVM model.
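RandomizedPCA was removed from later scikit-learn releases (0.20 onwards); the PCA class with svd_solver='randomized' takes its place. The sketch below shows that form on made-up data (120 samples with 400 features, both numbers chosen only for illustration), with the same n_components and whiten settings as above.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 400))   # hypothetical data: 120 samples, 400 features

# Randomized solver, 50 components, whitened output, as in the cell above
pca_demo = PCA(n_components=50, svd_solver="randomized", whiten=True, random_state=0)
weights = pca_demo.fit_transform(X_demo)

# One row per sample, one column per component
print(weights.shape)
# One row per component, one column per original feature
print(pca_demo.components_.shape)
```

On the face data each row of components_ has 10304 entries and can be reshaped to 112 x 92 to view it as an eigenface.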

In [14]:
from sklearn import svm
clf = svm.SVC()
model =clf.fit(x_train_pca,y_train)
print("Number of support vectors of the model are")
len(clf.support_)  # gives Number of support vectors in the model
Number of support vectors of the model are
Out[14]:
489
In [14]:
Predicted=clf.predict(x_train_pca)
In [15]:
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(y_train,Predicted)
print(ConfusionMatrix)
[[38  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0 35  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0 26  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0 24  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0 26  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0 23  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0 19  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0 22  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 32  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0 34  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0 34  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0 30  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0 19  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 33  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 48  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 34]]
In [16]:
#accuracy
accuracy=np.trace(ConfusionMatrix)/sum(sum(ConfusionMatrix))
print(accuracy)
1.0

Cross Validation

In [15]:
from sklearn import cross_validation
cv = cross_validation.ShuffleSplit(575, n_iter=10,test_size=0.05, random_state=None)
scores = cross_validation.cross_val_score(clf,x_train_pca,y_train,cv = cv)
scores_mean = scores.mean()

print('score=',scores_mean)                                 #mean of all the scores
score= 0.989655172414

Score is very good. PCA weights are good features for face recognition.

Other Features

We used PCA to get features of the images. Now we use the mean, variance, skewness and kurtosis of each image for training the model.

In [17]:
means=x_train.mean(axis=1)
variances=x_train.var(axis=1)
skewness=x_train.skew(axis=1)
kurtosis=x_train.kurtosis(axis=1)

features = pd.concat([ means,variances, skewness, kurtosis], axis =1)
features.shape
Out[17]:
(575, 4)
In [20]:
features.head(5)
Out[20]:
            0            1         2         3
0  107.884608  3081.828768  0.310578 -0.854679
1  109.478940  3113.214542  0.291686 -0.865699
2  111.512422  3149.638398  0.280505 -0.864741
3  112.495924  3243.977077  0.274591 -0.881178
4  113.126165  3272.139570  0.265341 -0.896785

The features variable has one row per image, with columns 0 to 3 holding its mean, variance, skewness and kurtosis. We use this for building the model.

In [18]:
clf = svm.SVC()
model =clf.fit(features,y_train)
Predicted=clf.predict(features)
print("Number of support vectors of the model are")
len(clf.support_)  # gives Number of support vectors in the model
Number of support vectors of the model are
Out[18]:
573
In [22]:
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(y_train,Predicted)
print(ConfusionMatrix)
[[38  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0 35  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0 25  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0]
 [ 0  0  0 24  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0 26  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0 22  0  0  0  0  0  0  1  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0 19  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0 22  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 32  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0 34  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0 34  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0 30  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0 18  0  0  0  1  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 33  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 48  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 34]]
In [23]:
#accuracy
accuracy=np.trace(ConfusionMatrix)/sum(sum(ConfusionMatrix))
print(accuracy)
0.994782608696

Cross Validation

In [19]:
from sklearn import cross_validation
cv = cross_validation.ShuffleSplit(575, n_iter=10,test_size=0.05, random_state=None)
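# evaluate the model on `features`; the original run passed x_train_pca here,
# so the score printed below is for the PCA data, not for `features`
scores = cross_validation.cross_val_score(clf,features,y_train,cv = cv)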
scores_mean = scores.mean()

print('score=',scores_mean)                                 #mean of all the scores
score= 0.986206896552

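The cross validation score of the model using PCA features is good. The score shown for the model using mean, variance, skewness and kurtosis came from a run on x_train_pca rather than features, so it does not actually evaluate that model. In any case, that model uses almost all of the samples (573 of 575) as support vectors, which is not good. So we don't use that model. We take the model which uses PCA features.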

Random Forest

We will build a Random Forest model using the PCA weights.

In [20]:
from sklearn.ensemble import RandomForestClassifier
forest=RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, 
                              min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', 
                              max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, 
                              verbose=0, warm_start=False, class_weight=None)

forest.fit(x_train_pca,y_train)
Out[20]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
In [26]:
Predicted=forest.predict(x_train_pca)

from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(y_train,Predicted)
print(ConfusionMatrix)
[[38  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0 35  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0 26  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0 24  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0 26  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0 23  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0 19  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0 22  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  1  0  0  0 31  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0 34  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0 34  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0 30  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0 19  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 33  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 48  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 34]]
In [27]:
#accuracy
accuracy=np.trace(ConfusionMatrix)/sum(sum(ConfusionMatrix))
print(accuracy)
0.998260869565

Cross Validation

In [21]:
from sklearn import cross_validation
cv = cross_validation.ShuffleSplit(575, n_iter=10,test_size=0.05, random_state=None)
scores = cross_validation.cross_val_score(forest,x_train_pca,y_train,cv = cv)
scores_mean = scores.mean()
print('score=',scores_mean)                                 #mean of all the scores
score= 0.931034482759

Cross validation Score is good. Random Forest model is also a good model.

Bagging

Now we will do bagging on the x_train_pca data set.

In [22]:
from sklearn.ensemble import BaggingClassifier
bagging=BaggingClassifier(base_estimator=svm.SVC(), n_estimators=5, max_samples=1.0, 
                                max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, 
                                warm_start=False, n_jobs=1, random_state=None, verbose=0)

bagging.fit(x_train_pca,y_train)
Out[22]:
BaggingClassifier(base_estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=5, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False)
In [33]:
Predicted=bagging.predict(x_train_pca)

from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(y_train,Predicted)
print(ConfusionMatrix)
[[38  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0 35  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0 26  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0 24  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0 26  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0 23  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0 19  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  1  0  0 21  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0]
 [ 1  0  0  0  0  0  0  0  0 31  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0 34  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0 34  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0 30  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0 19  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 26  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 33  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 48  0]
 [ 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 33]]
In [34]:
#accuracy
accuracy=np.trace(ConfusionMatrix)/sum(sum(ConfusionMatrix))
print(accuracy)
0.994782608696

Cross Validation

In [23]:
cv = cross_validation.ShuffleSplit(575, n_iter=10,test_size=0.05, random_state=None)
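# evaluate the bagging model; the original run passed `forest` here, so the
# score printed below is for the random forest, not for the bagging model
scores = cross_validation.cross_val_score(bagging,x_train_pca,y_train,cv = cv)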
scores_mean = scores.mean()
print('score=',scores_mean)                                 #mean of all the scores
score= 0.941379310345

Conclusion

Cross validation scores of the models trained on PCA features are good. The SVM gives the highest score (about 0.99), so we take it as our final model.
