204.6.9 Digit Recognition using SVM – DV Self Paced Courses

Link to the previous post : https://course.dvanalyticsmds.com/204-6-8-svm-advantages-disadvantages-applications/

In this , we will put SVM into practice by solving an image classification problem.

LAB: Digit Recognition using SVM

Take an image of a handwritten single digit, and determine what that digit is.
Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service. The original scanned digits are binary and of different sizes and orientations; the images here have been de slanted and size normalized, resultingin 16 x 16 grayscale images (Le Cun et al., 1990).
The data are in two gzipped files, and each line consists of the digitid (0-9) followed by the 256 grayscale values.
Build an SVM model that can be used as the digit recognizer.
Use the test dataset to validate the true classification power of the model.
What is the final accuracy of the model?

Solution

#Importing test and training data
digits_train <- read.table("C:\\Amrita\\Datavedi\\Digit Recognizer\\USPS\\zip.train.txt", quote="\"", comment.char="")
digits_test <- read.table("C:\\Amrita\\Datavedi\\Digit Recognizer\\USPS\\zip.test.txt", quote="\"", comment.char="")
dim(digits_train)

## [1] 7291  257

dim(digits_test)

## [1] 2007  257

#Lets see some images. 
for(i in 1:6 )
{
data_row<-digits_train[i,-1]
pixels = matrix(as.numeric(data_row),16,16,byrow=TRUE)
image(pixels, axes = FALSE)
title(main = paste("Label is" , digits_train[i,1]), font.main = 4)
}

#Are there any missing values?

sum(is.na(digits_train))

## [1] 0

sum(is.na(digits_test))

## [1] 0

#The first variable is label
table(digits_train$V1)

## 
##    0    1    2    3    4    5    6    7    8    9 
## 1194 1005  731  658  652  556  664  645  542  644

table(digits_test$V1)

## 
##   0   1   2   3   4   5   6   7   8   9 
## 359 264 198 166 200 160 170 147 166 177

########SVM Model Building 
library(e1071)

#Lets keep an eye on runtime
pc <- proc.time()

#Verify the code with limited data 5000 rows
number.svm <- svm(V1 ~. , type="C", data = digits_train[1:5000,])

proc.time() - pc

##    user  system elapsed 
##   38.25    0.14   39.37

summary(number.svm)

## 
## Call:
## svm(formula = V1 ~ ., data = digits_train[1:5000, ], type = "C")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.00390625 
## 
## Number of Support Vectors:  2028
## 
##  ( 181 232 245 189 195 45 220 206 305 210 )
## 
## 
## Number of Classes:  10 
## 
## Levels: 
##  0 1 2 3 4 5 6 7 8 9

#Confusion Matrix
library(caret)
label_predicted<-predict(number.svm, type = "class")
confusionMatrix(label_predicted,digits_train[1:5000, 1])

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 847   0   0   0   0   0   1   0   0   0
##          1   0 674   1   0   1   0   1   0   0   0
##          2   0   0 484   0   0   1   0   0   0   0
##          3   0   0   1 392   0   0   0   0   1   1
##          4   0   0   2   0 429   0   0   1   0   0
##          5   0   0   0   1   0 350   1   0   2   0
##          6   0   0   0   0   1   1 475   0   0   0
##          7   0   0   0   0   0   0   0 459   1   2
##          8   0   0   0   2   0   0   0   0 383   0
##          9   0   0   0   0   3   0   0   1   0 481
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9948          
##                  95% CI : (0.9924, 0.9966)
##     No Information Rate : 0.1694          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9942          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            1.0000   1.0000   0.9918   0.9924   0.9885   0.9943
## Specificity            0.9998   0.9993   0.9998   0.9993   0.9993   0.9991
## Pos Pred Value         0.9988   0.9956   0.9979   0.9924   0.9931   0.9887
## Neg Pred Value         1.0000   1.0000   0.9991   0.9993   0.9989   0.9996
## Prevalence             0.1694   0.1348   0.0976   0.0790   0.0868   0.0704
## Detection Rate         0.1694   0.1348   0.0968   0.0784   0.0858   0.0700
## Detection Prevalence   0.1696   0.1354   0.0970   0.0790   0.0864   0.0708
## Balanced Accuracy      0.9999   0.9997   0.9958   0.9959   0.9939   0.9967
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity            0.9937   0.9957   0.9897   0.9938
## Specificity            0.9996   0.9993   0.9996   0.9991
## Pos Pred Value         0.9958   0.9935   0.9948   0.9918
## Neg Pred Value         0.9993   0.9996   0.9991   0.9993
## Prevalence             0.0956   0.0922   0.0774   0.0968
## Detection Rate         0.0950   0.0918   0.0766   0.0962
## Detection Prevalence   0.0954   0.0924   0.0770   0.0970
## Balanced Accuracy      0.9966   0.9975   0.9946   0.9965

table(label_predicted,digits_train[1:5000, 1])

##                
## label_predicted   0   1   2   3   4   5   6   7   8   9
##               0 847   0   0   0   0   0   1   0   0   0
##               1   0 674   1   0   1   0   1   0   0   0
##               2   0   0 484   0   0   1   0   0   0   0
##               3   0   0   1 392   0   0   0   0   1   1
##               4   0   0   2   0 429   0   0   1   0   0
##               5   0   0   0   1   0 350   1   0   2   0
##               6   0   0   0   0   1   1 475   0   0   0
##               7   0   0   0   0   0   0   0 459   1   2
##               8   0   0   0   2   0   0   0   0 383   0
##               9   0   0   0   0   3   0   0   1   0 481

###Out of time validation with test data
test_label_predicted<-predict(number.svm, newdata =digits_test[,-1] , type = "class")
confusionMatrix(test_label_predicted,digits_test[,1])

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 351   0   3   0   0   3   5   0   3   0
##          1   0 253   0   0   1   0   0   0   0   0
##          2   6   2 182   6   5   4   4   3   4   0
##          3   0   0   4 144   0   3   0   0   4   0
##          4   1   5   4   0 185   1   2   5   0   4
##          5   0   0   0  11   2 145   1   0   5   1
##          6   0   3   1   0   3   0 158   0   1   0
##          7   0   0   1   1   1   0   0 137   0   1
##          8   1   0   3   3   0   1   0   0 146   2
##          9   0   1   0   1   3   3   0   2   3 169
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9317          
##                  95% CI : (0.9198, 0.9424)
##     No Information Rate : 0.1789          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9233          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.9777   0.9583  0.91919  0.86747  0.92500  0.90625
## Specificity            0.9915   0.9994  0.98121  0.99402  0.98783  0.98917
## Pos Pred Value         0.9616   0.9961  0.84259  0.92903  0.89372  0.87879
## Neg Pred Value         0.9951   0.9937  0.99107  0.98812  0.99167  0.99186
## Prevalence             0.1789   0.1315  0.09865  0.08271  0.09965  0.07972
## Detection Rate         0.1749   0.1261  0.09068  0.07175  0.09218  0.07225
## Detection Prevalence   0.1819   0.1266  0.10762  0.07723  0.10314  0.08221
## Balanced Accuracy      0.9846   0.9789  0.95020  0.93075  0.95641  0.94771
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity           0.92941  0.93197  0.87952  0.95480
## Specificity           0.99565  0.99785  0.99457  0.99290
## Pos Pred Value        0.95181  0.97163  0.93590  0.92857
## Neg Pred Value        0.99348  0.99464  0.98920  0.99562
## Prevalence            0.08470  0.07324  0.08271  0.08819
## Detection Rate        0.07872  0.06826  0.07275  0.08421
## Detection Prevalence  0.08271  0.07025  0.07773  0.09068
## Balanced Accuracy     0.96253  0.96491  0.93704  0.97385

#####Model on Full Data 
pc <- proc.time()
number.svm <- svm(V1 ~. , type="C", data = digits_train)
proc.time() - pc

##    user  system elapsed 
##   76.94    0.26   87.24

summary(number.svm)

## 
## Call:
## svm(formula = V1 ~ ., data = digits_train, type = "C")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.00390625 
## 
## Number of Support Vectors:  2606
## 
##  ( 213 326 319 235 285 63 256 262 401 246 )
## 
## 
## Number of Classes:  10 
## 
## Levels: 
##  0 1 2 3 4 5 6 7 8 9

#Confusion Matrix
library(caret)
label_predicted<-predict(number.svm, type = "class")
confusionMatrix(label_predicted,digits_train[,1])

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1    2    3    4    5    6    7    8    9
##          0 1194    0    0    0    0    0    2    0    0    0
##          1    0 1005    1    1    2    0    1    0    1    0
##          2    0    0  724    0    0    1    0    0    0    0
##          3    0    0    2  651    0    0    0    0    0    1
##          4    0    0    4    0  648    1    0    2    1    1
##          5    0    0    0    3    0  553    0    0    2    0
##          6    0    0    0    0    0    1  661    0    0    0
##          7    0    0    0    0    0    0    0  641    2    3
##          8    0    0    0    3    0    0    0    0  536    0
##          9    0    0    0    0    2    0    0    2    0  639
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9947          
##                  95% CI : (0.9927, 0.9962)
##     No Information Rate : 0.1638          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.994           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            1.0000   1.0000  0.99042  0.98936  0.99387  0.99460
## Specificity            0.9997   0.9990  0.99985  0.99955  0.99864  0.99926
## Pos Pred Value         0.9983   0.9941  0.99862  0.99541  0.98630  0.99104
## Neg Pred Value         1.0000   1.0000  0.99893  0.99895  0.99940  0.99955
## Prevalence             0.1638   0.1378  0.10026  0.09025  0.08943  0.07626
## Detection Rate         0.1638   0.1378  0.09930  0.08929  0.08888  0.07585
## Detection Prevalence   0.1640   0.1387  0.09944  0.08970  0.09011  0.07653
## Balanced Accuracy      0.9998   0.9995  0.99514  0.99445  0.99625  0.99693
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity           0.99548  0.99380  0.98893  0.99224
## Specificity           0.99985  0.99925  0.99956  0.99940
## Pos Pred Value        0.99849  0.99226  0.99443  0.99378
## Neg Pred Value        0.99955  0.99940  0.99911  0.99925
## Prevalence            0.09107  0.08847  0.07434  0.08833
## Detection Rate        0.09066  0.08792  0.07352  0.08764
## Detection Prevalence  0.09080  0.08860  0.07393  0.08819
## Balanced Accuracy     0.99767  0.99652  0.99424  0.99582

###Out of time validation with test data
test_label_predicted<-predict(number.svm, newdata =digits_test[,-1] , type = "class")
confusionMatrix(test_label_predicted,digits_test[,1])

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 351   0   2   0   0   3   4   0   4   0
##          1   0 253   0   0   1   0   0   0   0   0
##          2   6   1 183   5   3   2   4   2   2   0
##          3   0   0   4 146   0   3   0   0   3   0
##          4   1   5   3   0 186   1   2   5   0   4
##          5   0   1   0  11   1 147   1   0   2   1
##          6   0   3   1   0   2   0 158   0   1   0
##          7   0   1   1   1   3   0   0 138   0   0
##          8   1   0   4   3   1   1   1   0 151   2
##          9   0   0   0   0   3   3   0   2   3 170
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9382          
##                  95% CI : (0.9268, 0.9484)
##     No Information Rate : 0.1789          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9306          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.9777   0.9583  0.92424  0.87952  0.93000  0.91875
## Specificity            0.9921   0.9994  0.98618  0.99457  0.98838  0.99080
## Pos Pred Value         0.9643   0.9961  0.87981  0.93590  0.89855  0.89634
## Neg Pred Value         0.9951   0.9937  0.99166  0.98920  0.99222  0.99295
## Prevalence             0.1789   0.1315  0.09865  0.08271  0.09965  0.07972
## Detection Rate         0.1749   0.1261  0.09118  0.07275  0.09268  0.07324
## Detection Prevalence   0.1814   0.1266  0.10364  0.07773  0.10314  0.08171
## Balanced Accuracy      0.9849   0.9789  0.95521  0.93704  0.95919  0.95477
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity           0.92941  0.93878  0.90964  0.96045
## Specificity           0.99619  0.99677  0.99294  0.99399
## Pos Pred Value        0.95758  0.95833  0.92073  0.93923
## Neg Pred Value        0.99349  0.99517  0.99186  0.99617
## Prevalence            0.08470  0.07324  0.08271  0.08819
## Detection Rate        0.07872  0.06876  0.07524  0.08470
## Detection Prevalence  0.08221  0.07175  0.08171  0.09018
## Balanced Accuracy     0.96280  0.96777  0.95129  0.97722

#Lets see some predictions. 
digits_test$predicted<-test_label_predicted

for(i in 1:10 )
{
data_row<-digits_test[i,c(-1,-ncol(digits_test))]
pixels = matrix(as.numeric(data_row),16,16,byrow=TRUE)
image(pixels, axes = FALSE)
title(main = paste("Label is" , digits_test[i,1] ,"  Prediction is" , digits_test[i,ncol(digits_test)]))
}

#Lets see some errors in predictions images. 
# Wrong predictions
digits_test$predicted<-test_label_predicted
wrong_predictions<-digits_test[!(digits_test$predicted==digits_test$V1),]
nrow(wrong_predictions)

## [1] 124

for(i in 1:10 )
{
data_row<-wrong_predictions[i,c(-1,-ncol(wrong_predictions))]
pixels = matrix(as.numeric(data_row),16,16,byrow=TRUE)
image(pixels, axes = FALSE)
title(main = paste("Label is" , wrong_predictions[i,1] ,"  Prediction is" , wrong_predictions[i,ncol(wrong_predictions)]))
}

Conclusion

Many software tools are available for SVM implementation
SVMs are really good for text classification
SVMs are good at finding the best linear separator. The kernel trick makes SVMs non-linear learning algorithms
Choosing an appropriate kernel is the key for good SVM and choosing the right kernel function is not easy
We need to be patient while building SVMs on large datasets. They take a lot of time for training.

21st June 2017