12.4 Homework #4: Deep Learning

This was an assignment for my Advanced Data Science and Architecture course, and it also happens to be part of my CNN face-recognition project. To save effort I wrote it in ****'s markdown, so I am simply pasting it here as-is. The code is not included.

Option B: Use Deep Learning for analysis of your project data.

Part A - Deep Learning model (40 points)

  • For this project, we applied a Convolutional Neural Network (CNN) to teach the machine to recognize W's face and distinguish it from other people's faces.
  • The dataset contains 11,500 face pictures of W, captured and processed using the Python libraries Dlib and OpenCV (see the sketch after this list). The negative ("noise") pictures come mainly from UMass Amherst and the University of Science and Technology of China.
  • The method is a Convolutional Neural Network implemented with TensorFlow-GPU, which is developed by Google.
  • The best accuracy so far exceeds 96%.
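
As a rough illustration of the capture step, here is a minimal sketch using Dlib's frontal face detector and OpenCV; the file path handling, output size, and grayscale choice are assumptions, not taken from the original code.

```python
import cv2
import dlib

SIZE = 64  # output face-crop resolution (an assumption)

detector = dlib.get_frontal_face_detector()

def crop_faces(image_path):
    """Detect faces with Dlib and return SIZE x SIZE grayscale crops via OpenCV."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)  # upsample once so smaller faces are found
    crops = []
    for rect in rects:
        x1, y1 = max(rect.left(), 0), max(rect.top(), 0)
        face = gray[y1:rect.bottom(), x1:rect.right()]
        crops.append(cv2.resize(face, (SIZE, SIZE)))
    return crops
```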

For the adjustments below, unless otherwise noted, we use a baseline model with the following settings: ReLU as the activation function, softmax_cross_entropy as the loss function, 20 epochs, the Adam optimizer for gradient estimation, and random_normal initialization of the parameters. The base architecture is three pairs of convolution (3x3 filters, stride [1,1,1,1]) and pooling layers (max pooling, 2x2 window, stride [1,2,2,1]), followed by two fully connected layers, the last of which outputs the yes/no classification (whether the face is W's or not).
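
A minimal TensorFlow 1.x sketch of this baseline is given below; the input resolution, channel counts, and the 512-unit hidden layer are assumptions chosen to be consistent with the later experiments, not the exact original code.

```python
import tensorflow as tf

SIZE = 64  # assumed input resolution

x = tf.placeholder(tf.float32, [None, SIZE, SIZE, 1])
y = tf.placeholder(tf.float32, [None, 2])  # one-hot labels: W / not-W

def conv_pool(inp, in_ch, out_ch):
    """One convolution (3x3, stride [1,1,1,1]) + max-pooling (2x2, stride [1,2,2,1]) pair."""
    w = tf.Variable(tf.random_normal([3, 3, in_ch, out_ch], stddev=0.01))
    b = tf.Variable(tf.random_normal([out_ch], stddev=0.01))
    h = tf.nn.relu(tf.nn.conv2d(inp, w, strides=[1, 1, 1, 1], padding='SAME') + b)
    return tf.nn.max_pool(h, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Three convolution/pooling pairs (channel counts 32/64/64 are assumptions).
h = conv_pool(x, 1, 32)
h = conv_pool(h, 32, 64)
h = conv_pool(h, 64, 64)

# Two fully connected layers; the last one produces the yes/no logits.
flat_dim = (SIZE // 8) * (SIZE // 8) * 64  # three 2x2 poolings halve each side three times
flat = tf.reshape(h, [-1, flat_dim])
w_fc = tf.Variable(tf.random_normal([flat_dim, 512], stddev=0.01))
b_fc = tf.Variable(tf.random_normal([512], stddev=0.01))
fc = tf.nn.relu(tf.matmul(flat, w_fc) + b_fc)
w_out = tf.Variable(tf.random_normal([512, 2], stddev=0.01))
b_out = tf.Variable(tf.random_normal([2], stddev=0.01))
logits = tf.matmul(fc, w_out) + b_out

loss = tf.losses.softmax_cross_entropy(onehot_labels=y, logits=logits)
train_op = tf.train.AdamOptimizer(0.01).minimize(loss)
```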

Part B - Activation function (10 points)

This part shows how the activation function affects the accuracy and the training time (time to plateau). It contains an accuracy table and plots, as shown below.

| Activation Function | Accuracy (%) |
| --- | --- |
| ReLU | 96.20 |
| ELU | 95.05 |
| TanH | 52.00 |
| Sigmoid | 51.60 |
| Softplus | 51.40 |

[Plot: training accuracy vs. epoch for each activation function]

Accuracy: From the plot and table above, ReLU gives the highest accuracy, 96.20%, though it is close to that of the ELU activation function. Meanwhile, TanH, Sigmoid, and Softplus are not suitable for our task because their accuracy is similar to that of the naive rule (50%).
Plateauing time: ReLU plateaus very fast.
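
Swapping activations amounts to replacing `tf.nn.relu` in the baseline sketch from Part A; a minimal sketch of the five candidates, all of which exist under `tf.nn` in TensorFlow 1.x:

```python
import tensorflow as tf

# The five activation functions compared above, keyed by name; any of these
# can replace tf.nn.relu in the conv_pool helper from the Part A sketch.
ACTIVATIONS = {
    'ReLU': tf.nn.relu,
    'ELU': tf.nn.elu,
    'TanH': tf.nn.tanh,
    'Sigmoid': tf.nn.sigmoid,
    'Softplus': tf.nn.softplus,
}
```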

Part C - Cost function (10 points)

For this part we change the loss function as follows.

| Loss Function | Accuracy (%) |
| --- | --- |
| softmax_cross_entropy | 95.35 |
| cosine_distance | 48.00 |
| hinge | 93.65 |
| sigmoid_cross_entropy | 94.55 |
| mean_squared_error | 90.95 |

[Plot: training accuracy vs. epoch for each loss function]

Accuracy:
From the table and plot above, we can conclude that:
- Except for cosine distance, the loss functions all produce high accuracy, among which softmax_cross_entropy and sigmoid_cross_entropy produce the highest accuracy under the given conditions.
Plateauing time:
From the plots above, all of the networks except the one with cosine distance show a plateauing trend, and sigmoid_cross_entropy takes relatively little time to plateau.
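
In TensorFlow 1.x these losses are available under `tf.losses`; below is a minimal sketch of the swap. The adapters for hinge and cosine distance (and the `axis` keyword, which older versions call `dim`) are assumptions, since those losses expect slightly different inputs than the cross-entropy ones.

```python
import tensorflow as tf

def make_loss(name, onehot_labels, logits):
    """Build one of the compared losses; onehot_labels and logits are both
    [batch, 2], as in the Part A sketch."""
    if name == 'softmax_cross_entropy':
        return tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
    if name == 'sigmoid_cross_entropy':
        return tf.losses.sigmoid_cross_entropy(multi_class_labels=onehot_labels, logits=logits)
    if name == 'hinge':
        return tf.losses.hinge_loss(labels=onehot_labels, logits=logits)
    if name == 'mean_squared_error':
        return tf.losses.mean_squared_error(labels=onehot_labels,
                                            predictions=tf.nn.softmax(logits))
    if name == 'cosine_distance':
        # Cosine distance compares unit vectors; axis=1 is the class dimension.
        return tf.losses.cosine_distance(tf.nn.l2_normalize(onehot_labels, 1),
                                         tf.nn.l2_normalize(logits, 1), axis=1)
    raise ValueError(name)
```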

Part D - Epochs (10 points)

For this part we examine the same two checkpoints: accuracy and plateauing time. Here we pick epoch counts of 1, 3, 5, 10, 20, 50, and 100 (batch size 200) and generate the accuracy table and plot below.

| Number of Epochs | Accuracy (%) |
| --- | --- |
| 1 | 85.00 |
| 3 | 93.00 |
| 5 | 91.80 |
| 10 | 92.80 |
| 20 | 93.00 |
| 50 | 96.76 |
| 100 | 96.12 |

[Plot: training accuracy vs. number of epochs]

Accuracy:
From the table and plot above, we can conclude that:
- More epochs bring higher accuracy.
- Early in training, each epoch brings a large improvement in accuracy, while later in training each epoch brings a relatively smaller improvement.
Plateauing time:
Using the exact per-epoch data, we apply early stopping: if we stop when the accuracy improvement from one epoch is less than 0.1%, this network plateaus at epoch #31.
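
A minimal sketch of this stopping rule; the training loop and its `run_epoch` callback are assumptions, and only the 0.1% criterion comes from the text.

```python
THRESHOLD = 0.001  # stop when one epoch improves accuracy by less than 0.1%

def train_with_early_stopping(run_epoch, max_epochs=100):
    """run_epoch() is an assumed callback that trains for one epoch and
    returns the validation accuracy in [0, 1]."""
    prev_acc = 0.0
    for epoch in range(1, max_epochs + 1):
        acc = run_epoch()
        if acc - prev_acc < THRESHOLD:
            print('plateaued at epoch #%d (accuracy %.2f%%)' % (epoch, acc * 100))
            break
        prev_acc = acc
```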

Part E - Gradient estimation (10 points)

For this part, we tried different gradient estimation optimizers: Adam, Momentum, GradientDescent, Adagrad, Adadelta, and RMSProp. The accuracy table and plots are shown below.

| Optimizer | Accuracy (%) |
| --- | --- |
| Adadelta | 59.00 |
| Adam | 94.35 |
| GradientDescent | 56.00 |
| Adagrad | 59.00 |
| RMSProp | 89.45 |
| Momentum | 49.00 |

[Plot: training accuracy vs. epoch for each optimizer]

Accuracy:
From the results above, Adam (Adaptive Moment Estimation) and RMSProp give the highest accuracy, 94.35% and 89.45% respectively, under a learning rate of 0.01, while Momentum, Adagrad, GradientDescent, and Adadelta lead to low accuracy.
Plateauing time:
Adam is the fastest and RMSProp the second fastest, while the Momentum, Adagrad, GradientDescent, and Adadelta plots are horizontal lines.

[Plot: training accuracy vs. epoch for Adam at different learning rates]

Also, even for the Adam method, if the learning rate is a bit high, say 0.08, the network does not plateau within the given number of epochs (20). In general, a smaller learning rate gives better accuracy; for example, a learning rate of 0.005 generates very high accuracy (98.35%).
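
All of these optimizers exist under `tf.train` in TensorFlow 1.x; a minimal sketch of the swap, where the momentum value of 0.9 is an assumption since the text does not state it:

```python
import tensorflow as tf

def make_optimizer(name, learning_rate=0.01):
    """Return one of the compared tf.train optimizers."""
    return {
        'Adam': tf.train.AdamOptimizer(learning_rate),
        'Momentum': tf.train.MomentumOptimizer(learning_rate, momentum=0.9),
        'GradientDescent': tf.train.GradientDescentOptimizer(learning_rate),
        'Adagrad': tf.train.AdagradOptimizer(learning_rate),
        'Adadelta': tf.train.AdadeltaOptimizer(learning_rate),
        'RMSProp': tf.train.RMSPropOptimizer(learning_rate),
    }[name]

# e.g. train_op = make_optimizer('Adam', 0.005).minimize(loss)
```
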
Part F - Network Architecture (10 points)

We have three pairs of convolution and pooling layers. Here we keep the filter stride at [1,1,1,1] and the pooling stride at [1,2,2,1]. To change the architecture of this network, we vary the size of the kernels/filters (first table and plot; architecture_357, for example, means 3x3, 5x5, and 7x7 filters in the three convolution layers respectively) and the number of channels in the fully connected layer (second table and plot).

| Filter Sizes (50 epochs) | Accuracy (%) |
| --- | --- |
| architecture_777 | 86.10 |
| architecture_555 | 90.96 |
| architecture_357 | 95.28 |
| architecture_355 | 92.54 |
| architecture_333 | 97.72 |
| architecture_222 | 95.18 |

[Plot: training accuracy vs. epoch for each filter-size architecture]

| Fully Connected Channels | Accuracy (%) |
| --- | --- |
| 256 | 90.30 |
| 512 | 96.80 |
| 1024 | 96.55 |
| 2048 | 94.30 |

[Plot: training accuracy vs. epoch for each fully connected layer width]

Accuracy:
For this part, filter sizes of 3x3 in every layer, and of 3x3, 5x5, 7x7 in the corresponding layers, give the best accuracy. Under 3x3 filters, fully connected layers of 512 and 1024 channels give the highest accuracy.
Plateauing time:
The pattern is similar: 3x3 filters in every layer and 3x3, 5x5, 7x7 in the corresponding layers plateau fastest, and under 3x3 filters, the 512-channel fully connected layer takes the least time to plateau.
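
A minimal sketch of how these variants might be parameterized; the convolution channel counts are assumptions, and only the filter sizes and fully connected widths come from the tables above.

```python
import tensorflow as tf

def build_conv_stack(x, filter_sizes, fc_channels):
    """Three conv/pool pairs with per-layer filter sizes (e.g. [3, 5, 7] for
    architecture_357), followed by a fully connected layer of fc_channels units.
    The channel counts (32, 64, 64) are assumptions."""
    channels = [1, 32, 64, 64]
    h = x
    for i, k in enumerate(filter_sizes):
        w = tf.Variable(tf.random_normal([k, k, channels[i], channels[i + 1]], stddev=0.01))
        h = tf.nn.relu(tf.nn.conv2d(h, w, strides=[1, 1, 1, 1], padding='SAME'))
        h = tf.nn.max_pool(h, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    shape = h.get_shape().as_list()  # [None, height, width, channels]
    flat_dim = shape[1] * shape[2] * shape[3]
    flat = tf.reshape(h, [-1, flat_dim])
    w_fc = tf.Variable(tf.random_normal([flat_dim, fc_channels], stddev=0.01))
    return tf.nn.relu(tf.matmul(flat, w_fc))

# e.g. the best variants above: build_conv_stack(x, [3, 3, 3], 512)
```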

Part G - Network initialization (10 points)

For this part we again examine the two checkpoints of accuracy and plateauing time. We pick initialization methods of zeros, random_uniform, random_gamma, and random_normal (Gaussian, with different standard deviations). Below are the accuracy table and plateauing plots.

| Initialization | Accuracy (%) |
| --- | --- |
| random_uniform | 54.00 |
| zeros | 49.00 |
| random_gamma | 57.40 |
| random_normal (stddev 0.01) | 95.70 |
| random_normal (stddev 0.015) | 85.80 |
| random_normal (stddev 0.008) | 92.85 |

[Plot: training accuracy vs. epoch for each initialization method]

Accuracy:
From the results above, random_normal (Gaussian) initialization gives the highest accuracy, over 90% given 20 epochs, while the remaining methods lead to relatively low accuracy in the same 20 epochs.
Plateauing time:
Among these methods, Gaussian is the only one that plateaus within the given number of epochs; the rest might plateau if given more epochs. We also find that a standard deviation around 0.01 works very well for this network.
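
A minimal sketch of the compared initializers in TensorFlow 1.x; the uniform range and the gamma shape parameter are assumptions, since the text does not state them.

```python
import tensorflow as tf

def make_weights(shape, method, stddev=0.01):
    """Create a weight variable with one of the compared initializations."""
    if method == 'zeros':
        init = tf.zeros(shape)
    elif method == 'random_uniform':
        init = tf.random_uniform(shape, -stddev, stddev)  # range is an assumption
    elif method == 'random_gamma':
        init = tf.random_gamma(shape, alpha=1.0)  # alpha=1.0 is an assumption
    elif method == 'random_normal':
        init = tf.random_normal(shape, stddev=stddev)
    else:
        raise ValueError(method)
    return tf.Variable(init)

# e.g. the best setting above: make_weights([3, 3, 1, 32], 'random_normal', stddev=0.01)
```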