Lecture 6 Training Neural Network (1)

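First, here is a post that clears up the confusion around epoch and iteration. The author writes well, and his other deep learning articles are worth a look too: "Understanding iteration, epoch, and batch size".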
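
As a quick reminder of how the three terms relate, here is a minimal sketch. The dataset size, batch size, and epoch count are made-up numbers for illustration only, and `iterations_per_epoch` is a hypothetical helper, not something from the linked post.

```python
import math


def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of mini-batch updates needed to see every sample once."""
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(num_samples / batch_size)


# Hypothetical numbers, chosen only for illustration.
num_samples = 50_000   # size of the training set
batch_size = 128       # samples processed per iteration
num_epochs = 10        # full passes over the training set

per_epoch = iterations_per_epoch(num_samples, batch_size)
total_iterations = per_epoch * num_epochs
print(per_epoch, total_iterations)
```

One iteration is one mini-batch update; one epoch is however many iterations it takes to pass over the whole training set once.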

A question about argparse also came up while reading the code:
"Using argparse"
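
For reference, a minimal argparse sketch of the kind usually found at the top of a training script. The flag names (`--epochs`, `--batch-size`, `--lr`, `--verbose`) and their defaults are assumptions for illustration, not taken from the code discussed here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a few hypothetical training options."""
    parser = argparse.ArgumentParser(description="Toy training script")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of passes over the training set")
    parser.add_argument("--batch-size", type=int, default=64,
                        help="samples per iteration")
    parser.add_argument("--lr", type=float, default=1e-3,
                        help="learning rate")
    parser.add_argument("--verbose", action="store_true",
                        help="print extra logs")
    return parser


# Passing an explicit list parses it instead of sys.argv, which is handy
# for experimenting in a notebook or a test.
args = build_parser().parse_args(["--epochs", "5", "--lr", "0.01"])

# Dashes in flag names become underscores in attribute names.
print(args.epochs, args.batch_size, args.lr, args.verbose)
```

In a real script you would call `parse_args()` with no arguments so that the values come from the command line.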

Neurons

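Each neuron performs a dot product with the input and its weights, adds the bias and applies the non-linearity (or activation function). In plain terms, each neuron takes the dot product of its inputs (for example, pixel values) with its weights and then adds the bias. The result is something like a score, which is then passed through a non-linear function (the activation function).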

A single neuron can be used to implement a binary classifier (e.g. binary Softmax or binary SVM classifiers)
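
The description above can be sketched in a few lines of NumPy. This is only an illustration: the input, weights, and bias are made-up values, and the sigmoid is one possible choice of activation, which makes the neuron behave like a binary Softmax (logistic regression) classifier.

```python
import numpy as np


def sigmoid(z):
    """Squash a real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron_forward(x, w, b):
    """Dot product of input and weights, plus bias, then the non-linearity."""
    score = np.dot(w, x) + b
    return sigmoid(score)


# Hypothetical input, weights, and bias.
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.4, 0.1, -0.6])
b = 0.3

p = neuron_forward(x, w, b)   # interpreted as P(class = 1 | x)
label = int(p > 0.5)          # threshold at 0.5 for a binary decision
print(p, label)
```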

Activation functions

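This part will cover the pros and cons of several activation functions.
Placeholder for now; I will fill it in when I have time. I have been swamped with my graduation project lately.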

Neural Network architectures

Layer-wise organization

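Naming convention: when we say N-layer neural network, the input layer is not counted. So a single-layer network is one with no hidden layer, where the input is mapped directly to the output.
Output layer: there is no activation function after the output layer, because the last layer represents the class scores.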

feed-forward computation

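The forward pass of a fully-connected layer corresponds to one matrix multiplication followed by a bias offset and an activation function. In other words, each layer of the forward pass multiplies by a weight matrix, adds the bias, and passes the result through an activation function.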
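
A minimal NumPy sketch of this forward pass for a 3-layer network (two hidden layers plus an output layer, following the naming convention above). The layer sizes, the random initialization, and the choice of ReLU are assumptions for illustration. Note that the output layer has no activation, since it produces class scores.

```python
import numpy as np


def relu(z):
    """Element-wise ReLU non-linearity."""
    return np.maximum(0, z)


rng = np.random.default_rng(0)

# Hypothetical sizes: input dim, two hidden layers, number of classes.
D, H1, H2, C = 4, 5, 5, 3

# Small random weights and zero biases (column vectors so they broadcast).
W1, b1 = 0.1 * rng.standard_normal((H1, D)), np.zeros((H1, 1))
W2, b2 = 0.1 * rng.standard_normal((H2, H1)), np.zeros((H2, 1))
W3, b3 = 0.1 * rng.standard_normal((C, H2)), np.zeros((C, 1))


def forward(x):
    """x has shape (D, N): one column per example."""
    h1 = relu(W1 @ x + b1)    # first hidden layer
    h2 = relu(W2 @ h1 + b2)   # second hidden layer
    scores = W3 @ h2 + b3     # output layer: class scores, no activation
    return scores


x = rng.standard_normal((D, 1))   # a single input column vector
scores = forward(x)
print(scores.shape)
```

Because `x` holds one example per column, the same function evaluates a whole batch in one call.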

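In traditional fully-connected networks, a three-layer network often works much better than a two-layer one, but going deeper (four, five, or six layers) rarely brings much further improvement. This is in stark contrast to convolutional neural networks, where depth is a very important ingredient of a good recognition system. The argument for this observation is that images have a hierarchical structure: a face is made up of eyes, for example, and eyes are in turn made up of edges, so several layers of processing make intuitive sense for this data domain.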

Setting number of layers and their sizes

Straight to the conclusion: The takeaway is that you should not be using smaller networks because you are afraid of overfitting. Instead, you should use as big of a neural network as your computational budget allows, and use other regularization techniques to control overfitting.
In other words: do not shrink the network just to avoid overfitting. Use the largest network you can afford, and rely on regularization (such as L2 regularization, dropout, input noise, or higher weight decay) to keep overfitting in check.
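
As one concrete instance of such a regularizer, here is a minimal sketch of an L2 penalty added to the data loss. The weight matrices, the `data_loss` value, and the regularization strength `reg` are made-up numbers for illustration; the 0.5 factor is a common convention that makes the gradient simply `reg * W`.

```python
import numpy as np


def l2_penalty(weights, reg):
    """0.5 * reg * sum of squared weights over all weight matrices."""
    return 0.5 * reg * sum(np.sum(W * W) for W in weights)


def l2_penalty_grad(W, reg):
    """Gradient of the penalty with respect to one weight matrix."""
    return reg * W


# Hypothetical weights of a small two-layer network.
W1 = np.array([[1.0, -2.0], [0.5, 0.0]])
W2 = np.array([[3.0], [-1.0]])

data_loss = 0.8   # hypothetical loss from the classifier itself
reg = 1e-2        # regularization strength (a hyperparameter)

total_loss = data_loss + l2_penalty([W1, W2], reg)
print(total_loss)
```

A larger `reg` penalizes large weights more strongly, which is how a big network can be kept from overfitting without reducing its size.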
