
Training a Neural Network




We will now learn how to train a neural network. We will also learn the back-propagation algorithm and the backward pass in Python Deep Learning.

We have to find the optimal values of the weights of a neural network to get the desired output. To train a neural network, we use the iterative gradient descent method. We start with a random initialization of the weights. After random initialization, we make predictions on some subset of the data with the forward-propagation process, compute the corresponding cost function C, and update each weight w by an amount proportional to dC/dw, i.e., the derivative of the cost function with respect to the weight. The proportionality constant is known as the learning rate.
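As a concrete illustration, here is a minimal NumPy sketch of this update rule for a single linear layer with a mean-squared-error cost; the data, shapes and learning rate are illustrative assumptions, not part of the tutorial.

```python
import numpy as np

# Hypothetical data: 4 samples, 3 features, 1 target each
X = np.random.randn(4, 3)
y = np.random.randn(4, 1)

w = np.random.randn(3, 1)          # random initialization of the weights
learning_rate = 0.01               # the proportionality constant

for step in range(100):
    y_pred = X @ w                              # forward propagation (linear layer)
    cost = np.mean((y_pred - y) ** 2)           # cost function C (mean squared error)
    dC_dw = 2 * X.T @ (y_pred - y) / len(X)     # derivative of C w.r.t. the weights
    w -= learning_rate * dC_dw                  # update proportional to dC/dw
```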

The gradients can be calculated efficiently using the back-propagation algorithm. The key observation of backward propagation, or backprop, is that, because of the chain rule of differentiation, the gradient at each neuron in the neural network can be calculated using the gradients at the neurons it has outgoing edges to. Hence, we calculate the gradients backwards, i.e., we first calculate the gradients of the output layer, then the top-most hidden layer, followed by the preceding hidden layer, and so on, ending at the input layer.

The back-propagation algorithm is implemented mostly using the idea of a computational graph, where each neuron is expanded to many nodes in the computational graph, each performing a simple mathematical operation such as addition or multiplication. The computational graph does not have any weights on the edges; all weights are assigned to the nodes, so the weights become their own nodes. The backward propagation algorithm is then run on the computational graph. Once the calculation is complete, only the gradients of the weight nodes are required for the update. The rest of the gradients can be discarded.
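To make this concrete, here is a minimal sketch (not the tutorial's code) of a two-node computational graph for y = x * w + b, with a backward pass that applies the chain rule node by node; all values and names are illustrative.

```python
# Forward pass through a tiny computational graph: y = x * w + b
x, w, b = 2.0, 3.0, 1.0

m = x * w          # multiplication node
y = m + b          # addition node

# Backward pass: start from dL/dy and move right to left through the graph
dL_dy = 1.0                # gradient at the output node
dL_dm = dL_dy * 1.0        # an addition node passes the gradient through unchanged
dL_db = dL_dy * 1.0
dL_dw = dL_dm * x          # a multiplication node scales the gradient by the other input
dL_dx = dL_dm * w

# Only the gradients of the weight nodes (w and b) are needed for the update;
# dL_dx can be discarded.
print(dL_dw, dL_db)        # 2.0 1.0
```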

Gradient Descent Optimization Technique

One commonly used optimization function that adjusts weights according to the error they caused is called “gradient descent.”

Gradient is another name for slope, and slope, on an x-y graph, represents how two variables are related to each other: the rise over the run, the change in distance over the change in time, etc. In this case, the slope is the ratio between the network's error and a single weight; i.e., how the error changes as the weight is varied.

To put it more precisely, we want to find which weight produces the least error. We want to find the weight that correctly represents the signals contained in the input data, and translates them to a correct classification.

As a neural network learns, it slowly adjusts many weights so that they can map signal to meaning correctly. The ratio between network error and each of those weights is a derivative, dE/dw, that calculates the extent to which a slight change in a weight causes a slight change in the error.

Each weight is just one factor in a deep network that involves many transforms; the signal of the weight passes through activations and sums over several layers, so we use the chain rule of calculus to work back through the network activations and outputs. This leads us to the weight in question, and its relationship to the overall error.

The two variables, error and weight, are mediated by a third variable, activation, through which the weight is passed. We can calculate how a change in weight affects a change in error by first calculating how a change in activation affects a change in error, and how a change in weight affects a change in activation.
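This is just the chain rule in action. As a hedged numeric illustration (the sigmoid activation, squared-error loss and the values below are assumptions, not the tutorial's), dE/dw is the product of dE/da and da/dw:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, w, target = 0.5, 1.2, 1.0       # illustrative input, weight and label

a = sigmoid(x * w)                 # activation mediating weight and error
E = 0.5 * (a - target) ** 2        # squared error

dE_da = a - target                 # how a change in activation affects the error
da_dw = a * (1 - a) * x            # how a change in weight affects the activation
dE_dw = dE_da * da_dw              # chain rule: combine the two effects
```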

The basic idea in deep learning is nothing more than that: adjusting a model’s weights in response to the error it produces, until you cannot reduce the error any more.

The deep net trains slowly if the gradient value is small and fast if the value is high. Any inaccuracies in training lead to inaccurate outputs. The process of training the net from the output back to the input is called back propagation or backprop. We know that forward propagation starts with the input and works forward. Backprop does the reverse, calculating the gradients from right to left.

Each time we calculate a gradient, we use all the previous gradients up to that point.

Let us start at a node in the output layer. The edge uses the gradient at that node. As we go back into the hidden layers, it gets more complex. The product of two numbers between 0 and 1 gives you a smaller number. The gradient value keeps getting smaller, and as a result backprop takes a lot of time to train and accuracy suffers.
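A quick sketch of why these repeated products shrink the gradient; the layer count and the local derivative value below are made up purely for illustration.

```python
# Each backward step multiplies by a local derivative; if those derivatives
# are between 0 and 1 (e.g., sigmoid derivatives are at most 0.25), the product shrinks.
local_derivative = 0.25
gradient = 1.0

for layer in range(10):            # ten layers deep
    gradient *= local_derivative

print(gradient)                    # ~9.5e-07: the gradient has nearly vanished
```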

Challenges in Deep Learning Algorithms

There are certain challenges for both shallow neural networks and deep neural networks, like overfitting and computation time. DNNs are affected by overfitting because of the use of added layers of abstraction, which allow them to model rare dependencies in the training data.

Regularization methods such as dropout, early stopping, data augmentation, and transfer learning are applied during training to combat overfitting. Dropout regularization randomly omits units from the hidden layers during training, which helps in avoiding rare dependencies. DNNs take into consideration several training parameters such as the size, i.e., the number of layers and the number of units per layer, the learning rate, and the initial weights. Finding optimal parameters is not always practical due to the high cost in time and computational resources. Several hacks such as batching can speed up computation. The large processing power of GPUs has significantly helped the training process, as the matrix and vector computations required are well-executed on the GPUs.
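As a small illustration of the batching hack mentioned above (a generic sketch, not the tutorial's code), the training data can be split into mini-batches so that each update touches only a slice of the dataset; the dataset and batch size here are hypothetical.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size):
    """Yield shuffled (inputs, targets) mini-batches from the full dataset."""
    indices = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]

# Hypothetical dataset: 1000 samples with 20 features
X = np.random.randn(1000, 20)
y = np.random.randn(1000, 1)

for X_batch, y_batch in iterate_minibatches(X, y, batch_size=32):
    pass  # one gradient-descent update per mini-batch would go here
```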

Dropout

Dropout is a popular regularization technique for neural networks. Deep neural networks are particularly prone to overfitting.

Let us now see what dropout is and how it works.

In the words of Geoffrey Hinton, one of the pioneers of Deep Learning, ‘If you have a deep neural net and it's not overfitting, you should probably be using a bigger one and using dropout’.

Dropout is a technique where during each iteration of gradient descent, we drop a set of randomly selected nodes. This means that we ignore some nodes randomly as if they do not exist.

Each neuron is kept with a probability of q and dropped randomly with probability 1-q. The value of q may be different for each layer in the neural network. A dropout probability (1-q) of 0.5 for the hidden layers and 0 for the input layer works well on a wide range of tasks.

During evaluation and prediction, no dropout is used. The output of each neuron is multiplied by q so that the input to the next layer has the same expected value.
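A minimal NumPy sketch of the scheme described above (the names and shapes are illustrative): at training time each unit is kept with probability q, and at evaluation time the outputs are scaled by q instead.

```python
import numpy as np

def dropout_forward(activations, q, training):
    """Apply dropout with keep probability q, as described above."""
    if training:
        mask = (np.random.rand(*activations.shape) < q)   # keep each unit with probability q
        return activations * mask                         # dropped units are overwritten with 0
    return activations * q                                 # evaluation: scale to keep the expected value

h = np.random.randn(4, 8)            # hypothetical hidden-layer activations
h_train = dropout_forward(h, q=0.5, training=True)
h_eval = dropout_forward(h, q=0.5, training=False)
```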

The idea behind dropout is as follows − in a neural network without dropout regularization, neurons develop co-dependency amongst each other, which leads to overfitting.

Implementation trick

Dropout is implemented in libraries such as TensorFlow and PyTorch by keeping the output of the randomly selected neurons as 0. That is, though the neuron exists, its output is overwritten as 0.
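For instance, in PyTorch this trick is exposed through torch.nn.Dropout. Note that PyTorch uses "inverted" dropout, scaling the kept outputs by 1/(1-p) during training so that no rescaling is needed at evaluation time; the layer sizes below are arbitrary.

```python
import torch
import torch.nn as nn

layer = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each hidden unit's output is overwritten with 0 with probability 0.5
    nn.Linear(64, 10),
)

x = torch.randn(8, 20)
layer.train()            # dropout is active: random outputs are zeroed
out_train = layer(x)
layer.eval()             # dropout is a no-op during evaluation and prediction
out_eval = layer(x)
```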

Early Stopping

We train neural networks using an iterative algorithm called gradient descent.

The idea behind early stopping is intuitive; we stop training when the error starts to increase. Here, by error, we mean the error measured on validation data, which is the part of the training data used for tuning hyper-parameters. In this case, the hyper-parameter is the stop criterion.
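A hedged sketch of an early-stopping loop: train_one_epoch and validation_error are hypothetical helpers, and the "patience" counter is one common way to phrase the stop criterion.

```python
def train_with_early_stopping(model, max_epochs=100, patience=5):
    """Stop training once the validation error has not improved for `patience` epochs."""
    best_error = float("inf")
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)                 # hypothetical: one pass of gradient descent
        error = validation_error(model)        # hypothetical: error on held-out validation data

        if error < best_error:
            best_error = error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                          # the validation error has started to increase
    return model
```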

Data Augmentation

Data augmentation is the process of increasing the quantity of data we have, or augmenting it, by taking existing data and applying some transformations to it. The exact transformations used depend on the task we intend to achieve. Moreover, the transformations that help the neural net depend on its architecture.

For instance, in many computer vision tasks such as object classification, an effective data augmentation technique is adding new data points that are cropped or translated versions of the original data.

When a computer accepts an image as an input, it takes in an array of pixel values. Let us say that the whole image is shifted left by 15 pixels. We apply many different shifts in different directions, resulting in an augmented dataset many times the size of the original dataset.
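A small NumPy sketch of that kind of shift-based augmentation; the image size and shift amounts are illustrative assumptions.

```python
import numpy as np

def shift_image(image, dx, dy):
    """Return a copy of a 2-D image shifted by (dx, dy) pixels, padding with zeros."""
    shifted = np.zeros_like(image)
    h, w = image.shape[:2]
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

image = np.random.rand(28, 28)                       # hypothetical grayscale image
augmented = [shift_image(image, dx, dy)
             for dx in (-15, 0, 15) for dy in (-15, 0, 15)]
# nine variants per image (including the unshifted one): the dataset grows many times over
```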

Transfer Learning

The process of taking a pre-trained model and “fine-tuning” the model with our own dataset is called transfer learning. There are several ways to do this. A few ways are described below −

  • We take a model that has been pre-trained on a large dataset. Then, we remove the last layer of the network and replace it with a new layer with random weights.

  • We then freeze the weights of all the other layers and train the network normally. Here, freezing the layers means that the weights are not changed during gradient descent or optimization.

The concept behind this is that the pre-trained model will act as a feature extractor, and only the last layer will be trained on the current task.
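A hedged PyTorch sketch of those two steps, assuming a hypothetical pre-trained model whose final layer is an nn.Linear stored in an attribute named fc (as in many torchvision classifiers); the function name and num_classes are illustrative.

```python
import torch.nn as nn

def prepare_for_transfer_learning(pretrained_model, num_classes):
    """Freeze the pre-trained layers and replace the last layer with a fresh one."""
    # Freeze the weights of all existing layers: they will not change during optimization.
    for param in pretrained_model.parameters():
        param.requires_grad = False

    # Replace the last layer with a new, randomly initialized layer for our own dataset.
    in_features = pretrained_model.fc.in_features      # assumes the final layer is called `fc`
    pretrained_model.fc = nn.Linear(in_features, num_classes)

    # Only the new layer's parameters will be updated; the rest act as a feature extractor.
    return pretrained_model
```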


Translated from: https://www.tutorialspoint.com/python_deep_learning/python_deep_learning_training_a_neural_network.htm
