Batch Gradient Descent vs Mini-Batch Gradient Descent vs Stochastic Gradient Descent
Category: Articles • 2023-03-28 20:30:41
Batch Gradient Descent
- Each step of gradient descent uses all the training examples.
- Advantage: Converges steadily; with a convex cost function it reaches the global optimum after enough iterations.
- Disadvantage: Expensive on large data sets, since every step must process the entire training set; a single run may even fail to complete. See the sketch after this list.
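To make the update concrete, here is a minimal sketch of batch gradient descent for linear regression with a squared-error cost. The model, function name, and hyperparameters are assumptions for illustration, not something fixed by the notes above.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Linear regression fit by batch gradient descent.
    X: (m, n) feature matrix, y: (m,) targets."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        # Every update uses ALL m training examples.
        grad = X.T @ (X @ theta - y) / m
        theta -= alpha * grad
    return theta
```

Because the gradient over all m examples is recomputed on every iteration, each step costs O(m·n), which is exactly what makes this variant slow or infeasible on large data sets.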
Stochastic Gradient Descent
- Each step uses one training example.
- Learning rate α is typically held constant. α can be slowly decreased over time if we want θ to converge.
- Advantage: Robust for large data sets, since each update touches only one example.
- Disadvantage: Unstable. The parameters wander “around” the optimum instead of heading straight to it as in batch gradient descent.
NOTE: Shuffling the training examples is really important, to avoid ending up at a poor local optimum (see the sketch below).
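Under the same assumed linear-regression setup as above, a sketch of stochastic gradient descent with per-epoch shuffling might look like this:

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, num_epochs=10):
    """Linear regression fit by stochastic gradient descent.
    One parameter update per training example."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):
        # Shuffle before each pass so updates are not biased by example order.
        idx = np.random.permutation(m)
        for i in idx:
            # Gradient estimated from a single example.
            grad = (X[i] @ theta - y[i]) * X[i]
            theta -= alpha * grad
    return theta
```

Each update is cheap, but because it is based on one noisy gradient estimate, the trajectory of θ oscillates around the optimum rather than converging smoothly.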
Mini-batch Gradient Descent
- Combines batch with stochastic: use b examples in each iteration, where b is the batch size.
- Converges more smoothly than stochastic gradient descent.
- Additional parameter: batch_size (see the sketch below).
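Again under the same assumed setup, a mini-batch sketch simply slices the shuffled indices into groups of batch_size:

```python
import numpy as np

def mini_batch_gradient_descent(X, y, alpha=0.01, num_epochs=10, batch_size=32):
    """Linear regression fit by mini-batch gradient descent.
    Each update averages the gradient over batch_size examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):
        idx = np.random.permutation(m)
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            # Gradient averaged over the b examples in this mini-batch.
            grad = X[batch].T @ (X[batch] @ theta - y[batch]) / len(batch)
            theta -= alpha * grad
    return theta
```

With b = m this reduces to batch gradient descent, and with b = 1 it reduces to stochastic gradient descent; intermediate values trade per-step cost against the smoothness of convergence.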