Batch Gradient Descent vs Mini-Batch Gradient Descent vs Stochastic Gradient Descent

Batch Gradient Descent

  • Each step of gradient descent uses all the training examples.
  • Advantage: Converges to the global optimum (for a convex cost) after enough iterations.
  • Disadvantage: Computationally expensive on large data sets; a single step may even be too costly to complete (see the sketch below).
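
A minimal sketch of one batch gradient-descent loop for linear regression with a mean-squared-error cost; the function name, learning rate, and iteration count are illustrative assumptions, not taken from these notes.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Batch GD: every update uses all m training examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        # Gradient of the MSE cost computed over the full training set
        grad = (X.T @ (X @ theta - y)) / m
        theta -= alpha * grad
    return theta
```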

Stochastic Gradient Descent

  • Each step uses one training example.
  • Learning rate α is typically held constant. We can slowly decrease α over time if we want θ to converge.
  • Advantage: Robust for large data sets.
  • Disadvantage: Unstable; it moves “around” the optimum rather than heading straight to it (as Batch does).
  • NOTE: Shuffling the training examples is really important, to avoid ending up at a local optimum (see the sketch after this list).
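
A corresponding sketch of stochastic gradient descent for the same linear-regression setup, including a per-epoch shuffle and an optional learning-rate decay; the epoch count and decay factor are assumptions for illustration.

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, n_epochs=10, decay=0.99):
    """SGD: each update uses ONE training example."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        # Shuffle every epoch so updates are not biased by example order
        for i in np.random.permutation(m):
            grad = (X[i] @ theta - y[i]) * X[i]  # gradient from a single example
            theta -= alpha * grad
        alpha *= decay  # optionally decrease alpha over time so theta converges
    return theta
```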

Mini-batch Gradient Descent

  • Combines Batch with Stochastic: uses b examples (the batch size) in each iteration.
  • Converges more smoothly than Stochastic.
  • Additional parameter: batch_size (see the sketch below).
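
A matching mini-batch sketch, again for linear regression; batch_size=32 is just a common default chosen here for illustration.

```python
import numpy as np

def mini_batch_gradient_descent(X, y, alpha=0.01, batch_size=32, n_epochs=10):
    """Mini-batch GD: each update uses a batch of `batch_size` examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        idx = np.random.permutation(m)  # shuffle each epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient averaged over the current mini-batch only
            grad = (Xb.T @ (Xb @ theta - yb)) / len(batch)
            theta -= alpha * grad
    return theta
```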