[coursera/ImprovingDL/week2] Optimization algorithms (summary & questions)
Summary
2.1 Mini-batch gradient descent
- batch size = m (batch gradient descent, BGD): each iteration takes too long on a large training set
- batch size = 1 (stochastic gradient descent, SGD): loses the speed-up from vectorization
- in between: mini-batch gradient descent, which makes progress on every batch while keeping vectorization (see the sketch below)
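A rough numpy sketch of the split; the function name `random_mini_batches` and `batch_size=64` are my own choices, not from the notes:

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle (X, Y) and slice into mini-batches of size batch_size.
    X: (n_x, m) features, Y: (1, m) labels; columns are examples."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    return [(X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
            for k in range(0, m, batch_size)]

# batch_size = m reproduces BGD; batch_size = 1 reproduces SGD.
```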
2.2 Bias correction
At the start, the exponentially weighted average (2.3) is biased toward its v_0 = 0 initialization, so the curve doesn't fit the early data; dividing v_t by (1 - beta^t) corrects this (see the sketch after 2.3).
2.3 Exponentially weighted averages
v_t = beta * v_{t-1} + (1 - beta) * theta_t, which averages over roughly 1/(1 - beta) of the most recent values.
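A minimal sketch of the average from 2.3 with the bias correction from 2.2 applied; variable names and the sample data are illustrative:

```python
import numpy as np

def ewa(data, beta=0.9, correct_bias=True):
    """Exponentially weighted average: v_t = beta*v_{t-1} + (1-beta)*theta_t.
    With bias correction, return v_t / (1 - beta**t) so early values
    are not dragged toward the v_0 = 0 initialization."""
    v = 0.0
    out = []
    for t, theta in enumerate(data, start=1):
        v = beta * v + (1 - beta) * theta
        out.append(v / (1 - beta**t) if correct_bias else v)
    return np.array(out)

temps = np.array([10.0, 11.0, 12.0, 11.5])
print(ewa(temps))                      # early values track the data
print(ewa(temps, correct_bias=False))  # early values start near 0
```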
2.4 Gradient descent with momentum
Use an exponentially weighted average of the gradients as the update direction: v_dW = beta * v_dW + (1 - beta) * dW, then W = W - alpha * v_dW. This damps oscillations and speeds up progress toward the minimum.
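A minimal sketch of one momentum update, assuming a single parameter array `W` and its gradient `dW`; names and hyperparameter values are illustrative:

```python
import numpy as np

def momentum_step(W, dW, v, beta=0.9, alpha=0.01):
    """One gradient-descent-with-momentum update."""
    v = beta * v + (1 - beta) * dW   # moving average of the gradients
    W = W - alpha * v                # step along the smoothed direction
    return W, v
```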
2.5 RMSprop
Keep a moving average of the squared gradients instead: s_dW = beta * s_dW + (1 - beta) * dW^2, then W = W - alpha * dW / (sqrt(s_dW) + epsilon), so directions with large oscillating gradients take smaller steps.
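The same style of sketch for one RMSprop update; the `eps` term guards against division by zero, and the hyperparameter values are illustrative:

```python
import numpy as np

def rmsprop_step(W, dW, s, beta=0.9, alpha=0.01, eps=1e-8):
    """One RMSprop update: average the squared gradients, then divide
    the step by their square root so steep directions are damped."""
    s = beta * s + (1 - beta) * dW**2
    W = W - alpha * dW / (np.sqrt(s) + eps)
    return W, s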
2.6 Adam
Adam = momentum + RMSprop + bias correction on both moving averages. Commonly recommended defaults: beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8.
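A sketch combining the two, with bias correction driven by the step counter `t`; the defaults below are the ones recommended in the lectures:

```python
import numpy as np

def adam_step(W, dW, v, s, t, beta1=0.9, beta2=0.999, alpha=0.001, eps=1e-8):
    """One Adam update: momentum (v) + RMSprop (s) + bias correction (t >= 1)."""
    v = beta1 * v + (1 - beta1) * dW          # momentum term
    s = beta2 * s + (1 - beta2) * dW**2       # RMSprop term
    v_hat = v / (1 - beta1**t)                # bias-corrected v
    s_hat = s / (1 - beta2**t)                # bias-corrected s
    W = W - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return W, v, s
```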
2.7 Learning rate decay
Slowly reduce alpha as training proceeds, e.g. alpha = alpha_0 / (1 + decay_rate * epoch_num); other schedules include exponential decay and discrete "staircase" decay.
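A sketch of the 1/(1 + decay_rate * epoch_num) schedule; alpha_0 = 0.2 and decay_rate = 1 reproduce the lecture example (0.1, 0.067, 0.05, 0.04, ...):

```python
def decayed_lr(alpha0, decay_rate, epoch_num):
    """Learning rate decay: alpha = alpha0 / (1 + decay_rate * epoch_num)."""
    return alpha0 / (1 + decay_rate * epoch_num)

for epoch_num in range(1, 5):
    print(round(decayed_lr(0.2, 1.0, epoch_num), 3))  # 0.1, 0.067, 0.05, 0.04
```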
Questions: