[coursera/ImprovingDL/week2] Optimization algorithms (summary & question)

summary

2.1 mini-batch gradient descent

batch size = m (batch gradient descent): each iteration takes too long on a large training set

batch size = 1 (stochastic gradient descent): loses the speed-up from vectorization

in between: mini-batch gradient descent, which keeps vectorization while making progress before seeing the whole training set
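The split can be sketched as below, assuming the course's column-wise layout (X has shape (n_x, m), one example per column); `make_mini_batches` and its parameters are illustrative names, not from the lecture:

```python
import numpy as np

def make_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the m examples, then cut them into mini-batches."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]                       # number of training examples
    perm = rng.permutation(m)            # reshuffle once per epoch
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    # Slice off batch_size columns at a time; the last batch may be smaller.
    return [(X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
            for k in range(0, m, batch_size)]
```

Typical batch sizes are powers of two (64, 128, 256), chosen so a mini-batch fits in CPU/GPU memory.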


2.2 bias correction

The exponentially weighted average starts from v_0 = 0, so at the beginning of the curve the average is biased toward zero and doesn't fit the early data. Bias correction fixes this: use v_t / (1 - beta^t) instead of v_t. The correction matters for small t and fades away as t grows.
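A minimal sketch of the corrected average (the function name is illustrative); for a constant input the corrected value recovers the true level from the very first step:

```python
def ewa_bias_corrected(xs, beta=0.9):
    """Exponentially weighted average with bias correction v_t / (1 - beta**t)."""
    v, out = 0.0, []
    for t, x in enumerate(xs, start=1):
        v = beta * v + (1 - beta) * x    # plain EWA update, starting from v_0 = 0
        out.append(v / (1 - beta ** t))  # correction removes the startup bias
    return out
```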


2.3 exponentially weighted averages

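The update is v_t = beta * v_{t-1} + (1 - beta) * theta_t, which roughly averages over the last 1/(1 - beta) values (beta = 0.9 averages over about 10 values). A minimal sketch, with an illustrative function name:

```python
def ewa(xs, beta=0.9):
    """Plain exponentially weighted average, v_0 = 0 (no bias correction)."""
    v, out = 0.0, []
    for x in xs:
        v = beta * v + (1 - beta) * x
        out.append(v)
    return out
```

Without bias correction the early outputs undershoot: for a constant input the first value is only (1 - beta) times the true level.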

2.4 gradient descent with momentum

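Momentum keeps an exponentially weighted average of the gradients and steps along that average instead of the raw gradient, which damps oscillations across the narrow direction of the loss surface. A scalar sketch (names illustrative):

```python
def momentum_step(w, grad, v, beta=0.9, lr=0.1):
    """One gradient-descent-with-momentum update for a scalar parameter."""
    v = beta * v + (1 - beta) * grad   # EWA of gradients smooths out oscillations
    w = w - lr * v
    return w, v
```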

2.5 RMSprop

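RMSprop keeps an exponentially weighted average of the *squared* gradients and divides each update by its square root, so directions with consistently large gradients get smaller steps. A scalar sketch (names illustrative):

```python
import numpy as np

def rmsprop_step(w, grad, s, beta2=0.999, lr=0.01, eps=1e-8):
    """One RMSprop update for a scalar parameter."""
    s = beta2 * s + (1 - beta2) * grad ** 2   # EWA of squared gradients
    w = w - lr * grad / (np.sqrt(s) + eps)    # eps guards against division by ~0
    return w, s
```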

2.6 Adam

Adam combines momentum, RMSprop, and bias correction.

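Putting the three pieces together for a scalar parameter (names illustrative; the defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8 are the values recommended in the course):

```python
import numpy as np

def adam_step(w, grad, m_, v_, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (t counts from 1)."""
    m_ = beta1 * m_ + (1 - beta1) * grad        # momentum: EWA of gradients
    v_ = beta2 * v_ + (1 - beta2) * grad ** 2   # RMSprop: EWA of squared gradients
    m_hat = m_ / (1 - beta1 ** t)               # bias correction for m
    v_hat = v_ / (1 - beta2 ** t)               # bias correction for v
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m_, v_
```

Thanks to bias correction, the very first step already has size about lr in the direction opposite the gradient.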


2.7 learning rate decay

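One schedule from the lecture is alpha = alpha_0 / (1 + decay_rate * epoch_num), sketched below (function name is illustrative):

```python
def decayed_lr(lr0, decay_rate, epoch):
    """Learning rate after `epoch` full passes: lr0 / (1 + decay_rate * epoch)."""
    return lr0 / (1 + decay_rate * epoch)
```

Other common schedules mentioned in the course include exponential decay (alpha = 0.95**epoch * alpha_0) and staircase/manual decay.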


question:
