Babysitting the Learning Process
After preprocessing the data and choosing a suitable network architecture, start training:
1. Double check that the loss is reasonable: with regularization disabled at first, the initial loss should be a value determined by the number of classes (for a softmax classifier, -log(1/#classes) = log(#classes)).
2. Make sure that you can overfit a very small portion of the training data.
3. Start with small regularization and find a learning rate that makes the loss go down.
cost = NaN almost always means the learning rate is too high.
The learning rate is the most important hyperparameter; it is usually tuned with coarse -> fine cross-validation in stages.
4. During training, monitor and visualize the loss curve & accuracy.
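Step 1's sanity check can be sketched as follows. This is a minimal sketch assuming a softmax classifier on a hypothetical 10-class problem (all sizes are made up); with near-zero weights and no regularization, the scores are roughly uniform, so the initial loss should be close to log(10) ≈ 2.3:

```python
import numpy as np

num_classes = 10      # hypothetical 10-class problem (e.g. CIFAR-10)
num_samples = 50
dim = 3072

rng = np.random.default_rng(0)
X = rng.standard_normal((num_samples, dim))
y = rng.integers(0, num_classes, size=num_samples)
W = 0.0001 * rng.standard_normal((dim, num_classes))  # small random init

scores = X @ W
scores -= scores.max(axis=1, keepdims=True)           # numerical stability
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(num_samples), y]).mean()

print(f"initial loss {loss:.3f}, expected ~ {np.log(num_classes):.3f}")
```

If the measured loss is far from log(#classes), something is wrong with the initialization or the loss computation before any training has even started.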
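Step 2 (overfitting a tiny subset) can be sketched with a toy softmax classifier on a few synthetic points; the dataset, sizes, and learning rate here are made up for illustration. With so few points and no regularization, training accuracy should reach 100%:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, C = 20, 50, 2          # made-up tiny dataset: 20 points, 2 classes
X = rng.standard_normal((n, dim))
y = rng.integers(0, C, size=n)
W = 0.01 * rng.standard_normal((dim, C))
b = np.zeros(C)

lr = 1.0                       # no regularization: we *want* to overfit
for step in range(500):
    scores = X @ W + b
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    dscores = (probs - np.eye(C)[y]) / n          # grad of mean cross-entropy
    W -= lr * (X.T @ dscores)
    b -= lr * dscores.sum(axis=0)

train_acc = (np.argmax(X @ W + b, axis=1) == y).mean()
print(f"train accuracy: {train_acc:.2f}")
```

If a model cannot memorize 20 points, there is a bug to find before scaling up to the full training set.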
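For the coarse -> fine cross-validation in step 3, learning rates are typically sampled uniformly in log space. A sketch (the ranges and sample counts are arbitrary, and the "train briefly and evaluate" step between stages is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lrs(low_exp, high_exp, n):
    # uniform in the exponent, so each decade is sampled equally often
    return 10.0 ** rng.uniform(low_exp, high_exp, size=n)

# coarse stage: a wide range (10**-6 .. 10**-1), only a few epochs each
coarse = sample_lrs(-6, -1, 10)
# after inspecting the coarse results, narrow the range and repeat
fine = sample_lrs(-4, -3, 10)

print(np.sort(coarse))
print(np.sort(fine))
```

Sampling the exponent rather than the rate itself matters: uniform sampling of the raw value would almost never try small learning rates like 1e-5.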
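For step 4, a minimal sketch of recording the loss at every step so the curve can be plotted and inspected; a toy quadratic objective stands in for a real network here:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10)      # toy "parameters"

lr = 0.1
loss_history = []
for step in range(100):
    loss = 0.5 * np.sum(w ** 2)  # toy quadratic loss
    loss_history.append(loss)
    w -= lr * w                  # gradient step: d(loss)/dw = w

# a healthy curve decreases steadily; a flat curve suggests the learning
# rate is too low, an exploding or NaN curve suggests it is too high
print(f"loss: {loss_history[0]:.4f} -> {loss_history[-1]:.2e}")
```

Tracking training and validation accuracy in the same way makes the gap between them visible, which is the first signal of overfitting.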