Babysitting the Learning Process
After preprocessing the data and choosing a suitable network architecture, start training:
1. Double check that the loss is reasonable: with regularization disabled at first, the initial loss should be a value determined by the number of classes (for a softmax classifier, -log(1/#classes) = log(#classes)).
2. Make sure that you can overfit a very small portion of the training data.
3. Start with small regularization and find a learning rate that makes the loss go down.
cost = NaN almost always means the learning rate is too high.
The learning rate is the most important hyperparameter; it is usually tuned with coarse -> fine cross-validation in stages.
4. During training, monitor and visualize the loss curve & accuracy.
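Step 1's sanity check can be sketched as follows. This is a minimal sketch assuming a softmax classifier on a hypothetical 10-class problem (all sizes are made up); with near-zero weights and no regularization, the scores are roughly uniform, so the initial loss should be close to log(10) ≈ 2.3:

```python
import numpy as np

num_classes = 10      # hypothetical 10-class problem (e.g. CIFAR-10)
num_samples = 50
dim = 3072

rng = np.random.default_rng(0)
X = rng.standard_normal((num_samples, dim))
y = rng.integers(0, num_classes, size=num_samples)
W = 0.0001 * rng.standard_normal((dim, num_classes))  # small random init

scores = X @ W
scores -= scores.max(axis=1, keepdims=True)           # numerical stability
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(num_samples), y]).mean()

print(f"initial loss {loss:.3f}, expected ~ {np.log(num_classes):.3f}")
```

If the measured loss is far from log(#classes), something is wrong with the initialization or the loss computation before any training has even started.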
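Step 2 (overfitting a tiny subset) can be sketched with a toy softmax classifier on a few synthetic points; the dataset, sizes, and learning rate here are made up for illustration. With so few points and no regularization, training accuracy should reach 100%:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, C = 20, 50, 2          # made-up tiny dataset: 20 points, 2 classes
X = rng.standard_normal((n, dim))
y = rng.integers(0, C, size=n)
W = 0.01 * rng.standard_normal((dim, C))
b = np.zeros(C)

lr = 1.0                       # no regularization: we *want* to overfit
for step in range(500):
    scores = X @ W + b
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    dscores = (probs - np.eye(C)[y]) / n          # grad of mean cross-entropy
    W -= lr * (X.T @ dscores)
    b -= lr * dscores.sum(axis=0)

train_acc = (np.argmax(X @ W + b, axis=1) == y).mean()
print(f"train accuracy: {train_acc:.2f}")
```

If a model cannot memorize 20 points, there is a bug to find before scaling up to the full training set.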
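For the coarse -> fine cross-validation in step 3, learning rates are typically sampled uniformly in log space. A sketch (the ranges and sample counts are arbitrary, and the "train briefly and evaluate" step between stages is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lrs(low_exp, high_exp, n):
    # uniform in the exponent, so each decade is sampled equally often
    return 10.0 ** rng.uniform(low_exp, high_exp, size=n)

# coarse stage: a wide range (10**-6 .. 10**-1), only a few epochs each
coarse = sample_lrs(-6, -1, 10)
# after inspecting the coarse results, narrow the range and repeat
fine = sample_lrs(-4, -3, 10)

print(np.sort(coarse))
print(np.sort(fine))
```

Sampling the exponent rather than the rate itself matters: uniform sampling of the raw value would almost never try small learning rates like 1e-5.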
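For step 4, a minimal sketch of recording the loss at every step so the curve can be plotted and inspected; a toy quadratic objective stands in for a real network here:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10)      # toy "parameters"

lr = 0.1
loss_history = []
for step in range(100):
    loss = 0.5 * np.sum(w ** 2)  # toy quadratic loss
    loss_history.append(loss)
    w -= lr * w                  # gradient step: d(loss)/dw = w

# a healthy curve decreases steadily; a flat curve suggests the learning
# rate is too low, an exploding or NaN curve suggests it is too high
print(f"loss: {loss_history[0]:.4f} -> {loss_history[-1]:.2e}")
```

Tracking training and validation accuracy in the same way makes the gap between them visible, which is the first signal of overfitting.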