Machine Learning | Week6-Advice_and_System_Design - Advice and System Design: Study Notes - Andrew Ng (吴恩达)

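Since this is an English-language course, reading Chinese reference material does help with understanding. For the sake of longer-term learning, though, I will write these notes in English wherever I can. This builds familiarity with the terminology and common sentence structures, which pays off when reading material that has no Chinese translation.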

However, many of these options can cost you six months without helping at all. The "Machine Learning Diagnostic [.daɪəɡ’nɑstɪk]" is introduced next to address this problem.

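Evaluating a hypothesis: hold out part of the data, splitting it 70:30 into a training set (chosen at random) and a test set.
For classification, the error used here is the misclassification error, also called the 0/1 misclassification error.
For model selection, split the data into a training set (60%), a cross-validation set (Cross Validation, CV, 20%), and a test set (20%).
- Choose the hypothesis with the lowest CV error. The test set then plays no part in choosing the hypothesis, so the test error remains a fair estimate of how well it generalizes.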
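The split-and-select procedure above can be sketched in a few lines. This is a minimal illustration, not the course's code: the toy data, the threshold "hypotheses", and the helper names (`split_60_20_20`, `misclassification_error`, `select_by_cv`) are all assumptions made up for the example.

```python
import random

def split_60_20_20(examples, seed=0):
    """Shuffle, then split into train / cross-validation / test (60/20/20)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10
    n_cv = n * 2 // 10
    train = shuffled[:n_train]
    cv = shuffled[n_train:n_train + n_cv]
    test = shuffled[n_train + n_cv:]
    return train, cv, test

def misclassification_error(hypothesis, examples):
    """0/1 error: fraction of (x, y) pairs where hypothesis(x) != y."""
    wrong = sum(1 for x, y in examples if hypothesis(x) != y)
    return wrong / len(examples)

def select_by_cv(hypotheses, cv):
    """Return the name of the candidate with the lowest CV error."""
    return min(hypotheses,
               key=lambda name: misclassification_error(hypotheses[name], cv))

# Hypothetical toy data: the label is 1 when x >= 50.
data = [(x, int(x >= 50)) for x in range(100)]
train, cv, test = split_60_20_20(data)

# Hypothetical candidates; in practice each would be fit on `train`.
candidates = {
    "threshold_30": lambda x: int(x >= 30),
    "threshold_50": lambda x: int(x >= 50),
    "threshold_70": lambda x: int(x >= 70),
}

best = select_by_cv(candidates, cv)
# The test set is touched only once, after the choice has been made.
test_error = misclassification_error(candidates[best], test)
print(best, test_error)
```

The test set is used only after `select_by_cv` has returned, which is the whole point of keeping a separate cross-validation set.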

decision
- Getting more training examples: Fixes high variance
- Trying smaller sets of features: Fixes high variance
- Adding features: Fixes high bias
- Adding polynomial features: Fixes high bias
- Decreasing λ: Fixes high bias
- Increasing λ: Fixes high variance
- Model Complexity Effects:
- Lower-order polynomials (low model complexity) have high bias and low variance. In this case, the model fits poorly consistently.
- Higher-order polynomials (high model complexity) fit the training data extremely well and the test data extremely poorly. These have low bias on the training data, but very high variance.
- In reality, we would want to choose a model somewhere in between, that can generalize well but also fits the data reasonably well.
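As a rough sketch of how the list above might be used, the helper below labels a fit from its training and cross-validation errors and looks up the matching remedies. The function name `diagnose` and the two thresholds (`target_error`, `gap_tolerance`) are assumptions chosen for illustration, not values from the course.

```python
def diagnose(train_error, cv_error, target_error=0.05, gap_tolerance=0.05):
    """Label a fit as "high bias", "high variance", or "ok".

    target_error and gap_tolerance are assumed thresholds for this sketch.
    """
    gap = cv_error - train_error
    if gap > gap_tolerance:
        # Training error is much lower than CV error: the model overfits.
        return "high variance"
    if train_error > target_error:
        # Both errors are high and close together: the model underfits.
        return "high bias"
    return "ok"

# Remedies taken from the list above.
REMEDIES = {
    "high bias": ["add features", "add polynomial features", "decrease lambda"],
    "high variance": ["get more training examples",
                      "try a smaller set of features",
                      "increase lambda"],
    "ok": [],
}

print(diagnose(0.30, 0.32))  # high bias
print(diagnose(0.02, 0.25))  # high variance
print(REMEDIES[diagnose(0.02, 0.25)])
```

A fit with both a high training error and a large gap is reported as "high variance" here; that ordering is a simplification of this sketch.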

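Skewed classes: in some classification problems the classes are extremely unbalanced, for example cancer versus no cancer at 0.5% / 99.5%.
- In that case, always predicting y=0 already gives an error rate of only 0.5%, while a neural network typically reaches about 99% accuracy (1% error). The raw error rate, and error analysis based on it, is therefore much less useful here.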
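The arithmetic can be checked with a few lines. The counts below (1000 patients, 5 of them with cancer, i.e. 0.5% positives) are assumed purely for illustration.

```python
# Assumed counts for illustration: 1000 patients, 5 with cancer (0.5%).
labels = [1] * 5 + [0] * 995

def error_rate(predictions, labels):
    """Fraction of predictions that differ from the true labels."""
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)

# A "classifier" that ignores its input and always predicts y = 0.
always_negative = [0] * len(labels)
baseline_error = error_rate(always_negative, labels)

print(baseline_error)         # 0.005
print(baseline_error < 0.01)  # True: beats a model with 1% error
```

The trivial predictor wins on error rate while never detecting a single positive case, which is why a different metric is needed.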

Precision / Recall
- Precision = $\frac{\text{True pos}}{\text{all predicted pos}}$
- Recall = $\frac{\text{True pos}}{\text{all actual pos}}$
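The two ratios can be computed directly from predictions and labels. This is a small sketch with made-up data; the function name `precision_recall` and the convention of returning 0.0 when a denominator is zero are assumptions of the example.

```python
def precision_recall(predictions, labels):
    """Return (precision, recall) for binary predictions with labels in {0, 1}."""
    true_pos = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    predicted_pos = sum(1 for p in predictions if p == 1)
    actual_pos = sum(1 for y in labels if y == 1)
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall

# Made-up example: 3 actual positives, 3 predicted positives, 2 of them correct.
labels      = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(precision_recall(predictions, labels))

# A predictor that always outputs 0 finds none of the positives.
print(precision_recall([0] * len(labels), labels))  # (0.0, 0.0)
```

Unlike the raw error rate, recall exposes a predictor that never outputs y=1: it scores zero no matter how rare the positive class is.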