Machine Learning Foundations - Hazard of Overfitting
Machine Learning Foundations, Part 2 (机器学习基石下): Mathematical Foundations
Hsuan-Tien Lin (林轩田), Associate Professor, Department of Computer Science and Information Engineering
What is Overfitting?
bad generalization: low $E_{in}$, high $E_{out}$
example
Cause of Overfitting
- excessive $d_{VC}$ (too much model power)
- noise
- limited data size
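The three causes above can be seen together in a small simulation. The following is a minimal sketch, not the lecture's own example: the sine target, the noise level 0.3, and the choice of 12 data points are assumptions made only for illustration. It fits the same noisy data sets with a 2nd-order and a 10th-order polynomial and averages the in-sample and out-of-sample errors.

```python
import numpy as np
from numpy.polynomial import legendre as L


def target(x):
    # Hypothetical target function, chosen only for illustration.
    return np.sin(np.pi * x)


def fit_errors(degree, n_points, noise_std, rng):
    """Fit one noisy data set by least squares; return (E_in, E_out) as MSE."""
    x = rng.uniform(-1, 1, n_points)
    y = target(x) + rng.normal(0, noise_std, n_points)
    # The Legendre basis keeps the high-degree fit well conditioned on [-1, 1].
    coefs = L.legfit(x, y, degree)
    e_in = np.mean((L.legval(x, coefs) - y) ** 2)
    # E_out is estimated on a dense grid against the noiseless target.
    x_test = np.linspace(-1, 1, 2001)
    e_out = np.mean((L.legval(x_test, coefs) - target(x_test)) ** 2)
    return e_in, e_out


def average_errors(degree, n_points=12, noise_std=0.3, trials=200, seed=0):
    """Average (E_in, E_out) over `trials` data sets.

    The same seed yields the same data sets for every degree, so the
    comparison between degrees is paired.
    """
    rng = np.random.default_rng(seed)
    errs = [fit_errors(degree, n_points, noise_std, rng) for _ in range(trials)]
    return np.mean(errs, axis=0)


for degree in (2, 10):
    e_in, e_out = average_errors(degree)
    print(f"degree {degree:2d}: E_in = {e_in:.4f}, E_out = {e_out:.4f}")
```

Because the 10th-order hypothesis set contains the 2nd-order one, its E_in can never be larger on the same data; the interesting number is E_out, which is where excessive power on few noisy points tends to hurt.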
The Role of Noise and Data Size
concession for advantage: a simpler hypothesis set gives up some ability to fit, but can win in $E_{out}$
Learning Curves Revisited
‘target complexity’ acts like noise
Deterministic Noise
A Detailed Experiment
The Results
- impact of $\sigma^2$ versus $N$: stochastic noise
- impact of $Q_f$ versus $N$: deterministic noise
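An experiment of this kind can be sketched as follows. The setup here is an assumption, since the notes do not preserve the details: the target is a random polynomial of degree `q_f` (target complexity), normalized to unit energy, the data carry Gaussian noise of variance `sigma2`, and the overfit measure is taken to be E_out of a 10th-order fit minus E_out of a 2nd-order fit. All function names are hypothetical, and the median over trials is used only because the differences are heavy-tailed.

```python
import numpy as np
from numpy.polynomial import legendre as L


def random_target(q_f, rng):
    """Random degree-q_f polynomial (Legendre coefficients) with E[f(x)^2] = 1."""
    a = rng.normal(size=q_f + 1)
    # For x uniform on [-1, 1]: E[P_k(x)^2] = 1 / (2k + 1), cross terms vanish.
    energy = np.sum(a ** 2 / (2 * np.arange(q_f + 1) + 1))
    return a / np.sqrt(energy)


def e_out(coefs, target_coefs):
    """Exact E[(g(x) - f(x))^2] for x uniform on [-1, 1], via orthogonality."""
    k = max(len(coefs), len(target_coefs))
    diff = np.zeros(k)
    diff[: len(coefs)] += coefs
    diff[: len(target_coefs)] -= target_coefs
    return float(np.sum(diff ** 2 / (2 * np.arange(k) + 1)))


def overfit_measure(n_points, sigma2, q_f, trials=200, seed=0):
    """Median of E_out(g10) - E_out(g2) over random targets and data sets.

    Assumes n_points >= 11 so that the 10th-order fit is determined.
    """
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(trials):
        f = random_target(q_f, rng)
        x = rng.uniform(-1, 1, n_points)
        y = L.legval(x, f) + rng.normal(0, np.sqrt(sigma2), n_points)
        g2 = L.legfit(x, y, 2)
        g10 = L.legfit(x, y, 10)
        gaps.append(e_out(g10, f) - e_out(g2, f))
    return float(np.median(gaps))


# Stochastic noise versus N, with the target complexity held fixed.
for n_points in (20, 60, 120):
    for sigma2 in (0.0, 0.5, 1.0):
        gap = overfit_measure(n_points, sigma2, q_f=20)
        print(f"N = {n_points:3d}, sigma2 = {sigma2:.1f}: overfit = {gap:+.3f}")
```

Sweeping `q_f` with `sigma2` fixed gives the deterministic-noise counterpart of the same table. A positive value means the more powerful hypothesis set did worse out of sample.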
four reasons for serious overfitting: small data size, high stochastic noise, high deterministic noise, excessive model power
overfitting ‘easily’ happens
Deterministic Noise
pseudo-random number generator
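Deterministic noise can be made concrete with polynomials. The sketch below assumes inputs uniform on [-1, 1] and a target written in the Legendre basis; under that assumption the best hypothesis of a given degree is simply the truncated expansion, and what remains is the deterministic noise. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as L


def best_in_class(target_coefs, degree):
    """Legendre coefficients of h*, the best polynomial of the given degree.

    For x uniform on [-1, 1] the Legendre polynomials are orthogonal, so the
    squared-error minimizer is the truncation of the target's expansion.
    """
    return np.asarray(target_coefs, dtype=float)[: degree + 1]


def deterministic_noise(x, target_coefs, degree):
    """f(x) - h*(x): the part of the target the hypothesis set cannot capture."""
    h_star = best_in_class(target_coefs, degree)
    return L.legval(x, target_coefs) - L.legval(x, h_star)


def deterministic_noise_level(target_coefs, degree):
    """E[(f(x) - h*(x))^2] for x uniform on [-1, 1]."""
    a = np.asarray(target_coefs, dtype=float)
    k = np.arange(len(a))
    tail = slice(degree + 1, None)
    return float(np.sum(a[tail] ** 2 / (2 * k[tail] + 1)))


# Target: the 3rd Legendre polynomial, viewed from the 2nd-order hypothesis set.
f = [0.0, 0.0, 0.0, 1.0]
x = np.array([-0.5, 0.0, 0.5])
print(deterministic_noise(x, f, degree=2))
print(deterministic_noise_level(f, degree=2))
```

Unlike stochastic noise, calling `deterministic_noise` twice with the same `x` returns the same values, and the level depends on the hypothesis set: it drops to zero once the degree reaches that of the target.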
Dealing with Overfitting
Driving Analogy Revisited
- correct the label (data cleaning)
- remove the example (data pruning)
- add virtual examples by shifting/rotating the given digits (data hinting)
possibly helps, but the effect varies (virtual examples change the data distribution)
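Data hinting by shifting can be sketched in a few lines. This is a minimal illustration under assumptions: digits are 2-D NumPy arrays, vacated pixels are filled with zeros, and only one-pixel shifts are generated (rotation is left out). The helper names `shift_image` and `add_virtual_examples` are hypothetical.

```python
import numpy as np


def shift_image(image, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling vacated pixels with zeros."""
    h, w = image.shape
    shifted = np.zeros_like(image)
    if abs(dy) >= h or abs(dx) >= w:
        return shifted  # the content is shifted completely out of the frame
    src_rows = slice(max(0, -dy), min(h, h - dy))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted


def add_virtual_examples(images, labels,
                         shifts=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """Append one shifted copy per (image, shift) pair, keeping the label."""
    virtual_x = [shift_image(img, dy, dx) for img in images for dy, dx in shifts]
    virtual_y = [lab for lab in labels for _ in shifts]
    return (np.concatenate([images, np.stack(virtual_x)]),
            np.concatenate([labels, np.array(virtual_y)]))


digits = np.arange(9).reshape(1, 3, 3)  # one toy 3x3 "digit"
labels = np.array([7])
aug_x, aug_y = add_virtual_examples(digits, labels)
print(aug_x.shape, aug_y)
```

The augmented set is no longer drawn from the original distribution, which is why the benefit is not guaranteed; keeping the shifts small is one way to limit that drift.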