Type I and Type II errors
- Type I error, also known as a "false positive": the error of rejecting a null hypothesis when it is actually true. So the probability of making a type I error in a test with rejection region R is α = P(X ∈ R | H0 is true).
- Type II error, also known as a "false negative": the error of not rejecting a null hypothesis when the alternative hypothesis is the true state of nature. So the probability of making a type II error in a test with rejection region R is β = P(X ∉ R | H1 is true).
- The power of the test is 1 − β: the probability of correctly rejecting the null hypothesis when the alternative is true.
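As a concrete sketch of these three quantities (the numbers below are illustrative assumptions, not from the text): for a one-sided z-test of H0: mu = 0 against H1: mu = 1 with known sigma = 1 and n = 25, alpha, beta, and the power can be computed directly:

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

# Illustrative setup (assumed numbers): one-sided z-test of
# H0: mu = 0 against H1: mu = 1, known sigma = 1, sample size n = 25,
# at significance level alpha = 0.05.
sigma, n, alpha = 1.0, 25, 0.05
se = sigma / math.sqrt(n)                    # standard error of the sample mean
cutoff = std_normal.inv_cdf(1 - alpha) * se  # rejection region R: x_bar > cutoff

# Type II error beta: probability the sample mean falls OUTSIDE R when mu = 1
beta = std_normal.cdf((cutoff - 1.0) / se)
power = 1.0 - beta                           # power = 1 - beta

print(round(beta, 6), round(power, 6))
```

With these numbers the test is very powerful, because the true mean (1) sits far from the rejection cutoff relative to the standard error.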
Hypothesis testing is the art of deciding whether the variation between two sample distributions can be explained by random chance alone.
- Before concluding that two distributions vary in a meaningful way, we must take enough precautions to ensure that the differences are not simply due to random chance.
- At the heart of Type I error control is that we don't want to draw an unwarranted conclusion, so we exercise a lot of care by minimizing the chance of its occurrence.
- Traditionally we set the Type I error rate at 0.05 or 0.01, meaning there is only a 5 or 1 in 100 chance of rejecting the null hypothesis when it is actually true.
- This threshold is called the 'level of significance'. Again, there is no guarantee that 5 in 100 is rare enough, so significance levels need to be chosen carefully.
- The evidence against the null hypothesis is generally reported as the p-value: the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one seen.
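A quick simulation makes the 5-in-100 interpretation concrete (the sample size, trial count, and seed below are arbitrary choices for illustration): if we repeatedly test a true null hypothesis at alpha = 0.05, roughly 5% of the tests reject it.

```python
import random
import statistics

random.seed(0)  # reproducible illustration

# Simulate many experiments where H0 is TRUE (data ~ N(0, 1)),
# each tested with a two-sided z-test of H0: mu = 0 at alpha = 0.05.
n, trials = 30, 10_000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = statistics.fmean(sample) / (1.0 / n ** 0.5)  # known sigma = 1
    if abs(z) > 1.96:  # two-sided rejection region at alpha = 0.05
        rejections += 1

# Fraction of true nulls rejected -- should hover near alpha = 0.05
print(rejections / trials)
```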
Statistics derives its power from random sampling. The argument is that random sampling averages out the differences between two populations, so any differences seen between the populations after a "treatment" can be attributed to the treatment alone.
Obviously, life isn't that simple. There is little chance that randomly drawn samples will be effectively identical. And even if the underlying populations are the same, we can't be sure whether the results we are seeing are one-time (or rare) events or genuinely significant (regularly occurring) effects.
Multiple Hypothesis Testing
In statistics, multiple testing refers to the potential increase in Type I error that occurs when statistical tests are used repeatedly, for example when making multiple comparisons to test null hypotheses stating that the averages of several disjoint populations are equal to each other (homogeneous).
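The inflation is easy to quantify: if m independent tests are each run at level alpha, the chance of at least one false positive is 1 − (1 − alpha)^m. A small sketch:

```python
# Probability of at least one type I error across m independent tests,
# each run at significance level alpha.
alpha = 0.05
for m in (1, 5, 20):
    fwer = 1.0 - (1.0 - alpha) ** m
    print(m, round(fwer, 4))
```

Even at a modest m = 20, the chance of at least one false positive climbs to about 64%, which is why some correction is needed.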
False Discovery Rate
For large-scale multiple testing (for example, as is very common in genomics when using technologies such as DNA microarrays) one can instead control the false discovery rate (FDR), defined to be the expected proportion of false positives among all significant tests.
False discovery rate (FDR) controls the expected proportion of incorrectly rejected null hypotheses (type I errors) in a list of rejected hypotheses.
It is a less conservative comparison procedure with greater power than familywise error rate (FWER) control, at a cost of increasing the likelihood of obtaining type I errors.
(The Bonferroni correction controls FWER, where FWER = P(the number of type I errors ≥ 1), i.e. the probability of at least one type I error.)
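A minimal sketch of the Bonferroni correction, under the same independence assumption as above: testing each of m hypotheses at level alpha/m keeps the FWER at or below alpha.

```python
# Bonferroni correction: run each of m tests at level alpha/m,
# which bounds the familywise error rate (FWER) at alpha.
alpha, m = 0.05, 20
per_test = alpha / m
fwer = 1.0 - (1.0 - per_test) ** m  # FWER for m independent tests
print(per_test, round(fwer, 4))
```

The resulting FWER (about 0.049 here) stays just under alpha, but each individual test now needs a much smaller p-value to reject, which is the power cost the text mentions.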
The q-value is defined to be the FDR analogue of the p-value. The q-value of an individual hypothesis test is the minimum FDR at which the test may be called significant.
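The q-values for a list of p-values can be computed with the Benjamini-Hochberg step-up procedure; the pure-Python sketch below (the p-values are made-up illustrative numbers) computes, for each test, the minimum FDR at which it would be called significant:

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values: the minimum FDR at which
    each individual test would be called significant."""
    m = len(pvalues)
    # Sort p-values ascending, remembering each test's original position
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvals[idx] = running_min
    return qvals

# Illustrative (made-up) p-values from five hypothetical tests
print(bh_qvalues([0.01, 0.02, 0.03, 0.04, 0.50]))
```

Note how the four small p-values all receive the same q-value of 0.05: rejecting any of them at FDR 0.05 means rejecting all four, while the fifth test would only be called significant at a far higher FDR.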