Type I and Type II errors

  • Type I error, also known as a "false positive": the error of rejecting a null hypothesis when it is actually true. So the probability of making a Type I error in a test with rejection region R is α = P(X ∈ R | H0 is true).
  • Type II error, also known as a "false negative": the error of not rejecting a null hypothesis when the alternative hypothesis is the true state of nature. So the probability of making a Type II error in a test with rejection region R is β = P(X ∉ R | H1 is true).
  • The power of the test is then 1 − β = P(X ∈ R | H1 is true); a numerical sketch follows this list.
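
To make these definitions concrete, here is a minimal numerical sketch in Python, assuming (purely for illustration) a one-sided z-test of H0: μ = 0 against H1: μ = 1 with known σ = 1, n = 25, and a rejection region of the form {sample mean > c}; all of these numbers are hypothetical.

```python
from scipy.stats import norm

# Hypothetical setup: test H0: mu = 0 vs H1: mu = 1, with sigma = 1 and n = 25.
# Rejection region R: reject H0 when the sample mean exceeds a cutoff c.
n, sigma = 25, 1.0
se = sigma / n ** 0.5                  # standard error of the sample mean
c = norm.ppf(0.95, loc=0, scale=se)    # cutoff chosen so that alpha = 0.05

alpha = 1 - norm.cdf(c, loc=0, scale=se)   # P(reject H0 | H0 true)       -> Type I error
beta  = norm.cdf(c, loc=1, scale=se)       # P(fail to reject | H1 true)  -> Type II error
power = 1 - beta                           # P(reject H0 | H1 true)

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

With the cutoff chosen for α = 0.05, the Type II error here is tiny and the power is close to 1, because the two hypothesized means are far apart relative to the standard error.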


Hypothesis testing is the art of deciding whether variation between two sample distributions can be explained by random chance alone.

  • If we are to conclude that two distributions differ in a meaningful way, we must take enough precautions to ensure that the differences are not just due to random chance.
  • At the heart of controlling Type I error is that we don't want to accept an unwarranted hypothesis, so we exercise a lot of care by minimizing the chance of its occurrence.
  • Traditionally we try to set the Type I error rate at .05 or .01 - as in there is only a 5 or 1 in 100 chance that the variation we are seeing is due to chance.
  • This is called the 'level of significance'. Again, there is no guarantee that 5 in 100 is rare enough, so significance levels need to be chosen carefully.
  • Type I error is generally reported as the p-value; a small worked example follows this list.
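
As an illustration of reporting a p-value against a chosen significance level, here is a small sketch using a two-sample t-test; the "control" and "treatment" measurements below are invented for the example.

```python
from scipy.stats import ttest_ind

# Hypothetical measurements from two groups; the values are made up for illustration.
control   = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.1, 4.7]
treatment = [5.6, 5.4, 5.8, 5.3, 5.7, 5.5, 5.9, 5.4]

stat, p_value = ttest_ind(control, treatment)

alpha = 0.05                      # chosen level of significance
print(f"p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the difference is unlikely to be due to chance alone.")
else:
    print("Fail to reject H0: the variation could be random chance.")
```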

Statistics derives its power from random sampling. The argument is that random sampling will average out the differences between two populations, so that any differences seen between the populations after a "treatment" can be traced back to the treatment alone.

Obviously, life isn't that simple. There is little chance that one will pick random samples that represent essentially identical populations. And even if the populations are the same, we can't be sure whether the results we are seeing are one-time (or rare) events or genuinely significant (regularly occurring) effects.


Multiple Hypothesis Testing 

In statistics, multiple testing refers to the potential increase in Type I error that occurs when statistical tests are used repeatedly, for example when performing multiple comparisons to test null hypotheses stating that the averages of several disjoint populations are equal to each other (homogeneous).
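
A short sketch shows how fast this increase happens, under the simplifying assumption of m independent tests each run at level α = 0.05, so that the probability of at least one false positive is 1 − (1 − α)^m.

```python
# Probability of at least one false positive (the FWER) across m independent tests,
# each run at level alpha: FWER = 1 - (1 - alpha)**m.
alpha = 0.05
for m in (1, 5, 10, 20, 100):
    fwer = 1 - (1 - alpha) ** m
    print(f"m = {m:>3} tests  ->  P(at least one Type I error) = {fwer:.3f}")
```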



False Discovery Rate

For large-scale multiple testing (for example, as is very common in genomics when using  technologies such as DNA microarrays) one can instead control the false discovery rate  (FDR), defined to be the expected proportion of false positives among all significant tests. 


False discovery rate (FDR) controls the expected proportion of incorrectly rejected null hypotheses (Type I errors) in a list of rejected hypotheses. It is a less conservative comparison procedure with greater power than familywise error rate (FWER) control, at the cost of an increased likelihood of obtaining Type I errors. (The Bonferroni correction controls FWER; FWER = P(the number of Type I errors ≥ 1).)
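
A minimal sketch of the Bonferroni correction mentioned above: each of the m hypotheses is tested at level α/m, which keeps the FWER at or below α. The p-values below are made up for illustration.

```python
# Bonferroni correction: test each hypothesis at alpha / m to keep FWER <= alpha.
p_values = [0.001, 0.008, 0.020, 0.041, 0.300]   # hypothetical p-values
alpha, m = 0.05, len(p_values)

for i, p in enumerate(p_values):
    decision = "reject H0" if p < alpha / m else "fail to reject"
    print(f"test {i}: p = {p:.3f}  ->  {decision} (threshold = {alpha / m:.4f})")
```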

The q-value is defined to be the FDR analogue of the p-value. The q-value of an  individual hypothesis test is the minimum FDR at which the test may be called significant. 
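
One common way to obtain q-values is the Benjamini-Hochberg procedure, which controls the FDR. The sketch below computes BH q-values by hand, using hypothetical p-values; equivalently, statsmodels' multipletests with method='fdr_bh' returns the same adjusted values.

```python
import numpy as np

def bh_qvalues(p_values):
    """Benjamini-Hochberg q-values: the minimum FDR at which each test is significant."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)                        # sort p-values ascending
    ranked = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Enforce monotonicity from the largest p-value downwards.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(q_sorted, 0, 1)
    return q

p_values = [0.001, 0.008, 0.020, 0.041, 0.300]   # hypothetical p-values
for p, q in zip(p_values, bh_qvalues(p_values)):
    print(f"p = {p:.3f}  ->  q = {q:.3f}")
```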

