
7. Parameter Estimation

  • Model and parameters
  • Properties of good estimators
    • Unbiasedness, consistency
    • UMVUE, efficiency
  • MLE
  • Bayesian Estimation
    • why?
    • Prior and Posterior
    • Conjugate distribution
    • Limitations

Reason: statistical estimation is not a general estimation problem.

  • Formulation:
    $$X_1, X_2, \ldots, X_n \overset{\text{i.i.d.}}{\sim} f(x;\theta), \quad \theta \in E \text{ unknown}$$
    Estimator: $\hat{\phi} = \phi(X)$, with $\phi: \mathbb{R}^{n} \rightarrow E$

Properties of Good Estimators:

  • Unbiasedness: the expectation of the estimator's sampling distribution equals the population parameter being estimated.
    $$E[\phi(X)] = \theta \text{ for } X \sim f(x;\theta)$$
  • Consistency: as the sample size grows, the estimator converges to the population parameter being estimated.
    $$\phi(X) \rightarrow \theta \text{ in probability for } X \sim f(x;\theta)$$
  • Example:
    $$\begin{aligned} s^{2} &= \frac{1}{n-1} \sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2} && \text{(unbiased)} \\ \hat{\sigma}^{2} &= \frac{1}{n} \sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2} && \text{(consistent, though biased)} \end{aligned}$$
  • Accurate:
  • Efficient:
  • UMVUE is a very restrictive requirement. Efficiency is a weaker condition.
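
As a quick check of the variance example above, a small simulation can compare the two estimators. This is a minimal sketch in R; the normal model, the true variance, the sample size and the number of replications are arbitrary choices made for illustration, not values from these notes.

```r
# Compare s^2 (divide by n - 1) with sigma_hat^2 (divide by n) by simulation.
# Assumed setup (not from the notes): N(0, 4) data, n = 5, 20000 replications.
set.seed(1)
n <- 5
true_var <- 4
reps <- 20000

sims <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = sqrt(true_var))
  c(s2 = var(x),                          # divides by n - 1
    sigma2_hat = mean((x - mean(x))^2))   # divides by n
})

avg <- rowMeans(sims)   # Monte Carlo estimate of each estimator's expectation
print(avg)
# Theory: E[s2] = true_var and E[sigma2_hat] = (n - 1) / n * true_var
```

With a small n the gap between the two averages is visible; it shrinks as n grows, which is why the biased estimator is still consistent.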

Maximum Likelihood Estimation

MLE is a framework for designing consistent and efficient estimators under very general conditions.


  • The likelihood function:
    $$L(X;\theta) = \prod_{i=1}^{n} f(X_i;\theta), \qquad X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} f(x;\theta), \quad \theta \in E \text{ unknown}$$
  • MLE: for given data samples $X = x$,
    $$\hat{\theta} = \arg\max_{\theta \in E} L(x;\theta), \quad \text{i.e.} \quad L(x;\hat{\theta}) = \max_{\theta \in E} L(x;\theta)$$


  • Solving for the MLE, even numerically, can be very challenging.
  • MLE does not guarantee good performance in finite samples.
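
To make the argmax definition concrete, here is a minimal sketch of a numerical MLE in R. It assumes an exponential model with rate θ, simulated data and a bounded search interval, all chosen for illustration; the model is picked because its MLE has a known closed form to compare against.

```r
# Numerical MLE for the rate of an exponential model f(x; theta) = theta * exp(-theta * x).
# The data, the true rate 1.5 and the search interval are illustrative assumptions.
set.seed(2)
x <- rexp(200, rate = 1.5)

# Work with the log-likelihood: the product in L(x; theta) becomes a sum.
log_lik <- function(theta) sum(dexp(x, rate = theta, log = TRUE))

# One-dimensional maximisation over theta in (1e-6, 50).
fit <- optimize(log_lik, interval = c(1e-6, 50), maximum = TRUE)
theta_hat <- fit$maximum

# For this model the maximiser is known in closed form: 1 / mean(x).
print(c(numerical = theta_hat, closed_form = 1 / mean(x)))
```

In one dimension with a concave log-likelihood this is easy; the difficulty mentioned above shows up in higher dimensions or when the likelihood has several local maxima.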

Bayesian Estimation

With Bayesian estimation, we can easily update our estimator as samples are collected sequentially.


  • $\theta$ is treated as a random variable taking values in $E$.
  • $f_{0}(\theta)$ is the prior of $\theta$.
  • $f_{1}(\theta)$ is the posterior, which gives the distribution of $\theta$ conditional on the data:
    $$f_{1}(\theta) = f(\theta \mid X = x) = \frac{L(x;\theta) f_{0}(\theta)}{\int_{E} L(x;u) f_{0}(u)\, du}$$

Sequential Bayesian Estimation
Intuitively, if more data $X_{n+1}, \ldots, X_{n+m}$ become available, we can take the previous posterior $f_1$ as the new prior and update the belief again using the new data only:

$$f_{2}(\theta) = \frac{L(x;\theta) f_{1}(\theta)}{\int_{E} L(x;u) f_{1}(u)\, du}$$

where $x$ now denotes the new observations only.
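
The sequential update is easiest to see with a conjugate prior, where the posterior stays in the same family. The sketch below assumes Bernoulli data with a Beta(a, b) prior, for which the posterior is Beta(a + successes, b + failures); the helper name `update_beta` and the data are hypothetical, for illustration only.

```r
# Beta-Bernoulli updating: prior Beta(a, b), data are 0/1 outcomes.
# Posterior: Beta(a + number of successes, b + number of failures).
# `update_beta` and the two batches below are hypothetical examples.
update_beta <- function(prior, x) {
  c(a = prior[["a"]] + sum(x == 1),
    b = prior[["b"]] + sum(x == 0))
}

prior  <- c(a = 1, b = 1)        # f_0: uniform prior on (0, 1)
batch1 <- c(1, 0, 1, 1, 0)       # first n observations
batch2 <- c(1, 1, 0, 1)          # m new observations

f1    <- update_beta(prior, batch1)               # posterior after the first batch
f2    <- update_beta(f1, batch2)                  # f_1 reused as prior, new data only
f_all <- update_beta(prior, c(batch1, batch2))    # all data processed at once

print(f2)
print(f_all)
```

Both routes give the same Beta parameters, which is the point of the sequential formula: the old data never needs to be revisited.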


  • Bayesian estimation depends on the prior, which can be any distribution on $E$. A very strong prior could lead to an inconsistent estimator.
    • In the information-based trade example, what will happen if we pick $p_0 = 1$?
    • On the other hand, a weak prior could lead to slow convergence.
  • Computing the posterior can be very costly when the parameter space $E$ is large.

8. Confidence Interval

  • Three constructions of CI for i.i.d samples:
    • normal
    • t
    • bootstrap
  • When and how?

Central Limit Theorem

  • Theorem: $\{X_i\}$ is a sequence of i.i.d. samples of $X$ with $E[X] = \mu$ and $\operatorname{Var}(X) = \sigma^2$. Then,
    $$\frac{\sqrt{n}}{\sigma}\left(\overline{X}_{n}-\mu\right) \Rightarrow N(0,1)$$
  • Therefore, when $n$ is "large", for any $a > 0$,
    $$P\left(\left|\frac{\sqrt{n}}{\sigma}\left(\overline{X}_{n}-\mu\right)\right|>a\right) \approx P(|Z|>a)$$
    where $Z$ is a standard normal r.v.

Confidence Interval (z-distribution)

  • For any confidence level $a$, we simply choose $\phi$ such that $P(|Z|>\phi) = 1-a$; then the level-$a$ confidence interval is
    $$\left[\overline{X}_{n}-\phi \frac{\sigma}{\sqrt{n}},\ \overline{X}_{n}+\phi \frac{\sigma}{\sqrt{n}}\right]$$
  • A 95% CI means that if we repeated the sampling 100 times, about 95 of the resulting intervals would contain the true value and about 5 would not.
    The standard error (s.e.) of the sample mean is $\sigma_{\overline{x}} = \sigma / \sqrt{n}$.
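
The interval and its repeated-sampling interpretation can be sketched in R as follows. The helper `z_ci` is a hypothetical name, and the true mean, the known σ, the sample size and the number of repetitions are assumptions chosen for illustration.

```r
# z-based confidence interval for the mean when sigma is known.
# `z_ci` is a hypothetical helper; `level` plays the role of the confidence level a.
z_ci <- function(x, sigma, level = 0.95) {
  n   <- length(x)
  phi <- qnorm(1 - (1 - level) / 2)   # chosen so that P(|Z| > phi) = 1 - level
  se  <- sigma / sqrt(n)              # standard error of the sample mean
  c(lower = mean(x) - phi * se, upper = mean(x) + phi * se)
}

set.seed(3)
x  <- rnorm(50, mean = 10, sd = 2)    # simulated data, sigma = 2 assumed known
ci <- z_ci(x, sigma = 2)
print(ci)

# Repeated sampling: fraction of intervals that contain the true mean 10.
covers <- replicate(2000, {
  ci_k <- z_ci(rnorm(50, mean = 10, sd = 2), sigma = 2)
  ci_k[["lower"]] <= 10 && 10 <= ci_k[["upper"]]
})
print(mean(covers))                   # expected to be close to 0.95
```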

The Effect of Sample Size

  • The magnitude of the estimation error, measured by the half-length of the CI, is
    $$\phi \frac{\sigma}{\sqrt{n}}$$
  • In order to have estimation error $\approx \varepsilon$, we need the sample size
    $$n \approx \frac{\phi^{2} \sigma^{2}}{\varepsilon^{2}}$$
    Intuitively, to improve the estimation accuracy by a factor of 10, we need to enlarge the sample size by a factor of 100.
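
A short numerical check of the sample-size formula, as a sketch: `required_n` is a hypothetical helper, and σ = 2 with target half-lengths 0.5 and 0.05 are made-up inputs.

```r
# Sample size needed for a target half-length eps of a z-interval.
# `required_n` is a hypothetical helper; the inputs below are illustrative.
required_n <- function(sigma, eps, level = 0.95) {
  phi <- qnorm(1 - (1 - level) / 2)
  ceiling((phi * sigma / eps)^2)      # round up to a whole number of samples
}

n1 <- required_n(sigma = 2, eps = 0.5)    # baseline accuracy
n2 <- required_n(sigma = 2, eps = 0.05)   # ten times more accurate
print(c(n1 = n1, n2 = n2, ratio = n2 / n1))
```

The ratio is close to 100 rather than exactly 100 only because each sample size is rounded up.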

CI for Small Samples

  • Theorem: (CI of t-distribution)
    If $X_1, X_2, \ldots, X_n$ are i.i.d. samples of a normal distribution $N(\mu, \sigma^2)$, then
    $$\frac{\sqrt{n}}{s}\left(\overline{X}_{n}-\mu\right) \sim t(n-1),$$
    a t-distribution with $n-1$ degrees of freedom.
  • Remark:
    • The t-distribution is more dispersed than the standard normal.
    • When $n \rightarrow \infty$, $t(n-1) \Rightarrow N(0,1)$.
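
The theorem above leads to an interval of the same shape as the z-interval, with s in place of σ and a t quantile in place of the normal one. A minimal sketch on simulated data (the sample and its parameters are assumptions), checked against R's built-in `t.test`:

```r
# t-based interval for the mean of a small normal sample (sigma unknown).
set.seed(4)
x <- rnorm(10, mean = 5, sd = 1.5)    # simulated data, for illustration
n <- length(x)

phi_t <- qt(0.975, df = n - 1)        # t quantile replaces the normal quantile
se    <- sd(x) / sqrt(n)              # s / sqrt(n)
ci_manual <- c(mean(x) - phi_t * se, mean(x) + phi_t * se)

ci_builtin <- t.test(x, conf.level = 0.95)$conf.int
print(ci_manual)
print(ci_builtin)

# The t quantile exceeds the normal one, so the interval is wider.
print(c(t = phi_t, z = qnorm(0.975)))
```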



9. Significance Test

  • Formulation of general hypothesis test
    • Parameter space
    • Hypothesis / Alternative
    • Hypothesis testing
  • Significance test
    • 5 steps
    • What is the intuition
    • How to choose the hypothesis and alternative
    • How to interpret the p-value
    • Type I and II errors

Steps of a Significance Test

  1. Assumptions: underlying probability model for population
  2. Hypothesis: Formulate the statement or prediction in your research problem into a statement about the population parameter.
  3. Test Statistic: the test statistic measures how "far" the point estimate of the parameter is from its null hypothesis value(s), assuming that the null hypothesis is true.
  4. P-Value: the tail probability beyond the observed value of the test statistic, if we presume the null hypothesis is true. It measures how unlikely the observed event is.
  5. Conclusion: Report and interpret the p-value in the context of the study. Make a decision about H0 based on p-value.
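
The five steps can be traced on a one-sample t-test. This is a sketch on simulated data; the null value 0, the sample and the 0.05 threshold mentioned in the comments are illustrative assumptions.

```r
# Five steps of a significance test on simulated data (all numbers are illustrative).
set.seed(5)
x <- rnorm(30, mean = 0.4, sd = 1)

# 1. Assumptions: i.i.d. normal observations.
# 2. Hypothesis: H0: mu = 0 against Ha: mu != 0.
mu0 <- 0
# 3. Test statistic: distance of the sample mean from mu0, in standard errors.
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
# 4. P-value: two-sided tail probability under H0, using t(n - 1).
p_value <- 2 * pt(-abs(t_stat), df = length(x) - 1)
# 5. Conclusion: report the p-value, e.g. compare it with a chosen level such as 0.05.
print(c(t = t_stat, p = p_value))

# Cross-check against the built-in test.
builtin <- t.test(x, mu = mu0)
print(builtin$p.value)
```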

Type I & Type II errors & Interpreting P-Value


Inference on Single Variables

Population proportion
  • z-test
  • Difference from CI
  • Small sample: binomial test


Population mean
  • t-test
  • Relation with CI
  • Small sample: bootstrap

Inference on Two Variables

  • Independent samples
    • Population proportion: z-test
    • Population mean: t-test
    • Small sample: permutation test
  • Paired data: t-test for a single variable (applied to the within-pair differences)
  • z statistic for the difference of two proportions (the denominator is the standard error of $p_1 - p_2$):
    $$z = \frac{\left(p_{1}-p_{2}\right)-\left(\pi_{1}-\pi_{2}\right)}{\sqrt{\frac{p_{1}\left(1-p_{1}\right)}{n_{1}}+\frac{p_{2}\left(1-p_{2}\right)}{n_{2}}}}$$

  • standard error of u:

  • Conclude CI:
    Given our estimate of the standard error of the estimated mean or proportion difference, we can construct the confidence interval for the difference:
    $$\left[(\overline{x}-\overline{y})-\phi_{\alpha}\, se,\ (\overline{x}-\overline{y})+\phi_{\alpha}\, se\right]$$
    The coefficient $\phi_{\alpha}$ is determined by $\alpha$ and the model assumptions (normal distribution for proportions, t distribution for means).
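
For two independent samples of proportions, the z statistic and the interval can be computed directly. The counts below are made up for illustration, and the sketch uses the unpooled standard error exactly as it appears in the z formula in these notes.

```r
# Two-sample z statistic and 95% CI for a difference in proportions.
# Counts are made up for illustration.
x1 <- 45; n1 <- 100
x2 <- 30; n2 <- 100
p1 <- x1 / n1
p2 <- x2 / n2

se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # denominator of z
z  <- ((p1 - p2) - 0) / se                            # under H0: pi1 - pi2 = 0
p_value <- 2 * pnorm(-abs(z))                         # two-sided p-value

phi <- qnorm(0.975)                                   # normal quantile for proportions
ci  <- c((p1 - p2) - phi * se, (p1 - p2) + phi * se)

print(c(z = z, p = p_value))
print(ci)
```

Here the interval excludes 0 and the two-sided p-value is below 0.05, which illustrates the relation between the test and the CI.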

Permutation Test



Paired data


10. Multiple Regression

  • Assumptions
  • Interpretation of estimation results
  • Inference methods:
    • t-test for single coefficient
    • F-test for nested models
  • Residual analysis

Assumptions (linear regression model)

$$y_{i}=\beta_{0}+\sum_{k=1}^{p} \beta_{k}\, g_{k}\left(x_{i k}\right)+\varepsilon_{i}$$

where the functions $g_k$ are known. In addition, we assume the following conditions on $\varepsilon_i$:

  • Independence: the $\varepsilon_i$ are independent.
  • Zero mean: $E[\varepsilon \mid x] = 0$ for all possible values of $x = (x_1, \ldots, x_p)$.
  • Equal variance: $\operatorname{Var}(\varepsilon \mid x) = \sigma^2$.
  • Normality: the $\varepsilon_i$ are normal conditional on $x$.
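
A model of this form can be fitted with `lm`, which also reports the t-test for each coefficient; `anova` on two nested fits gives the F-test. The sketch below uses simulated data, with $g_1$ the identity and $g_2 = \log$; the coefficients and noise level are assumptions made for illustration.

```r
# Fit y = b0 + b1 * x1 + b2 * log(x2) + eps on simulated data.
# Here g_1 is the identity and g_2 = log; true coefficients are assumptions.
set.seed(6)
n  <- 200
x1 <- rnorm(n)
x2 <- runif(n, 1, 10)
y  <- 1 + 2 * x1 + 0.5 * log(x2) + rnorm(n, sd = 1)

full    <- lm(y ~ x1 + log(x2))
reduced <- lm(y ~ x1)                 # nested model: drops the log(x2) term

coef_table <- coef(summary(full))     # estimates, std. errors, t values, p-values
f_table    <- anova(reduced, full)    # F-test comparing the nested models
print(coef_table)
print(f_table)
```

When the nested models differ by a single coefficient, the F statistic equals the square of that coefficient's t statistic, so the two tests agree.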

T-test & F-test


Residual analysis

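  • DW test (Durbin-Watson): checks independence; the null hypothesis is that the residuals are independent (uncorrelated).
  • JB test (Jarque-Bera): checks normality; the null hypothesis is that the residuals are normally distributed.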
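
Both statistics are simple functions of the residuals, so they can be computed by hand in base R. This is a sketch on a simulated model (an assumption for illustration); in practice, packaged versions such as `lmtest::dwtest` and `tseries::jarque.bera.test` also provide p-values for the DW statistic.

```r
# Durbin-Watson and Jarque-Bera statistics computed by hand from residuals.
# The model and data are simulated for illustration.
set.seed(7)
x <- rnorm(300)
y <- 1 + 2 * x + rnorm(300)
e <- residuals(lm(y ~ x))
n <- length(e)

# Durbin-Watson: values near 2 suggest no first-order autocorrelation.
dw <- sum(diff(e)^2) / sum(e^2)

# Jarque-Bera: built from the sample skewness S and kurtosis K of the residuals.
m2 <- mean((e - mean(e))^2)
S  <- mean((e - mean(e))^3) / m2^1.5
K  <- mean((e - mean(e))^4) / m2^2
jb <- n / 6 * (S^2 + (K - 3)^2 / 4)
jb_p <- pchisq(jb, df = 2, lower.tail = FALSE)   # asymptotic chi-square(2) p-value

print(c(DW = dw, JB = jb, JB_p = jb_p))
```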

Assumptions (logistic regression)

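# Compute TPR and FPR at each of n thresholds.
# Assumes prob (predicted probabilities), y (0/1 labels) and thresholds (length n) already exist.
tpr <- numeric(n); fpr <- numeric(n)
for (i in 1:n) {
  pred <- as.integer(prob >= thresholds[i])
  tp <- sum(pred == 1 & y == 1); fn <- sum(pred == 0 & y == 1)
  fp <- sum(pred == 1 & y == 0); tn <- sum(pred == 0 & y == 0)
  tpr[i] <- tp / (tp + fn)  # true positive rate
  fpr[i] <- fp / (fp + tn)  # false positive rate
}
# Plot ROC
plot(fpr, tpr, type = 'l', ylim = c(0, 1), xlim = c(0, 1), main = 'ROC')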