[Probability Theory] Conditional Distributions and Independence (Part 1)

Conditional distribution law. In general, we have the following definition. Let $(X, Y)$ be a two-dimensional discrete random variable with joint distribution law $p_{ij}=P\{X=x_i, Y=y_j\}$, $i,j=1,2,\cdots$. If, for a fixed $x_i$, we have $P\{X=x_i\}>0$, then the conditional probability

$$P\{Y=y_j \mid X=x_i\}=\frac{P\{X=x_i, Y=y_j\}}{P\{X=x_i\}}=\frac{p_{ij}}{p_{i\cdot}}, \quad j=1,2,\cdots$$

is called the conditional distribution law of $Y$ given $X=x_i$. (The conditional distribution law of $X$ given $Y=y_j$ is defined in the same way.)

Note 1: A conditional distribution law is itself a distribution law, because $P\{Y=y_j \mid X=x_i\} \geq 0$ and

$$\sum_{j} P\{Y=y_j \mid X=x_i\}=\frac{\sum_{j} p_{ij}}{p_{i\cdot}}=\frac{p_{i\cdot}}{p_{i\cdot}}=1.$$
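As a small illustration of the definition and of Note 1, the sketch below computes a conditional distribution law from a joint table. The table `joint` and the helper name `conditional_of_y_given_x` are hypothetical, chosen only for this example; exact fractions are used so the normalization can be checked without rounding.

```python
from fractions import Fraction

# Hypothetical joint pmf: joint[i][j] = P{X = x_i, Y = y_j}.
# The numbers are made up for illustration and sum to 1.
joint = [
    [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4)],
    [Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)],
]


def conditional_of_y_given_x(joint, i):
    """Return [P{Y = y_j | X = x_i} for all j], i.e. p_ij / p_i. ."""
    p_i = sum(joint[i])  # marginal p_{i.} = sum over j of p_ij
    if p_i == 0:
        raise ValueError("P{X = x_i} must be positive to condition on it")
    return [p_ij / p_i for p_ij in joint[i]]


cond0 = conditional_of_y_given_x(joint, 0)
cond1 = conditional_of_y_given_x(joint, 1)
print(cond0, sum(cond0))
print(cond1, sum(cond1))
```

Each row of the joint table is divided by its own row sum, so every conditional law is nonnegative and sums to one, which is exactly the argument in Note 1.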

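Example 1 (insect egg-laying): Suppose the number of eggs laid by a certain insect is $X \sim P(\lambda)$, each egg hatches with probability $p$, and the number of hatched eggs is denoted by $Y$. Find:

a) the joint distribution law of $X, Y$;

b) the marginal distribution laws of $X, Y$;

c) $P\{X=i \mid Y=j\}$.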

Solution: a) By assumption, when the number of eggs is fixed at $X=i$, we have $Y \sim B(i, p)$. Hence, by the multiplication rule,

$$p_{ij}=P\{X=i, Y=j\}=P\{Y=j \mid X=i\} \cdot P\{X=i\}=\binom{i}{j} p^{j}(1-p)^{i-j} \cdot e^{-\lambda} \frac{\lambda^{i}}{i!}, \quad j=0,1,\ldots,i, \quad i=0,1,\ldots$$

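b) The marginal law of $X$ is the given Poisson law:

$$p_{i\cdot}=e^{-\lambda} \frac{\lambda^{i}}{i!}, \quad i=0,1,\ldots$$

For $Y$, summing $p_{ij}$ over $i \geq j$ gives

$$p_{\cdot j}=\sum_{i=j}^{\infty}\binom{i}{j} p^{j}(1-p)^{i-j} e^{-\lambda} \frac{\lambda^{i}}{i!}=e^{-\lambda} \frac{(\lambda p)^{j}}{j!} \sum_{i=j}^{\infty} \frac{(\lambda(1-p))^{i-j}}{(i-j)!}=e^{-\lambda p} \frac{(\lambda p)^{j}}{j!}, \quad j=0,1,\cdots$$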
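c) Dividing the joint law by the marginal law of $Y$:

$$P\{X=i \mid Y=j\}=\frac{p_{ij}}{p_{\cdot j}}=e^{-\lambda(1-p)} \frac{(\lambda(1-p))^{i-j}}{(i-j)!}, \quad i=j, j+1, \ldots$$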

This shows that, given $Y=j$, the difference $X-j$ between the number of eggs laid and the number hatched follows $P(\lambda(1-p))$.
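The conclusions of Example 1 can be checked numerically. The following is a minimal sketch using only the Python standard library; the parameter values `lam = 3.0`, `p = 0.4` and the truncation point `N = 80` are assumptions made for this illustration, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """P{K = k} for K ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def joint_pmf(i, j, lam, p):
    """p_ij = P{Y = j | X = i} * P{X = i}, with Y | X = i ~ B(i, p)."""
    if j < 0 or j > i:
        return 0.0
    return math.comb(i, j) * p**j * (1 - p) ** (i - j) * poisson_pmf(i, lam)


lam, p = 3.0, 0.4  # assumed parameter values, for illustration only
N = 80             # truncation of the infinite sum; the tail is negligible here


def marginal_y(j):
    """p_.j, obtained by summing the joint law over i >= j."""
    return sum(joint_pmf(i, j, lam, p) for i in range(j, N))


def conditional_x_given_y(i, j):
    """P{X = i | Y = j} = p_ij / p_.j."""
    return joint_pmf(i, j, lam, p) / marginal_y(j)


for j in range(4):
    # Compare the numerical marginal of Y with the Poisson(lam * p) pmf.
    print(j, marginal_y(j), poisson_pmf(j, lam * p))
```

Comparing `conditional_x_given_y(i, j)` with `poisson_pmf(i - j, lam * (1 - p))` for a few pairs `i >= j` checks the shifted-Poisson form of the conditional law in the same way.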

Conditional probability density function: If $(X, Y)$ is a continuous random variable, every expression of the form $P\{X=x, Y=y\}$ equals 0, so the conditional probability cannot be defined directly as in the discrete case. Instead, we relax the condition $Y=y$ to $Y \in [y-\varepsilon, y+\varepsilon]$ and consider the conditional probability

$$P\{X \leq x \mid y-\varepsilon \leq Y \leq y+\varepsilon\}=\frac{P\{X \leq x, y-\varepsilon \leq Y \leq y+\varepsilon\}}{P\{y-\varepsilon \leq Y \leq y+\varepsilon\}}=\frac{\int_{-\infty}^{x} \int_{y-\varepsilon}^{y+\varepsilon} f(u, v)\, dv\, du}{\int_{y-\varepsilon}^{y+\varepsilon} f_{Y}(v)\, dv}$$

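By the mean value theorem for integrals, there exist $\eta, \xi \in [y-\varepsilon, y+\varepsilon]$ such that

$$\text{the expression above}=\frac{2 \varepsilon \int_{-\infty}^{x} f(u, \eta)\, du}{2 \varepsilon f_{Y}(\xi)}.$$

Letting $\varepsilon \rightarrow 0$, we have $\eta, \xi \rightarrow y$. If $f(u, v)$ is continuous, the expression should converge to

$$\frac{\int_{-\infty}^{x} f(u, y)\, du}{f_{Y}(y)}.$$

This limit is called the conditional distribution function of $X$ given $Y=y$. Differentiating it with respect to $x$ gives: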
$$f(x \mid y)=\left\{\begin{array}{ll} \frac{f(x, y)}{f_{Y}(y)}, & f_{Y}(y)>0 \\ 0, & \text{otherwise} \end{array}\right.$$

This is called the conditional density of $X$ given $Y=y$, denoted $f_{X \mid Y}(x \mid y)$. The conditional density $f_{Y \mid X}(y \mid x)$ is defined in the same way.

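Note 1: A conditional density function is itself a density function; it is easy to verify that it is nonnegative and integrates to 1.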
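The normalization in Note 1 can be checked numerically on a concrete case. The sketch below assumes, purely for illustration, the joint density $f(x, y)=x+y$ on the unit square (zero elsewhere), which is not taken from the text; the helper names and the simple midpoint rule are likewise choices made for this example.

```python
def joint_density(x, y):
    """Assumed example density: f(x, y) = x + y on [0, 1] x [0, 1], else 0."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def midpoint_integral(g, a, b, n=2000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def marginal_y(y):
    """f_Y(y): integrate the joint density over x."""
    return midpoint_integral(lambda x: joint_density(x, y), 0.0, 1.0)


def conditional_density(x, y):
    """f(x | y) = f(x, y) / f_Y(y) when f_Y(y) > 0, and 0 otherwise."""
    fy = marginal_y(y)
    return joint_density(x, y) / fy if fy > 0 else 0.0


y0 = 0.3
# Integrate the conditional density over x; the result should be close to 1.
total = midpoint_integral(lambda x: conditional_density(x, y0), 0.0, 1.0, n=200)
print(marginal_y(y0), total)
```

For this assumed density, $f_Y(y)=y+\tfrac{1}{2}$ on $[0,1]$, so dividing the slice $f(x, 0.3)$ by $f_Y(0.3)=0.8$ rescales it to integrate to one.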

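Note 2: The conditional density describes the distribution of $X$ when the two-dimensional distribution is restricted to the line $Y=y$. Along this line it has the same shape as the joint density, but differs from it by the normalizing factor $f_{Y}(y)$, because the area of this cross-section is $f_{Y}(y)$, as shown in the figure:
[Figure]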