EM Algorithm (2)

1. The EM Algorithm
   Suppose we have a training dataset
$$\left\{x^{(1)}, x^{(2)}, \cdots, x^{(m)}\right\}$$
   containing $m$ independent samples, and we want to find the parameters of a model $p(x, z)$ for this data.
   Take the log-likelihood function:
$$l(\theta)=\sum_{i=1}^{m} \log p\left(x^{(i)} ; \theta\right)=\sum_{i=1}^{m} \log \sum_{z^{(i)}} p\left(x^{(i)}, z^{(i)} ; \theta\right)$$
   Here $z$ is a latent random variable, so it is not convenient to estimate the parameters directly. Instead we use the following strategy: compute a lower bound on $l(\theta)$, maximize that lower bound, and repeat until the procedure converges to a local maximum.
   Let $Q_i$ be some distribution over $z$, with $Q_i \geq 0$. Then:
$$\begin{aligned} l(\theta) &=\sum_{i=1}^{m} \log \sum_{z^{(i)}} p\left(x^{(i)}, z^{(i)} ; \theta\right) \\ &=\sum_{i=1}^{m} \log \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)} \\ &\geq \sum_{i=1}^{m} \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)} \end{aligned}$$
   The last step follows from Jensen's inequality, since $\log$ is concave. To make the lower bound as tight as possible (i.e., to achieve equality in Jensen's inequality), we can require:
$$\frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)}=c$$
   Analyzing this further:
$$\begin{aligned} Q_{i}\left(z^{(i)}\right) &\propto p\left(x^{(i)}, z^{(i)} ; \theta\right), \qquad \sum_{z} Q_{i}\left(z^{(i)}\right)=1 \\ \Rightarrow Q_{i}\left(z^{(i)}\right) &=\frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{\sum_{z} p\left(x^{(i)}, z ; \theta\right)} =\frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{p\left(x^{(i)} ; \theta\right)} =p\left(z^{(i)} \mid x^{(i)} ; \theta\right) \end{aligned}$$
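   As a quick numerical sanity check, the sketch below (a hypothetical 1-D two-component Gaussian mixture with made-up parameters, not from the text) evaluates $l(\theta)$ and the lower bound: an arbitrary $Q_i$ gives a strict inequality, while choosing $Q_i$ as the posterior $p(z^{(i)} \mid x^{(i)} ; \theta)$ makes the bound tight.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical toy model: 1-D mixture of two Gaussians (z in {0, 1}).
phi = np.array([0.3, 0.7])            # mixing weights p(z = j)
mu = np.array([-1.0, 2.0])            # component means
sigma = np.array([1.0, 0.5])          # component standard deviations

x = np.array([-0.5, 0.1, 1.8, 2.3])   # a few observed samples x^{(i)}

# Joint p(x^{(i)}, z = j; theta) for every sample i and component j.
joint = phi * norm.pdf(x[:, None], loc=mu, scale=sigma)   # shape (m, 2)

# Log-likelihood l(theta) = sum_i log sum_z p(x^{(i)}, z; theta).
log_lik = np.sum(np.log(joint.sum(axis=1)))

def lower_bound(Q):
    """sum_i sum_z Q_i(z) * log( p(x^{(i)}, z; theta) / Q_i(z) )."""
    return np.sum(Q * (np.log(joint) - np.log(Q)))

# Arbitrary Q_i: strictly below l(theta) (Jensen's inequality).
Q_arbitrary = np.full_like(joint, 0.5)
# Posterior Q_i(z) = p(z | x^{(i)}; theta): the bound becomes tight.
Q_posterior = joint / joint.sum(axis=1, keepdims=True)

print(log_lik, lower_bound(Q_arbitrary), lower_bound(Q_posterior))
# lower_bound(Q_posterior) equals log_lik up to floating-point error.
```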
   The overall EM algorithm framework:
   Repeat until convergence:
   (E-step) For each $i$, set $Q_{i}\left(z^{(i)}\right):=p\left(z^{(i)} \mid x^{(i)} ; \theta\right)$.
   (M-step) Set $\theta:=\underset{\theta}{\arg \max } \sum_{i} \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)}$.
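   In code, this framework is just a loop that alternates the two steps; a minimal sketch, where `e_step` and `m_step` are hypothetical callbacks to be supplied for a concrete model such as the GMM derived below:

```python
def em(x, theta, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic EM loop: alternate the E-step and M-step until the
    lower bound (returned by m_step) stops improving."""
    prev_bound = -float("inf")
    for _ in range(max_iter):
        Q = e_step(x, theta)            # Q_i(z) := p(z | x^{(i)}; theta)
        theta, bound = m_step(x, Q)     # theta := argmax of the lower bound
        if bound - prev_bound < tol:    # converged to a local maximum
            break
        prev_bound = bound
    return theta
```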
2. Deriving the GMM from the Theory
   Suppose the random variable $X$ is a mixture of $K$ Gaussian distributions, where the probability of selecting each Gaussian is $\varphi_{1}, \varphi_{2}, \cdots, \varphi_{K}$, and the $i$-th Gaussian has mean $\mu_i$ and covariance $\Sigma_i$. Given a sequence of observed samples $x_{1}, x_{2}, \ldots, x_{n}$ of $X$, estimate the parameters $\varphi, \mu, \Sigma$.
   E-step
$$w_{j}^{(i)}=Q_{i}\left(z^{(i)}=j\right)=P\left(z^{(i)}=j \mid x^{(i)} ; \phi, \mu, \Sigma\right)$$
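   A direct numpy/scipy sketch of this E-step (the names `X`, `phi`, `mu`, `Sigma` are my own; `X` is an $m \times n$ data matrix and each `Sigma[j]` an $n \times n$ covariance):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, phi, mu, Sigma):
    """Responsibilities w[i, j] = Q_i(z^{(i)} = j) = p(z = j | x^{(i)}; phi, mu, Sigma)."""
    m, k = X.shape[0], phi.shape[0]
    w = np.empty((m, k))
    for j in range(k):
        # Joint p(x^{(i)}, z = j) = p(x^{(i)} | z = j) * p(z = j)
        w[:, j] = phi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
    w /= w.sum(axis=1, keepdims=True)   # normalize over j: Bayes' rule
    return w
```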
   M-step
   Substituting the multinomial and Gaussian distributions into the M-step objective:
$$\begin{aligned} &\sum_{i=1}^{m} \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \phi, \mu, \Sigma\right)}{Q_{i}\left(z^{(i)}\right)} \\ &=\sum_{i=1}^{m} \sum_{j=1}^{k} Q_{i}\left(z^{(i)}=j\right) \log \frac{p\left(x^{(i)} \mid z^{(i)}=j ; \mu, \Sigma\right) p\left(z^{(i)}=j ; \phi\right)}{Q_{i}\left(z^{(i)}=j\right)} \\ &=\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{n / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \cdot \phi_{j}}{w_{j}^{(i)}} \end{aligned}$$
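   Although the derivation below maximizes this objective in closed form, it is also useful to evaluate it numerically to monitor convergence; a sketch under the same hypothetical names as the E-step code above:

```python
import numpy as np
from scipy.stats import multivariate_normal

def lower_bound(X, w, phi, mu, Sigma, eps=1e-12):
    """sum_i sum_j w[i, j] * log( N(x^{(i)}; mu_j, Sigma_j) * phi_j / w[i, j] )."""
    m, k = w.shape
    total = 0.0
    for j in range(k):
        # log of the joint p(x^{(i)}, z = j) = Gaussian density times phi_j
        log_joint = (multivariate_normal.logpdf(X, mean=mu[j], cov=Sigma[j])
                     + np.log(phi[j]))
        total += np.sum(w[:, j] * (log_joint - np.log(w[:, j] + eps)))
    return total
```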
   Taking the gradient with respect to the mean $\mu_l$:
$$\begin{aligned} &\nabla_{\mu_{l}} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{n / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \cdot \phi_{j}}{w_{j}^{(i)}} \\ &=-\nabla_{\mu_{l}} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right) \\ &=\frac{1}{2} \sum_{i=1}^{m} w_{l}^{(i)} \nabla_{\mu_{l}}\left(2 \mu_{l}^{T} \Sigma_{l}^{-1} x^{(i)}-\mu_{l}^{T} \Sigma_{l}^{-1} \mu_{l}\right) \\ &=\sum_{i=1}^{m} w_{l}^{(i)}\left(\Sigma_{l}^{-1} x^{(i)}-\Sigma_{l}^{-1} \mu_{l}\right) \end{aligned}$$
   Setting this to 0 and solving gives the mean:
$$\mu_{l}:=\frac{\sum_{i=1}^{m} w_{l}^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_{l}^{(i)}}$$
   Taking the derivative with respect to the covariance and setting it to 0 gives:
$$\Sigma_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}}{\sum_{i=1}^{m} w_{j}^{(i)}}$$
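   Both updates are responsibility-weighted averages: of the samples for $\mu_j$, and of the centered outer products for $\Sigma_j$. A numpy sketch (same hypothetical names as above):

```python
import numpy as np

def update_mu_sigma(X, w):
    """M-step updates for the means and covariances.
    X: (m, n) data matrix, w: (m, k) responsibilities."""
    Nk = w.sum(axis=0)                       # effective count per component
    mu = (w.T @ X) / Nk[:, None]             # mu_j = sum_i w_ij x_i / sum_i w_ij
    k, n = mu.shape
    Sigma = np.empty((k, n, n))
    for j in range(k):
        d = X - mu[j]                        # centered samples, shape (m, n)
        Sigma[j] = (w[:, j, None] * d).T @ d / Nk[j]
    return mu, Sigma
```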
   For the multinomial parameters $\phi$, consider the M-step objective again and drop the terms that are constant with respect to $\phi$:
$$\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{n / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \cdot \phi_{j}}{w_{j}^{(i)}}$$
   which leaves:
$$\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \phi_{j}$$
   Lagrange multipliers
   Since the multinomial probabilities must sum to 1, form the Lagrangian:
$$\mathcal{L}(\phi)=\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \phi_{j}+\beta\left(\sum_{j=1}^{k} \phi_{j}-1\right)$$
   Taking the partial derivative with respect to $\phi_j$ and setting it to 0:
$$\begin{aligned} \frac{\partial}{\partial \phi_{j}} \mathcal{L}(\phi) &=\sum_{i=1}^{m} \frac{w_{j}^{(i)}}{\phi_{j}}+\beta=0 \;\Rightarrow\; \phi_{j}=-\frac{1}{\beta} \sum_{i=1}^{m} w_{j}^{(i)} \\ -\beta &=\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)}=\sum_{i=1}^{m} 1=m \\ \phi_{j} &:=\frac{1}{m} \sum_{i=1}^{m} w_{j}^{(i)} \end{aligned}$$
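   In code this update is just the average responsibility of each component; a one-line sketch (hypothetical `w` as above):

```python
import numpy as np

def update_phi(w):
    """phi_j := (1/m) * sum_i w[i, j], the average responsibility of component j."""
    return w.mean(axis=0)        # w: (m, k) responsibilities; result sums to 1
```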
   In summary: across all data points, the set $\left\{\gamma(i, k)\, x_{i} \mid i=1,2, \cdots, N\right\}$ can be viewed as having been generated by component $k$, and component $k$ is a single Gaussian distribution. Writing $\gamma(i, k)$ for the responsibility $w_{k}^{(i)}$ and applying the conclusions above:
$$\left\{\begin{array}{l} \mu_{k}=\frac{1}{N_{k}} \sum_{i=1}^{N} \gamma(i, k)\, x_{i} \\ \Sigma_{k}=\frac{1}{N_{k}} \sum_{i=1}^{N} \gamma(i, k)\left(x_{i}-\mu_{k}\right)\left(x_{i}-\mu_{k}\right)^{T} \\ \pi_{k}=\frac{1}{N} \sum_{i=1}^{N} \gamma(i, k) \\ N_{k}=N \cdot \pi_{k} \end{array}\right.$$
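   Putting the pieces together, a minimal end-to-end sketch of EM for a GMM, initialized with random responsibilities so that the summary updates above can be applied directly on the first iteration (all names are my own, not from the text):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, k, n_iter=100, seed=0):
    """Fit a k-component GMM to X (shape (N, d)) by EM, using the updates above."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Initialize with random responsibilities gamma[i, j] (rows sum to 1).
    gamma = rng.dirichlet(np.ones(k), size=N)
    for _ in range(n_iter):
        # M-step: closed-form updates from the current responsibilities.
        Nk = gamma.sum(axis=0)                       # N_k = sum_i gamma(i, k)
        pi = Nk / N                                  # pi_k = N_k / N
        mu = (gamma.T @ X) / Nk[:, None]             # mu_k
        Sigma = np.empty((k, d, d))
        for j in range(k):
            diff = X - mu[j]
            Sigma[j] = (gamma[:, j, None] * diff).T @ diff / Nk[j]
            Sigma[j] += 1e-6 * np.eye(d)             # regularize for stability
        # E-step: recompute responsibilities from the new parameters.
        for j in range(k):
            gamma[:, j] = pi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
        gamma /= gamma.sum(axis=1, keepdims=True)
    return pi, mu, Sigma, gamma
```

   Usage would be something like `pi, mu, Sigma, gamma = fit_gmm(X, k=3)` for an `(N, d)` data array `X`; starting from responsibilities rather than parameters mirrors the summary above, with each iteration applying the closed-form updates for $\pi_k$, $\mu_k$, $\Sigma_k$ and then recomputing $\gamma(i, k)$.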