Hidden Markov Models: Estimating the Probability of an Observation Sequence (the Forward and Backward Algorithms)

The evaluation (estimation) problem: given an observation sequence $O = O_{1} O_{2} \cdots O_{T}$ and a model $\mu = (\mathbf{A}, \mathbf{B}, \mathbf{\pi})$, compute the probability of the observation sequence $O$, i.e., $P(O | \mu)$.

Brute-force enumeration: for any state sequence $Q = q_{1} q_{2} \cdots q_{T}$, we have

$$\begin{aligned} P(O| Q ; \mu) & = \prod_{t = 1}^{T} P(O_{t} | q_{t}; \mu) \\ & = b_{q_{1}}(O_{1}) b_{q_{2}}(O_{2}) \cdots b_{q_{T}}(O_{T}) \end{aligned} \tag {6-8}$$

$$P(Q ; \mu) = \pi_{q_{1}} a_{q_{1} q_{2}} a_{q_{2} q_{3}} \cdots a_{q_{T - 1} q_{T}} \tag {6-9}$$

$$\begin{aligned} P(O ; \mu) & = \sum_{Q} P(O, Q; \mu) = \sum_{Q} P(O | Q; \mu) P(Q ; \mu) \\ & = \sum_{Q} \left( \pi_{q_{1}} b_{q_{1}}(O_{1}) \prod_{t = 1}^{T - 1} a_{q_{t} q_{t + 1}} b_{q_{t + 1}}(O_{t + 1}) \right) \end{aligned} \tag {6-11}$$

The problem with this approach is that its time complexity grows exponentially: if the model $\mu = (\mathbf{A}, \mathbf{B}, \mathbf{\pi})$ has $N$ distinct states and the sequence length is $T$, there are $N^{T}$ possible state sequences. The forward algorithm (forward procedure) uses dynamic programming to avoid this "exponential explosion" and solves the problem in $\mathcal{O}(N^{2} T)$ time. Dynamic programming for HMMs is usually described in terms of a trellis (lattice) [Manning and Schütze, 1999]. For an HMM ending in a particular state at a particular time, each node of the trellis records the probability of all the symbols output so far, so the probability of a longer subpath can be computed from the probabilities of shorter subpaths.
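To make the exponential cost concrete, the enumeration in equation (6-11) can be sketched directly. The two-state model below (transition matrix `A`, emission matrix `B`, initial distribution `pi`, observation sequence `O`) uses made-up toy values, not parameters from the text:

```python
from itertools import product

# Toy two-state model (illustrative values, not from the text).
A = [[0.7, 0.3], [0.4, 0.6]]   # a_ij: transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]   # b_j(k): emission probabilities
pi = [0.6, 0.4]                # initial state distribution
O = [0, 1, 0]                  # observation sequence as symbol indices

def brute_force_prob(O, A, B, pi):
    """Sum pi_{q1} b_{q1}(O_1) * prod_t a_{qt,q_{t+1}} b_{q_{t+1}}(O_{t+1})
    over all N**T state sequences Q, as in equation (6-11)."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in product(range(N), repeat=T):   # N**T paths: exponential in T
        p = pi[Q[0]] * B[Q[0]][O[0]]
        for t in range(T - 1):
            p *= A[Q[t]][Q[t + 1]] * B[Q[t + 1]][O[t + 1]]
        total += p
    return total
```

Even for this toy model, doubling $T$ squares the number of paths, which is exactly what the forward algorithm below avoids.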

Definition 6-1 (forward variable $\alpha_{t}(i)$): the probability that, at time $t$, the HMM has output the sequence $O_{1} O_{2} \cdots O_{t}$ and is in state $s_{i}$:

$$\alpha_{t}(i) = P(O_{1} O_{2} \cdots O_{t}, q_{t} = s_{i}; \mu) \tag {6-12}$$

Since $P(O ; \mu)$ is the probability of observing the sequence $O = O_{1} O_{2} \cdots O_{T}$ summed over all possible final states $q_{T}$:

$$P(O ; \mu) = \sum_{s_{i}} P(O_{1} O_{2} \cdots O_{T}, q_{T} = s_{i}; \mu) = \sum_{i = 1}^{N} \alpha_{T}(i) \tag {6-13}$$

Therefore, computing $P(O ; \mu)$ reduces to computing the forward variables $\alpha_{t}(i)$ efficiently.

In the trellis, $\alpha_{t + 1}(j)$ is stored at node $(s_{j}, t + 1)$ and denotes the probability that the HMM has output the observation sequence $O = O_{1} O_{2} \cdots O_{t + 1}$ and is in state $s_{j}$ at time $t + 1$. The process by which the HMM reaches state $s_{j}$ at time $t + 1$ while outputting $O = O_{1} O_{2} \cdots O_{t + 1}$ can be decomposed into two steps:

  1. From the initial time to time $t$, the HMM reaches state $s_{i}$ and outputs the observation sequence $O = O_{1} O_{2} \cdots O_{t}$, with probability $\alpha_{t}(i)$;

  2. The HMM moves from state $s_{i}$ to state $s_{j}$ and outputs $O_{t + 1}$ in state $s_{j}$, with probability $a_{ij} b_{j}(O_{t + 1})$.

Since the HMM can reach $s_{j}$ from any of the $N$ states $s_{i}$, the forward variables at time $t + 1$ can be computed inductively from the forward variables $\alpha_{t}(1), \alpha_{t}(2), \cdots, \alpha_{t}(N)$ at time $t$:

$$\alpha_{t + 1}(j) = \left( \sum_{i = 1}^{N} \alpha_{t}(i) a_{ij} \right) b_{j}(O_{t + 1}) \tag {6-14}$$

The forward variables $\alpha_{t}(i)$ can thus be computed by dynamic programming.

Algorithm 6-1 (forward procedure)

  1. Initialization:

$$\alpha_{1}(i) = \pi_{i} b_{i}(O_{1}), \ 1 \leq i \leq N$$

  2. Induction:

$$\alpha_{t + 1}(j) = \left( \sum_{i = 1}^{N} \alpha_{t}(i) a_{ij} \right) b_{j}(O_{t + 1}), \ 1 \leq t \leq T - 1$$

  3. Summation:

$$P(O | \mu) = \sum_{i = 1}^{N} \alpha_{T}(i)$$

The overall time complexity of the forward algorithm is $\mathcal{O}(N^{2}T)$.
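The three steps of Algorithm 6-1 can be sketched as a short function. The model parameters below (`A`, `B`, `pi`, `O`) are toy values invented for illustration, not taken from the text:

```python
# Toy two-state model (illustrative values, not from the text).
A = [[0.7, 0.3], [0.4, 0.6]]   # a_ij: transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]   # b_j(k): emission probabilities
pi = [0.6, 0.4]                # initial state distribution
O = [0, 1, 0]                  # observation sequence as symbol indices

def forward(O, A, B, pi):
    """Forward procedure: return the trellis of forward variables alpha
    and P(O | mu), in O(N^2 T) time."""
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):                       # 1. initialization: alpha_1(i)
        alpha[0][i] = pi[i] * B[i][O[0]]
    for t in range(T - 1):                   # 2. induction, equation (6-14)
        for j in range(N):
            alpha[t + 1][j] = sum(alpha[t][i] * A[i][j]
                                  for i in range(N)) * B[j][O[t + 1]]
    return alpha, sum(alpha[T - 1])          # 3. summation, equation (6-13)
```

Each trellis column costs $N \times N$ multiply-adds and there are $T$ columns, which is where the $\mathcal{O}(N^{2}T)$ bound comes from.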

$P(O ; \mu)$ can also be computed efficiently, again by dynamic programming, via the backward variables $\beta_{t}(i)$.

Definition 6-2 (backward variable $\beta_{t}(i)$): given the model $\mu = (\mathbf{A}, \mathbf{B}, \mathbf{\pi})$ and the condition that the state at time $t$ is $s_{i}$, the probability that the HMM outputs the observation sequence $O = O_{t + 1} O_{t + 2} \cdots O_{T}$:

$$\beta_{t}(i) = P(O_{t + 1} O_{t + 2} \cdots O_{T} | q_{t} = s_{i}; \mu) \tag {6-15}$$

tt时刻状态为sis_{i}的条件下,HMM输出观察序列为O=Ot+1Ot+2OTO = O_{t + 1} O_{t + 2} \cdots O_{T}的过程可以分解为两个步骤:

  1. From time $t$ to time $t + 1$, the HMM moves from state $s_{i}$ to state $s_{j}$ and outputs $O_{t + 1}$ in state $s_{j}$, with probability $a_{ij} b_{j}(O_{t + 1})$;

  2. t+1t + 1时刻、状态sjs_{j}的条件下,HMM输出观察序列O=Ot+2OTO = O_{t + 2} \cdots O_{T},概率为βt+1(j)\beta_{t + 1}(j)

Therefore:

$$\beta_{t}(i) = \sum_{j = 1}^{N} a_{ij} b_{j}(O_{t + 1}) \beta_{t + 1}(j) \tag {6-16}$$

Algorithm 6-2 (backward procedure)

  1. Initialization:

$$\beta_{T}(i) = 1, \ 1 \leq i \leq N$$

  2. Induction:

$$\beta_{t}(i) = \sum_{j = 1}^{N} a_{ij} b_{j}(O_{t + 1}) \beta_{t + 1}(j), \ T - 1 \geq t \geq 1$$

  3. Summation:

$$P(O | \mu) = \sum_{i = 1}^{N} \pi_{i} b_{i}(O_{1}) \beta_{1}(i)$$

The time complexity of the backward algorithm is likewise $\mathcal{O}(N^{2}T)$.

The forward and backward algorithms are usually combined to compute the probability of an observation sequence:

$$\begin{aligned} P(O, q_{t} = s_{i}; \mu) & = P(O_{1} O_{2} \cdots O_{T}, q_{t} = s_{i}; \mu) \\ & = P(O_{1} O_{2} \cdots O_{t}, q_{t} = s_{i}, O_{t + 1} O_{t + 2} \cdots O_{T}; \mu) \\ & = P(O_{1} O_{2} \cdots O_{t}, q_{t} = s_{i}; \mu) P(O_{t + 1} O_{t + 2} \cdots O_{T} | O_{1} O_{2} \cdots O_{t}, q_{t} = s_{i}; \mu) \\ & = P(O_{1} O_{2} \cdots O_{t}, q_{t} = s_{i}; \mu) P(O_{t + 1} O_{t + 2} \cdots O_{T} | q_{t} = s_{i}; \mu) \\ & = \alpha_{t}(i) \beta_{t}(i) \end{aligned} \tag {6-17}$$

Therefore,

$$P(O; \mu) = \sum_{i = 1}^{N} \alpha_{t}(i) \beta_{t}(i), \ 1 \leq t \leq T \tag {6-18}$$
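Equation (6-18) says that summing $\alpha_{t}(i)\beta_{t}(i)$ over states yields the same value $P(O; \mu)$ at *every* time slice $t$, which is a useful sanity check for an implementation. A self-contained numerical check, using a toy two-state model with made-up parameters (not from the text):

```python
# Numerical check of equation (6-18): sum_i alpha_t(i) * beta_t(i) equals
# P(O; mu) for every t. Model parameters are illustrative toy values.
A = [[0.7, 0.3], [0.4, 0.6]]   # a_ij
B = [[0.9, 0.1], [0.2, 0.8]]   # b_j(k)
pi = [0.6, 0.4]
O = [0, 1, 0]
N, T = len(pi), len(O)

# Forward variables (equations 6-12 and 6-14).
alpha = [[0.0] * N for _ in range(T)]
alpha[0] = [pi[i] * B[i][O[0]] for i in range(N)]
for t in range(T - 1):
    for j in range(N):
        alpha[t + 1][j] = sum(alpha[t][i] * A[i][j] for i in range(N)) * B[j][O[t + 1]]

# Backward variables (equations 6-15 and 6-16).
beta = [[0.0] * N for _ in range(T)]
beta[T - 1] = [1.0] * N
for t in range(T - 2, -1, -1):
    for i in range(N):
        beta[t][i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))

# Every time slice gives the same P(O; mu).
probs = [sum(alpha[t][i] * beta[t][i] for i in range(N)) for t in range(T)]
```

At $t = T$ this reduces to equation (6-13), since $\beta_{T}(i) = 1$; at $t = 1$ it reduces to the summation step of the backward algorithm.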