QANet Notes

Notation: the question is $Q = \{q_1, q_2, \dots, q_m\}$, the context is $C = \{c_1, c_2, \dots, c_n\}$, and the answer span is $S = \{c_i, c_{i+1}, \dots, c_{i+j}\}$, a contiguous segment of the context.
$x$: represents both an original word and its embedding, for any $x \in C \cup Q$.

Like most other reading comprehension models, QANet consists of five modules: an Embedding layer, an Embedding encoder layer, a Context-query attention layer, a Model encoder layer, and an Output layer.

(Figure: overall QANet architecture)

1. Embedding Layer

Word:

  • 300-dim GloVe pre-trained word vectors
  • fixed during training
  • OOV words are mapped to <UNK>; the vector of <UNK> is randomly initialized and trained

Char:

  • 200-dim, max word length is 16
  • concatenate all character vectors of a word to form a matrix, then take the maximum value of each row (max over characters) to obtain the final 200-dim vector
  • trained

The final representation of a word is the concatenation $[x_w; x_c] \in \mathbb{R}^{p_1 + p_2}$ of its word and character embeddings, which is then passed through a two-layer highway network.
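
A minimal PyTorch sketch of this layer, assuming batch-first tensor shapes and a ReLU inside the highway transform (the paper does not pin these down; class and argument names here are my own):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Highway(nn.Module):
    """Two-layer highway network: y = g * H(x) + (1 - g) * x at each layer."""
    def __init__(self, dim, num_layers=2):
        super().__init__()
        self.transforms = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))
        self.gates = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, x):
        for transform, gate in zip(self.transforms, self.gates):
            g = torch.sigmoid(gate(x))
            x = g * F.relu(transform(x)) + (1.0 - g) * x
        return x

class EmbeddingLayer(nn.Module):
    def __init__(self, word_vectors, num_chars, char_dim=200):
        super().__init__()
        # Pre-trained 300-dim GloVe vectors, frozen during training
        # (the separately trainable <UNK> row is omitted for brevity).
        self.word_emb = nn.Embedding.from_pretrained(word_vectors, freeze=True)
        # Character embeddings are trained from scratch.
        self.char_emb = nn.Embedding(num_chars, char_dim)
        self.highway = Highway(word_vectors.size(1) + char_dim)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, 16)
        xw = self.word_emb(word_ids)          # (batch, seq_len, 300)
        xc = self.char_emb(char_ids)          # (batch, seq_len, 16, 200)
        xc, _ = xc.max(dim=2)                 # max over characters -> (batch, seq_len, 200)
        x = torch.cat([xw, xc], dim=-1)       # [x_w; x_c], 500-dim per word
        return self.highway(x)
```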

2. Embedding Encoder Layer

A stack of building blocks: [conv-layer x # + self-attention-layer + feed-forward-layer]

  • depthwise separable convolutions, memory efficient and has better generalization
  • kernel size is 7, number of filters is d = 128, number of conv layers within a block is 4
  • self-attention uses multi-head attention with head number 8
  • each of these basic operations (conv/self-attention/ffn) is placed inside a residual block (see the sketch after this list)
  • for an input $x$ and a given operation $f$, the output is $f(\mathrm{layernorm}(x)) + x$
  • total number of encoder blocks is 1
  • input dim is $p_1 + p_2 = 500$, immediately mapped to the output dim $d = 128$ by a 1-D convolution
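
A minimal PyTorch sketch of one encoder block under these settings. The positional encoding added at the start of each block and the paper's regularization (dropout, layer dropout) are omitted, and the feed-forward width and ReLU are my assumptions:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """1-D depthwise separable conv: per-channel conv followed by a pointwise 1x1 conv."""
    def __init__(self, dim=128, kernel_size=7):
        super().__init__()
        self.depthwise = nn.Conv1d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=dim)
        self.pointwise = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x):                      # x: (batch, dim, seq_len)
        return self.pointwise(self.depthwise(x))

class EncoderBlock(nn.Module):
    """[conv x num_convs + self-attention + feed-forward],
    each operation wrapped as f(layernorm(x)) + x."""
    def __init__(self, dim=128, num_convs=4, num_heads=8, kernel_size=7):
        super().__init__()
        self.conv_norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_convs))
        self.convs = nn.ModuleList(DepthwiseSeparableConv(dim, kernel_size)
                                   for _ in range(num_convs))
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn_norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):                      # x: (batch, seq_len, dim)
        for norm, conv in zip(self.conv_norms, self.convs):
            y = conv(norm(x).transpose(1, 2)).transpose(1, 2)
            x = x + y                          # residual: f(layernorm(x)) + x
        h = self.attn_norm(x)
        y, _ = self.attn(h, h, h)
        x = x + y
        x = x + self.ffn(self.ffn_norm(x))
        return x
```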

3. Context-Query Attention Layer

  • the similarity matrix is $S \in \mathbb{R}^{n \times m}$, computed between the encoded context $C \in \mathbb{R}^{d \times n}$ and query $Q \in \mathbb{R}^{d \times m}$ with the trilinear similarity function $f(q, c) = W_0 [q; c; q \odot c]$
  • apply softmax to each row of $S$ to get $\bar{S}$; the context-to-query attention is $A = \bar{S} \cdot Q^T \in \mathbb{R}^{n \times d}$
  • following DCN, query-to-context attention gives a small extra benefit: apply softmax to each column of $S$ to get $\bar{\bar{S}}$, then the query-to-context attention is $B = \bar{S} \cdot \bar{\bar{S}}^T \cdot C^T$ (see the sketch after this list)
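
A sketch of this attention, with the trilinear weight $W_0$ split into three factors so the full concatenation $[q; c; q \odot c]$ is never materialized (a common implementation trick, not spelled out in the notes). The code uses batch-first (length, dim) matrices, so the transposes differ slightly from the formulas above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextQueryAttention(nn.Module):
    """Trilinear context-query attention producing A (context-to-query)
    and B (query-to-context)."""
    def __init__(self, dim=128):
        super().__init__()
        self.w_c = nn.Parameter(torch.empty(dim, 1))
        self.w_q = nn.Parameter(torch.empty(dim, 1))
        self.w_cq = nn.Parameter(torch.empty(1, 1, dim))
        for w in (self.w_c, self.w_q, self.w_cq):
            nn.init.xavier_uniform_(w)

    def forward(self, C, Q):
        # C: (batch, n, dim) encoded context; Q: (batch, m, dim) encoded query.
        # S[b, i, j] = w_c . c_i + w_q . q_j + w_cq . (c_i * q_j)
        S = (C @ self.w_c) \
            + (Q @ self.w_q).transpose(1, 2) \
            + (C * self.w_cq) @ Q.transpose(1, 2)        # (batch, n, m)
        S_row = F.softmax(S, dim=2)                      # row softmax -> S_bar
        S_col = F.softmax(S, dim=1)                      # column softmax -> S_bar_bar
        A = S_row @ Q                                    # (batch, n, dim)
        B = S_row @ S_col.transpose(1, 2) @ C            # (batch, n, dim)
        return A, B
```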

4. Model Encoder Layer

  • Input to this layer at each position is $[c; a; c \odot a; c \odot b]$, where $a$ and $b$ are respectively a row of the attention matrices $A$ and $B$
  • parameters are the same as embedding encoder layer
  • number of blocks is 7
  • number of conv layers within a block is 2
  • weights are shared between the three repetitions of the model encoder (see the sketch after this list)
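
A sketch of the three weight-shared passes that produce the outputs $M_0, M_1, M_2$ consumed by the output layer. It reuses the hypothetical EncoderBlock from section 2, and the linear map from the $4d$-dim input down to $d$ is my assumption:

```python
import torch
import torch.nn as nn

class ModelEncoder(nn.Module):
    """Three weight-shared passes over a stack of 7 encoder blocks -> M0, M1, M2."""
    def __init__(self, dim=128, num_blocks=7):
        super().__init__()
        # Map the 4*dim attention output [c; a; c*a; c*b] down to dim.
        self.resize = nn.Linear(4 * dim, dim)
        # EncoderBlock as sketched in section 2, with 2 conv layers per block.
        self.blocks = nn.ModuleList(
            EncoderBlock(dim, num_convs=2) for _ in range(num_blocks))

    def encode(self, x):
        for block in self.blocks:
            x = block(x)
        return x

    def forward(self, C, A, B):
        # C: encoded context (batch, n, dim); A, B: attention outputs (batch, n, dim).
        x = self.resize(torch.cat([C, A, C * A, C * B], dim=-1))
        m0 = self.encode(x)       # the same blocks (shared weights) run three times
        m1 = self.encode(m0)
        m2 = self.encode(m1)
        return m0, m1, m2
```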

5. Output Layer

Predict the probability of each context position being the start or the end of the answer span:

$p^1 = \mathrm{softmax}(W_1 [M_0; M_1])$, $p^2 = \mathrm{softmax}(W_2 [M_0; M_2])$

where $W_1$ and $W_2$ are trainable matrices and $M_0, M_1, M_2$ are, from bottom to top, the outputs of the three model encoder passes. Training minimizes the negative log likelihood of the true start and end positions; at inference, the span $(s, e)$ with $s \le e$ maximizing $p^1_s p^2_e$ is selected.
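
A minimal sketch of the output layer under these formulas; returning logits and applying softmax / cross-entropy outside the module is my implementation choice, not from the paper:

```python
import torch
import torch.nn as nn

class OutputLayer(nn.Module):
    """Span pointers: p1 = softmax(W1 [M0; M1]), p2 = softmax(W2 [M0; M2])."""
    def __init__(self, dim=128):
        super().__init__()
        self.w1 = nn.Linear(2 * dim, 1, bias=False)
        self.w2 = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, m0, m1, m2):
        # m0, m1, m2: (batch, n, dim) from the three model encoder passes.
        start_logits = self.w1(torch.cat([m0, m1], dim=-1)).squeeze(-1)  # (batch, n)
        end_logits = self.w2(torch.cat([m0, m2], dim=-1)).squeeze(-1)    # (batch, n)
        return start_logits, end_logits

# Training: negative log likelihood of the true start/end positions, e.g.
#   loss = F.cross_entropy(start_logits, y_start) + F.cross_entropy(end_logits, y_end)
```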