QANet Study Notes
Notation: Question Q = {q1, q2, ..., qm}, Context C = {c1, c2, ..., cn}, answer span S.
A symbol such as x is used to represent both the original word and its embedding.
Like most other Reading Comprehension models, QANet consists of five modules: Embedding layer, Embedding encoder layer, Context-query attention layer, Model encoder layer, and Output layer.
1. Embedding Layer
Word:
- 300-dim GloVe pre-trained word vectors
- fixed during training
- OOV words are mapped to <UNK>, whose vector is randomly initialized and trained
Char:
- each character embedding is 200-dim; each word is truncated or padded to a max length of 16 characters
- stack the char vectors of a word into a matrix and take the maximum of each row (max-pooling over the characters) to obtain a final 200-dim vector
- trained
The final representation of a word is the concatenation [x_w; x_c] ∈ R^{300+200} = R^{500}, which is then passed through a two-layer highway network.
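A minimal NumPy sketch of the embedding layer described above: char max-pooling, concatenation with the word vector, then two highway layers. Dimensions follow the notes (300 + 200 = 500); the weights are random placeholders and the helper names (`char_feature`, `highway_layer`) are mine, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

WORD_DIM, CHAR_DIM, MAX_WORD_LEN = 300, 200, 16  # dims from the notes

def char_feature(char_vecs):
    """char_vecs: (MAX_WORD_LEN, CHAR_DIM) matrix of a word's char embeddings.
    Max over the character axis yields one CHAR_DIM vector per word."""
    return char_vecs.max(axis=0)

def highway_layer(x, W_t, b_t, W_h, b_h):
    """One highway layer: gate * transform(x) + (1 - gate) * x."""
    t = 1.0 / (1.0 + np.exp(-(W_t @ x + b_t)))   # transform gate (sigmoid)
    h = np.maximum(0.0, W_h @ x + b_h)           # candidate (ReLU)
    return t * h + (1.0 - t) * x

# One word: its GloVe vector and its per-character embeddings (random stand-ins).
x_w = rng.standard_normal(WORD_DIM)
chars = rng.standard_normal((MAX_WORD_LEN, CHAR_DIM))
x = np.concatenate([x_w, char_feature(chars)])   # 500-dim word representation

# Two highway layers over the 500-dim vector (hypothetical random weights).
d = x.size
for _ in range(2):
    x = highway_layer(x,
                      rng.standard_normal((d, d)) * 0.01, np.zeros(d),
                      rng.standard_normal((d, d)) * 0.01, np.zeros(d))

print(x.shape)  # (500,)
```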
2. Embedding Encoder Layer
A stack of building blocks: [conv-layer x # + self-attention-layer + feed-forward-layer]
- depthwise separable convolutions, which are memory-efficient and generalize better
- kernel size is 7, number of filters is d = 128, number of conv layers within a block is 4
- self-attention uses multi-head attention, head number is 8
- Each of these basic operations (conv/self-attention/ffn) is placed inside a residual block
- for an input x and a given operation f, the output is f(layernorm(x)) + x
- total number of encoder blocks is 1
- input dim is 300 + 200 = 500, output dim is d = 128 (the input is mapped to d by a one-dimensional convolution)
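The two building blocks above can be sketched in NumPy: a depthwise separable convolution (one length-k filter per channel, then a 1x1 pointwise mix) wrapped in the f(layernorm(x)) + x residual form. Shapes and kernel size match the notes (k = 7, d = 128); weights are random placeholders and the function names are mine.

```python
import numpy as np

def layernorm(x, eps=1e-5):
    """Normalize each position's feature vector; x: (seq_len, d)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def depthwise_separable_conv(x, depth_w, point_w):
    """x: (seq_len, d). depth_w: (k, d), one filter per channel;
    point_w: (d, d_out), the 1x1 pointwise conv mixing channels."""
    k, d = depth_w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # 'same' padding in time
    # depthwise: each channel convolved with its own length-k filter
    depth_out = np.stack([
        (xp[i:i + k] * depth_w).sum(axis=0) for i in range(len(x))
    ])
    return depth_out @ point_w

def residual_block(x, f):
    """QANet residual wrapper: f(layernorm(x)) + x."""
    return f(layernorm(x)) + x

rng = np.random.default_rng(0)
seq_len, d, k = 10, 128, 7
x = rng.standard_normal((seq_len, d))
depth_w = rng.standard_normal((k, d)) * 0.01
point_w = rng.standard_normal((d, d)) * 0.01
y = residual_block(x, lambda h: depthwise_separable_conv(h, depth_w, point_w))
print(y.shape)  # (10, 128)
```

The depthwise step uses k * d parameters and the pointwise step d * d, versus k * d * d for a full convolution, which is where the memory saving comes from.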
3. Context-Query Attention Layer
- similarity matrix S ∈ R^{n×m} with S_{ij} = f(q_j, c_i); the trilinear similarity function is f(q, c) = W0 [q; c; q ⊙ c]
- apply softmax over each row of S to get S̄; the context-to-query attention is A = S̄ · Q^T ∈ R^{n×d}
- query-to-context attention (as in DCN) brings a small additional benefit: apply softmax over each column of S to get S̿, then the query-to-context attention is B = S̄ · S̿^T · C^T ∈ R^{n×d}
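The attention computation above, sketched in NumPy with toy dimensions. The trilinear score is expanded so the full [q; c; q ⊙ c] vectors never need to be materialized; `context_query_attention` is my name for the helper, and the weights are random placeholders. Here Q and C are stored row-wise, so the matrix products differ by a transpose from the notes' notation.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def context_query_attention(C, Q, w):
    """C: (n, d) context rows, Q: (m, d) query rows, w: (3d,) trilinear weights.
    S[i, j] = f(q_j, c_i) = w . [q_j; c_i; q_j * c_i]."""
    n, d = C.shape
    w_q, w_c, w_qc = w[:d], w[d:2 * d], w[2 * d:]
    # expand the dot product with [q; c; q*c] into three cheap terms
    S = (Q @ w_q)[None, :] + (C @ w_c)[:, None] + (C * w_qc) @ Q.T
    S_row = softmax(S, axis=1)      # softmax over the query axis -> S_bar
    S_col = softmax(S, axis=0)      # softmax over the context axis
    A = S_row @ Q                   # context-to-query attention, (n, d)
    B = S_row @ S_col.T @ C         # query-to-context attention (DCN), (n, d)
    return A, B

rng = np.random.default_rng(0)
n, m, d = 6, 4, 8
C, Q = rng.standard_normal((n, d)), rng.standard_normal((m, d))
A, B = context_query_attention(C, Q, rng.standard_normal(3 * d))
print(A.shape, B.shape)  # (6, 8) (6, 8)
```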
4. Model Encoder Layer
- the input to this layer is [c; a; c ⊙ a; c ⊙ b], where a and b are the rows of the attention matrices A and B respectively
- parameters are the same as embedding encoder layer
- number of blocks is 7
- number of conv layers within a block is 2
- the three repetitions of the stacked model encoder share weights
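Assembling the model encoder input [c; a; c ⊙ a; c ⊙ b] described above is a row-wise concatenation, sketched here with random stand-ins for the encoded context and the two attention matrices (d = 128 as in the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 128                          # toy context length, channel dim
C = rng.standard_normal((n, d))        # encoded context (one row per position)
A = rng.standard_normal((n, d))        # context-to-query attention rows
B = rng.standard_normal((n, d))        # query-to-context attention rows

# Row i is [c; a; c*a; c*b] for the i-th context position.
model_input = np.concatenate([C, A, C * A, C * B], axis=1)
print(model_input.shape)  # (6, 512)
```

So the model encoder sees 4d = 512 input channels per position before its first convolution maps back down to d.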
5. Output Layer
The probability of each position being the start or end of the answer span is predicted from the outputs M0, M1, M2 of the three stacked model encoders: p_start = softmax(W1 [M0; M1]), p_end = softmax(W2 [M0; M2]).
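A NumPy sketch of this output layer, assuming M0, M1, M2 each have one row per context position; W1 and W2 are random placeholder weight vectors and `span_probs` is my name for the helper:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def span_probs(M0, M1, M2, W1, W2):
    """M0, M1, M2: (n, d) outputs of the three stacked model encoders.
    p_start = softmax(W1 [M0; M1]), p_end = softmax(W2 [M0; M2])."""
    p_start = softmax(np.concatenate([M0, M1], axis=1) @ W1)
    p_end = softmax(np.concatenate([M0, M2], axis=1) @ W2)
    return p_start, p_end

rng = np.random.default_rng(0)
n, d = 6, 128
M0, M1, M2 = (rng.standard_normal((n, d)) for _ in range(3))
W1, W2 = rng.standard_normal(2 * d), rng.standard_normal(2 * d)
p_start, p_end = span_probs(M0, M1, M2, W1, W2)
print(p_start.shape)  # (6,)
```

At inference the span (i, j) with i <= j maximizing p_start[i] * p_end[j] is selected.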