Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
Sequence to Sequence Learning with Neural Networks
The difference (from the RNN Encoder–Decoder above) is that the vector C obtained by encoding the source is used directly as the initial state of the decoder RNN, instead of being fed into the RNN cell at every decoding step. In addition, during decoding (at training time) the RNN input is the target token (teacher forcing), not the output of the previous time step.
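A minimal tf.keras sketch of this setup, under assumed choices not taken from the papers (LSTM cells, the illustrative sizes vocab_size, emb_dim, hidden_dim): the encoder's final state initializes the decoder, and the decoder is fed the shifted target sequence rather than its own previous outputs.

```python
import tensorflow as tf

# Hypothetical sizes, for illustration only.
vocab_size, emb_dim, hidden_dim = 8000, 128, 256

# Encoder: only its final hidden/cell state (the "thought vector" C) is kept.
enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(vocab_size, emb_dim)(enc_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(hidden_dim, return_state=True)(enc_emb)

# Decoder: C initializes the RNN state and is NOT re-fed at every step.
# The decoder input is the (shifted) target sequence -- teacher forcing --
# not the prediction from the previous time step.
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(vocab_size, emb_dim)(dec_inputs)
dec_out, _, _ = tf.keras.layers.LSTM(
    hidden_dim, return_sequences=True, return_state=True
)(dec_emb, initial_state=[state_h, state_c])
logits = tf.keras.layers.Dense(vocab_size)(dec_out)

model = tf.keras.Model([enc_inputs, dec_inputs], logits)
```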
Neural machine translation by jointly learning to align and translate
Proposes additive attention (attention variants differ in how the score is computed: dot product vs. addition; here the score is additive), and the encoder is a bidirectional RNN.
First author Dzmitry Bahdanau; this is integrated into TensorFlow, with the interface tf.contrib.seq2seq.BahdanauAttention.
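A small NumPy sketch of the additive score e_j = v^T tanh(W s + U h_j); the parameter names W, U, v and the toy dimensions here are illustrative, not tied to any particular implementation.

```python
import numpy as np

def additive_attention(query, keys, W, U, v):
    """Bahdanau-style additive score: e_j = v^T tanh(W s + U h_j).

    query: decoder state s, shape (d_dec,)
    keys:  encoder states h_1..h_T, shape (T, d_enc)
    W, U, v: learned parameters, shapes (d_att, d_dec), (d_att, d_enc), (d_att,)
    """
    scores = np.tanh(query @ W.T + keys @ U.T) @ v   # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over source positions
    context = weights @ keys                         # weighted sum of encoder states
    return context, weights

# Toy usage with random parameters (illustration only).
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 8, 8, 16
h = rng.normal(size=(T, d_enc))
s = rng.normal(size=(d_dec,))
W = rng.normal(size=(d_att, d_dec))
U = rng.normal(size=(d_att, d_enc))
v = rng.normal(size=(d_att,))
ctx, a = additive_attention(s, h, W, U, v)
```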
On using very large target vocabulary for neural machine translation
Introduces multiplicative attention (same framework, but the score is computed with multiplication rather than a single-hidden-layer network); this variant comes from the Luong et al. paper below and is also integrated into TensorFlow, with the interface tf.contrib.seq2seq.LuongAttention.
The context vector is still a weighted sum of the encoder states; the difference lies in how the score, i.e. the weight a, is computed: Bahdanau's additive attention implements it with a single-hidden-layer feedforward network, whereas the multiplicative forms use a dot product or a bilinear ("general") product.
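For comparison, a NumPy sketch of the multiplicative "dot" and "general" scores; the helper name and toy sizes are made up for illustration.

```python
import numpy as np

def luong_scores(query, keys, W=None):
    """Multiplicative scores:
    dot:     score_j = h_t . h_j
    general: score_j = h_t^T W h_j   (W is a learned matrix)
    """
    if W is None:
        return keys @ query        # "dot" form, needs matching dimensions
    return keys @ (W @ query)      # "general" (bilinear) form

# Toy usage (illustration only).
rng = np.random.default_rng(1)
T, d = 5, 8
h_enc = rng.normal(size=(T, d))            # encoder states
h_dec = rng.normal(size=(d,))              # current decoder state
scores_dot = luong_scores(h_dec, h_enc)
scores_general = luong_scores(h_dec, h_enc, rng.normal(size=(d, d)))
```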
Effective Approaches to Attention-based Neural Machine Translation
Distinguishes global attention from local attention.
Attention can also be divided into soft attention and hard attention.
Soft attention assigns a proper probability distribution over the source positions, while hard attention uses hard 0/1 alignment probabilities. Local attention is a hybrid of the two: it first predicts an aligned source position, then applies a soft-attention-style distribution within a window of D positions on either side of that position (see the sketch below).
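A rough NumPy sketch of local attention along these lines, assuming a dot-product score inside the window; the predicted position p_t = S * sigmoid(v_p^T tanh(W_p h_t)) and the Gaussian with standard deviation D/2 follow the local-p formulation, but all parameter values here are toy.

```python
import numpy as np

def local_attention(query, keys, v_p, W_p, D=2):
    """Local attention sketch: predict an aligned position p_t, then apply
    soft attention inside the window [p_t - D, p_t + D], down-weighting
    positions far from p_t with a Gaussian of std D/2."""
    S = keys.shape[0]                                        # source length
    # Predicted aligned position in [0, S].
    p_t = S / (1.0 + np.exp(-(v_p @ np.tanh(W_p @ query))))

    lo = max(0, int(round(p_t)) - D)
    hi = min(S, int(round(p_t)) + D + 1)
    window = keys[lo:hi]

    scores = window @ query                                  # dot score inside window
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    positions = np.arange(lo, hi)
    weights = weights * np.exp(-((positions - p_t) ** 2) / (2 * (D / 2.0) ** 2))
    weights /= weights.sum()

    context = weights @ window
    return context, weights, p_t

# Toy usage (illustration only).
rng = np.random.default_rng(2)
S, d, k = 10, 8, 16
h_enc = rng.normal(size=(S, d))
h_dec = rng.normal(size=(d,))
ctx, a, p = local_attention(h_dec, h_enc, rng.normal(size=(k,)), rng.normal(size=(k, d)), D=2)
```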