Recurrent Attention for Neural Machine Translation

Since its introduction, the Transformer [1] has dominated sequence-to-sequence tasks, even outperforming the Google Neural Machine Translation model on certain benchmarks. In particular, the multi-head attention mechanism, which is built on dot-product attention, is regarded as one of the critical building blocks that makes the architecture work. But is it really that important?
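For reference, the dot-product attention in [1] computes softmax(QK^T / sqrt(d_k))V over query, key, and value matrices. The sketch below is a minimal, illustrative NumPy version (single head, no masking; the function and variable names are my own, not from the post or [1]):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as defined in [1]."""
    d_k = Q.shape[-1]
    # Pairwise dot products between queries and keys, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns the scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ V

# Toy example: 3 query positions, 4 key/value positions, dimension 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```

In the full multi-head mechanism, this operation is applied in parallel over several learned projections of Q, K, and V, and the results are concatenated.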


Jiachen Li · About 5 min · Tags: MT, DL4MT, Transformer, Recurrent Attention