links

NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE arxiv.org

Attention mechanism for encoder-decoder neural machine translation that dynamically computes soft alignments over all source tokens when generating each target token, replacing the fixed-length context vector bottleneck of prior RNN-based architectures. The model's alignment scores - derived from a learned compatibility function between decoder hidden states and encoder annotations - enable variable-length source representations, yielding state-of-the-art BLEU scores on English-French translation and demonstrating that performance no longer degrades on long sentences as sequence length increases. This attention paradigm directly underpins transformer-based language models (BERT, T5) used in semantic indexing and query-document relevance ranking, as the learned token-to-token alignment weights provide interpretable, context-sensitive representations that capture long-range lexical dependencies critical for passage retrieval and cross-lingual search quality.