BERT (Bidirectional Encoder Representations from Transformers) pre-trains a deep Transformer encoder on unlabeled text with two objectives, masked language modeling (MLM) and next sentence prediction (NSP), so that every layer conditions on left and right context simultaneously rather than on the unidirectional or shallowly bidirectional context of predecessor models. Fine-tuned BERT set state-of-the-art results on eleven NLP benchmarks, including a 7.7-point absolute improvement in GLUE score and a 1.5-point F1 gain on SQuAD v1.1, by learning rich, context-dependent token representations that transfer to downstream tasks with minimal task-specific architecture changes. Because it is deeply bidirectional, BERT supports query-document semantic matching that captures polysemous terms, long-range syntactic dependencies, and implicit query intent, improving relevance signals beyond keyword co-occurrence; in practice it is deployed as a reranking layer over a candidate set produced by a cheaper first-stage retriever.
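
A minimal sketch of the MLM objective in action, assuming the Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint are available: the model fills a masked position using context on both sides of the mask.

```python
# Minimal illustration of masked language modeling (MLM) with a pre-trained BERT.
# Assumes `pip install transformers torch` and the bert-base-uncased checkpoint.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT conditions on context to the left *and* right of [MASK] in every layer,
# so "bank ... interest ... quarter" disambiguates the masked token.
for prediction in unmasker("The bank raised interest [MASK] last quarter."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
```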
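
A sketch of the reranking pattern under stated assumptions: it uses the Hugging Face `transformers` API with a hypothetical BERT checkpoint fine-tuned for query-document relevance (the model name below is a placeholder, not a specific published model). Each query-document pair is scored jointly as a cross-encoder and the candidate set is re-sorted by score.

```python
# Sketch: BERT as a cross-encoder reranker over a first-stage candidate set.
# Assumes `transformers` and `torch` are installed; MODEL_NAME is a placeholder
# for any BERT sequence-classification checkpoint fine-tuned on relevance labels.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "your-org/bert-base-relevance"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def rerank(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Score each (query, document) pair jointly and sort candidates by relevance."""
    inputs = tokenizer(
        [query] * len(candidates),  # sentence A: the query, repeated per candidate
        candidates,                 # sentence B: each candidate document
        padding=True, truncation=True, max_length=512, return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    # Handle either a single relevance logit or a 2-class (irrelevant/relevant) head.
    scores = (logits.squeeze(-1) if logits.shape[-1] == 1 else logits[:, 1]).tolist()
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)

# Usage: rerank the top-k passages returned by a keyword retriever such as BM25.
ranked = rerank(
    "how do transformers capture long-range dependencies?",
    ["first retrieved passage ...", "second retrieved passage ..."],
)
```

The cross-encoder is applied only to the small candidate set because scoring every pair with a full BERT forward pass is too expensive for first-stage retrieval.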