links
ColBERT: a late interaction architecture that independently encodes queries and documents into token-level BERT embeddings — documents offline at indexing time, queries once at query time — and then computes relevance with a cheap MaxSim (maximum similarity) operator over all query-document token pairs at retrieval time. This decomposition cuts query-time BERT computation by over 170× relative to cross-encoder models while matching or exceeding their ranking quality on the MS MARCO and TREC CAR benchmarks, achieving end-to-end re-ranking in under 50 ms. Because documents can be pre-encoded, full corpora are indexed into compressed vector stores, decoupling expensive neural encoding from live query latency and making dense contextual ranking feasible at web scale without sacrificing ranking depth or passage-level precision.
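The MaxSim operator described above is simple enough to sketch directly: for each query token, take its best cosine similarity against any document token, then sum over query tokens. A minimal NumPy sketch, assuming both embedding matrices are already L2-normalised (array shapes and names here are illustrative, not from the ColBERT codebase):

```python
import numpy as np

def maxsim_score(q_emb: np.ndarray, d_emb: np.ndarray) -> float:
    """Late-interaction relevance score.

    q_emb: (num_query_tokens, dim) query token embeddings, L2-normalised.
    d_emb: (num_doc_tokens, dim) document token embeddings, L2-normalised.
    """
    sims = q_emb @ d_emb.T            # (nq, nd) cosine similarity of every token pair
    return float(sims.max(axis=1).sum())  # best doc token per query token, summed
```

Because `d_emb` is computed at indexing time, only the small `q_emb` encoding and this matrix product run per query, which is where the latency saving comes from.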
BEIR established a standardised framework of 18 diverse datasets (covering fact-checking, QA, news, and more) for measuring zero-shot generalisation in Information Retrieval (IR). The benchmark's core finding is the "Generalisation Gap": while dense retrieval models (like DPR) excel in-domain, they frequently underperform BM25 on out-of-domain tasks, exposing a critical brittleness in neural IR. This explains the continued necessity of lexical matching (keywords) as a robust signal that complements semantic interpretation in diverse or "long-tail" query environments.
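The lexical baseline that dense models struggle to beat out-of-domain is worth making concrete. A minimal sketch of BM25 scoring over a tokenised corpus, using the Lucene-style smoothed IDF (parameter defaults `k1=1.5`, `b=0.75` are common conventions, not prescribed by BEIR):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenised doc in `docs` against `query_terms` with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N          # average document length
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            # Smoothed IDF: rare terms contribute more than common ones
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # Term-frequency saturation, normalised by document length
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

Exact-match scoring like this cannot be fooled by surface-form drift between training and test domains, which is one intuition for why it remains a robust zero-shot baseline.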