ColBERT is a late interaction architecture that independently encodes queries and documents into token-level BERT embeddings (documents offline at indexing time, queries at search time), then computes relevance via a cheap MaxSim (Maximum Similarity) operator across all query-document token pairs. This decomposition reduces query-time BERT computation by over 170× compared to cross-encoder models while matching or exceeding their ranking quality on the MS MARCO and TREC CAR benchmarks, achieving end-to-end re-ranking in under 50ms. Because full document corpora can be pre-indexed into compressed vector stores, expensive neural encoding is decoupled from live query latency, making dense contextual ranking feasible at web scale without sacrificing ranking depth or passage-level precision.
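The MaxSim operator itself is simple: for each query token embedding, take the maximum similarity against every document token embedding, then sum over query tokens. A minimal NumPy sketch (function name and toy embeddings are illustrative, not the paper's code):

```python
import numpy as np

def maxsim_score(query_emb, doc_emb):
    """Late-interaction relevance score.

    query_emb: (num_query_tokens, dim) token embeddings
    doc_emb:   (num_doc_tokens, dim) token embeddings
    For each query token, take the max cosine similarity over all
    document tokens, then sum those maxima over query tokens.
    """
    # Row-normalise so dot products become cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T                 # (num_query_tokens, num_doc_tokens)
    return sim.max(axis=1).sum()  # MaxSim per query token, summed
```

Because document embeddings are fixed, only the tiny matrix multiply and max-reduction run per query; the per-document BERT forward pass happens once at indexing time.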
T5 unifies all NLP tasks - classification, summarisation, and QA - into a text-to-text format, allowing a single transformer architecture to generalise across diverse task types. By introducing C4 (the Colossal Clean Crawled Corpus), the authors established a reference point for web-scale data cleaning (deduplication and quality heuristics). Most significantly, the paper provides a systematic benchmark of pre-training objectives, architectures, and scale, showing that diverse language tasks can be handled through unified transfer learning rather than task-specific engineering.
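The text-to-text casting amounts to serialising every task as an (input string, target string) pair, with a task prefix on the input. A minimal sketch, assuming hypothetical prefix strings and field names (T5's actual prompts differ per benchmark):

```python
def to_text_to_text(task, example):
    """Cast a labelled example into a text-to-text (input, target) pair
    by prefixing the input with a task description string.
    Prefixes and dict keys here are illustrative."""
    if task == "sentiment":
        return (f"sentiment: {example['text']}", example["label"])
    if task == "summarize":
        return (f"summarize: {example['text']}", example["summary"])
    if task == "qa":
        return (
            f"question: {example['question']} context: {example['context']}",
            example["answer"],
        )
    raise ValueError(f"unknown task: {task}")
```

Once every task is in this shape, a single encoder-decoder model trained with one maximum-likelihood objective serves them all; no task-specific heads are needed.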
DSSM (Deep Structured Semantic Model) employs a deep neural network with a word hashing layer to project queries and documents into a shared low-dimensional semantic space, trained end-to-end on clickthrough data by maximising the posterior probability of clicked documents given queries. The model uses letter-trigram-based word hashing to reduce input dimensionality from a 500K+ term vocabulary to ~30K trigram features, achieving statistically significant NDCG gains (~1-2% absolute) over BM25, LSA, and PLSA baselines on web search ranking tasks. This architecture lets ranking systems overcome lexical mismatch between queries and documents, surfacing semantically relevant results even where no keyword overlap exists, and it feeds relevance scoring layers in learning-to-rank pipelines without manual feature engineering or query expansion modules.
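The word hashing layer decomposes each word into letter trigrams with boundary markers, so arbitrary vocabulary collapses onto a small fixed trigram set. A minimal sketch (function names and the vocabulary mapping are illustrative):

```python
def letter_trigrams(word):
    """Split a word into letter trigrams with '#' boundary markers,
    as in DSSM's word hashing (e.g. 'good' -> #go, goo, ood, od#)."""
    padded = f"#{word}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def hash_features(text, vocab):
    """Map text to a sparse trigram count vector over a fixed
    trigram vocabulary (vocab: trigram -> feature index)."""
    counts = {}
    for word in text.lower().split():
        for tri in letter_trigrams(word):
            if tri in vocab:
                idx = vocab[tri]
                counts[idx] = counts.get(idx, 0) + 1
    return counts
```

Since the number of distinct letter trigrams is bounded (~30K in the paper) regardless of vocabulary size, unseen words still map to meaningful features, which also gives some robustness to misspellings and morphological variants.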