links
Attention mechanism for encoder-decoder neural machine translation that computes a soft alignment over all source tokens each time it generates a target token, replacing the fixed-length context vector that bottlenecked earlier RNN-based architectures. The alignment scores, produced by a learned compatibility function between the decoder hidden state and the encoder annotations, let the decoder draw on a variable-length source representation. The model reaches BLEU scores comparable to the then state-of-the-art phrase-based system on English-French translation, and its quality holds up on long sentences where fixed-vector models degrade. Attention is the core building block of the transformer-based language models (BERT, T5) used in semantic indexing and query-document relevance ranking: the learned token-to-token weights give interpretable, context-sensitive representations that capture the long-range lexical dependencies critical for passage retrieval and cross-lingual search quality.
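A minimal sketch of one decoder step of additive attention, assuming the usual form score_j = v . tanh(W s + U h_j) followed by a softmax. The function name, the toy matrices W, U, v and the two-dimensional vectors are illustrative assumptions; in a real model they are learned parameters of much larger size.

```python
import math


def additive_attention(decoder_state, annotations, W, U, v):
    """Return (weights, context) for a single decoder step.

    decoder_state: previous decoder hidden state s (list of floats)
    annotations:   encoder annotations h_1..h_T (list of lists of floats)
    W, U, v:       parameters of the compatibility function (toy values here)
    """
    def matvec(matrix, vector):
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    projected_state = matvec(W, decoder_state)

    # Unnormalised alignment score for every source position.
    scores = []
    for h in annotations:
        projected_h = matvec(U, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Softmax turns scores into soft alignment weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: weighted sum of annotations, recomputed at every step.
    dim = len(annotations[0])
    context = [sum(w * h[k] for w, h in zip(weights, annotations)) for k in range(dim)]
    return weights, context


# Toy inputs (assumed values, for illustration only).
identity = [[1.0, 0.0], [0.0, 1.0]]
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = additive_attention([0.5, -0.5], annotations, identity, identity, [1.0, 1.0])
```

Because the context vector is rebuilt from all annotations at each step, its information content does not shrink as the source sentence grows, which is the point of dropping the single fixed-length vector.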
Google research paper introducing Percolator, a system built on Bigtable that supports incremental processing of large datasets through distributed transactions and a notification-driven computation model. In Google's indexing pipeline it replaced MapReduce batch processing, so the search index can be updated continuously as individual pages are crawled instead of waiting for a full global rebuild. Transactions run under snapshot isolation to keep data consistent across distributed tables, and "observers" (pieces of code) are triggered by specific data changes to propagate updates through the pipeline. This architecture underpins the shift from the "Google Dance" (monthly index refreshes) to the Caffeine update, providing the infrastructure for near-real-time discovery of content and backlinks, although the "propagation wave" through the various ranking layers still prevents instantaneous global ranking changes.
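The notification-driven model can be illustrated with a toy, single-process sketch. Everything here (the `ToyTable` class, the column names, the two observer functions) is hypothetical and is not Percolator's API; distributed transactions, snapshot isolation and Bigtable storage are deliberately left out. The sketch only shows how a write to an observed column queues follow-up work, so a single changed page flows through the pipeline without a global rebuild.

```python
from collections import deque


class ToyTable:
    """In-memory stand-in for a table with per-column observers."""

    def __init__(self):
        self.cells = {}         # (row, column) -> value
        self.observers = {}     # column -> list of callables
        self.pending = deque()  # queued (row, column) notifications

    def observe(self, column, fn):
        self.observers.setdefault(column, []).append(fn)

    def write(self, row, column, value):
        # Unchanged data produces no notification, so no work is repeated.
        if self.cells.get((row, column)) == value:
            return
        self.cells[(row, column)] = value
        if column in self.observers:
            self.pending.append((row, column))

    def run(self):
        """Drain the notification queue; return the number of observer runs."""
        runs = 0
        while self.pending:
            row, column = self.pending.popleft()
            for fn in self.observers[column]:
                fn(self, row)
                runs += 1
        return runs


def extract_links(table, row):
    # Hypothetical observer: pull link-like tokens out of the raw document.
    raw = table.cells[(row, "raw")]
    links = sorted(word for word in raw.split() if word.startswith("http"))
    table.write(row, "links", links)


def mark_indexed(table, row):
    # Hypothetical observer: the last pipeline stage for this row.
    table.write(row, "indexed", True)


table = ToyTable()
table.observe("raw", extract_links)
table.observe("links", mark_indexed)

table.write("page1", "raw", "see http://example.com for details")
first_pass = table.run()

# Re-crawling identical content triggers nothing further.
table.write("page1", "raw", "see http://example.com for details")
second_pass = table.run()
```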
DSSM (Deep Structured Semantic Model) employs a deep neural network with a word hashing layer to project queries and documents into a shared low-dimensional semantic space, trained end-to-end on clickthrough data by maximising the posterior probability of clicked documents given queries. The model uses letter-trigram-based word hashing to reduce input dimensionality from 500K+ vocabulary terms to ~30K features, achieving statistically significant NDCG gains (~1-2% absolute) over BM25, LSA, and PLSA baselines in web search ranking tasks. This architecture lets ranking systems overcome lexical mismatch between queries and documents, surfacing semantically relevant results where no keyword overlap exists. It can feed the relevance scoring layers of learning-to-rank pipelines without manual feature engineering or query expansion modules.
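A minimal sketch of the word hashing step and the click posterior, under stated simplifications: the learned deep layers are omitted, so cosine similarity is taken directly on raw letter-trigram counts as a stand-in for the projected semantic vectors. The function names and the smoothing factor `gamma` value are illustrative assumptions.

```python
import math
from collections import Counter


def letter_trigrams(word):
    """Break a word into letter trigrams after adding boundary marks."""
    marked = f"#{word.lower()}#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


def hash_text(text):
    """Bag of letter trigrams for a query or document."""
    counts = Counter()
    for word in text.split():
        counts.update(letter_trigrams(word))
    return counts


def cosine(a, b):
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def click_posterior(query, documents, gamma=10.0):
    """Softmax over smoothed similarities: P(D | Q) for each candidate D."""
    q = hash_text(query)
    scores = [gamma * cosine(q, hash_text(d)) for d in documents]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(letter_trigrams("good"))  # ['#go', 'goo', 'ood', 'od#']
probs = click_posterior("cheap flights", ["cheapest flight deals", "garden furniture"])
```

Training would adjust the network so that the posterior of clicked documents is maximised. The sketch also shows why the vocabulary shrinks: morphological variants such as "flight" and "flights" share most of their trigrams, so the feature space is bounded by the number of distinct trigrams instead of the number of distinct words.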
Web-scale probabilistic knowledge base that automatically fuses facts extracted from Web content with prior knowledge from existing knowledge bases (Freebase, OpenCyc, Wikidata), using a supervised machine learning pipeline that combines extractions, graph-based inference, and calibrated confidence scoring. The system ingests 1.6 billion candidate facts, assigns calibrated probabilities via classifier ensembles and embedding-based propagation, and yields a corpus of 271 million facts with confidence ≥0.9, approaching the scale of Freebase's human-curated 350 million facts while maintaining measurable precision. This architecture enables automated, continuously updated entity-attribute resolution at crawl scale, directly powering entity disambiguation, Knowledge Graph population, and confidence-weighted fact retrieval without relying on manual curation.
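The final selection step, keeping only candidate facts whose calibrated probability clears a cutoff, can be sketched as follows. The `CandidateFact` type, the example triples and their probabilities are made up for illustration, and the sketch assumes the probabilities have already been produced by the upstream classifiers; it does not model extraction, fusion or calibration.

```python
from typing import NamedTuple


class CandidateFact(NamedTuple):
    subject: str
    predicate: str
    obj: str
    probability: float  # calibrated probability that the triple is true


def confident_facts(candidates, threshold):
    """Keep candidate triples whose probability is at or above the threshold."""
    return [fact for fact in candidates if fact.probability >= threshold]


# Hypothetical candidates with assumed probabilities.
candidates = [
    CandidateFact("Paris", "capital_of", "France", 0.98),
    CandidateFact("Paris", "capital_of", "Texas", 0.04),
    CandidateFact("Ada Lovelace", "born_in", "London", 0.75),
]

strict = confident_facts(candidates, 0.9)
lenient = confident_facts(candidates, 0.7)
```

Storing the probability with each triple, instead of a yes/no decision, lets downstream consumers choose their own trade-off between coverage and precision.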