BM25 (Best Match 25) operationalises the Probabilistic Relevance Framework (PRF): it scores documents against a query by combining term frequency (TF), inverse document frequency (IDF), and document length normalisation into a ranking score that approximates an ordering by probability of relevance rather than yielding a calibrated probability. The critical mechanism is the non-linear TF saturation curve, governed by the k1 parameter, which gives each additional occurrence of a term a diminishing contribution so that high-frequency terms cannot dominate the score. The b parameter (between 0 and 1) controls how strongly term counts are normalised against the corpus-average document length, penalising long documents that accumulate term counts through verbosity alone. BM25 is therefore a computationally efficient baseline with interpretable parameters that typically outperforms raw TF-IDF by handling term redundancy and document length bias - making it the de facto ranking function for inverted-index architectures where lexical matching must approximate probabilistic relevance without requiring training data or vector embeddings.
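The scoring logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `bm25_scores`, the toy corpus, the defaults k1=1.2 and b=0.75, and the choice of the non-negative log(1 + ...) IDF variant are all assumptions made for the example. Documents and queries are assumed to be pre-tokenised lists of strings.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenised document for a tokenised query.

    Illustrative sketch: k1 sets how quickly TF saturates, b sets how
    strongly scores are normalised by document length.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation: b=0 ignores length, b=1 normalises fully.
        norm = 1 - b + b * len(d) / avgdl
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Assumed IDF variant: the +1 inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating TF: approaches (k1 + 1) as tf grows, never exceeds it.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus: two documents of equal length, one repeating "cat".
docs = [
    "the cat sat on the mat".split(),
    "cat cat cat cat cat cat".split(),
    "dogs chase cars".split(),
]
scores = bm25_scores(["cat"], docs)
```

In this toy corpus the second document contains six times as many occurrences of "cat" as the first, yet its score is less than twice as large, which is the saturation effect. Setting b=0 in the call would switch off length normalisation entirely.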
Google's Knowledge Graph is a structured entity database that maps real-world objects - people, places, organisations, and concepts - to typed attribute sets and inter-entity relationships, supplementing string-matched keyword lookup with disambiguated, meaning-based retrieval. The system resolves lexical ambiguity (e.g., "Taj Mahal" as monument vs. musician vs. restaurant) by anchoring queries to canonical entities with unique identifiers, and it populates typed properties and relational edges from sources including Freebase, Wikipedia, and the CIA World Factbook. This shifts ranking and indexing logic from document-to-keyword co-occurrence toward entity-to-entity graph traversal, enabling query expansion, direct answer surfacing, and contextual result clustering without requiring exact-match signals in crawled content.
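The idea of anchoring an ambiguous string to a canonical entity can be illustrated with a toy graph. The sketch below is purely hypothetical and says nothing about how Google's system is implemented: the `Entity` class, the `ex:` identifiers, the `resolve` and `neighbours` helpers, and the word-overlap heuristic are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str                                   # hypothetical canonical ID
    name: str                                        # surface label
    entity_type: str                                 # e.g. "monument"
    properties: dict = field(default_factory=dict)   # typed attributes
    edges: dict = field(default_factory=dict)        # relation -> target ID


# Hypothetical miniature graph; the IDs are made up for illustration.
ENTITIES = {
    e.entity_id: e
    for e in [
        Entity("ex:agra", "Agra", "city"),
        Entity("ex:taj_mahal_monument", "Taj Mahal", "monument",
               {"country": "India"}, {"located_in": "ex:agra"}),
        Entity("ex:taj_mahal_musician", "Taj Mahal", "musician",
               {"genre": "blues"}),
    ]
}


def resolve(surface, context, entities=ENTITIES):
    """Pick the entity named `surface` whose description best overlaps `context`.

    Toy heuristic: count context words that match the entity's type,
    property values, or the names of its graph neighbours.
    """
    context_words = {w.lower() for w in context}
    candidates = [e for e in entities.values()
                  if e.name.lower() == surface.lower()]

    def overlap(entity):
        words = {entity.entity_type.lower()}
        for value in entity.properties.values():
            words.update(str(value).lower().split())
        for target_id in entity.edges.values():
            words.update(entities[target_id].name.lower().split())
        return len(words & context_words)

    return max(candidates, key=overlap, default=None)


def neighbours(entity_id, entities=ENTITIES):
    """Follow outgoing edges one hop: returns (relation, neighbour name) pairs."""
    return [(rel, entities[target].name)
            for rel, target in entities[entity_id].edges.items()]


musician = resolve("Taj Mahal", ["blues", "album"])
monument = resolve("Taj Mahal", ["visit", "agra"])
```

Here the same query string resolves to different identifiers depending on context, and `neighbours(monument.entity_id)` walks the `located_in` edge to reach the city entity, a small-scale analogue of the query expansion described above.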