links
Meilisearch’s breakdown of LSI (latent semantic indexing) positions it as a foundational retrieval method that uses Singular Value Decomposition (SVD) to reduce a high-dimensional term-document matrix to a lower-dimensional "latent space." By decomposing the original matrix into three constituent matrices (U, Σ, and Vᵀ) and keeping only the largest singular values, LSI captures hidden conceptual relationships (e.g., grouping "physician" and "doctor"), thereby addressing the retrieval failures of exact-match keyword systems. While LSI is computationally efficient for small, static datasets, the article highlights that its linear algebraic approach is increasingly superseded by Transformer-based embeddings and vector search, which offer better scalability and a deeper contextual understanding of polysemy and linguistic nuance in dynamic web environments.
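The decomposition described above can be sketched on a toy term-document matrix. This is a minimal illustration, not the article's code: the five terms, the four documents, and the choice of k = 2 latent dimensions are all assumptions made for the example, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents.
# "doctor" and "physician" never appear in the same document,
# but both co-occur with "hospital".
terms = ["doctor", "physician", "hospital", "car", "engine"]
A = np.array([
    [1, 0, 0, 0],  # doctor
    [0, 1, 0, 0],  # physician
    [1, 1, 0, 0],  # hospital
    [0, 0, 1, 1],  # car
    [0, 0, 1, 1],  # engine
], dtype=float)

# Full SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values (k = 2 is an assumption for this toy case).
k = 2
term_vectors = U[:, :k] * s[:k]  # each row is a term in the latent space


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


i = terms.index("doctor")
j = terms.index("physician")
c = terms.index("car")

raw_sim = cosine(A[i], A[j])                           # exact-match view
latent_sim = cosine(term_vectors[i], term_vectors[j])  # latent-space view
latent_unrelated = cosine(term_vectors[i], term_vectors[c])
```

In this toy matrix the raw rows for "doctor" and "physician" are orthogonal, so an exact-match comparison sees no relationship, while their latent vectors point in the same direction because both terms share the "hospital" context. "doctor" and "car" stay unrelated in the latent space. Real corpora are far noisier, and the choice of k usually has to be tuned.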
Google's original 1998 paper introduces a large-scale hypertext web search engine architecture built around a distributed crawling system and a link-based ranking algorithm called PageRank. PageRank computes a page's importance by recursively weighting inbound hyperlinks from high-authority sources, operationalising the citation-graph model of academic literature into a query-independent importance score calculated across the entire crawled web graph. The paper also indexes the anchor text of a link with the page the link points to. This anchor-text-plus-PageRank coupling directly challenges pure TF-IDF retrieval models by injecting external link topology into ranking decisions: search systems that index content in isolation, without modelling inter-document authority signals, will systematically mis-rank high-quality pages against keyword-stuffed low-quality ones.
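The recursive weighting of inbound links can be sketched as a power iteration over a small link graph. This is a minimal sketch under stated assumptions, not the paper's implementation: the four-page graph is hypothetical, the damping factor of 0.85 is the commonly cited value, and the update uses the normalised variant in which ranks sum to 1. The function name `pagerank` and its parameters are illustrative.

```python
def pagerank(links, damping=0.85, max_iter=100, tol=1e-10):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Normalised variant: ranks form a probability distribution (sum to 1).
    Pages with no outbound links spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by dangling pages (no outlinks) is redistributed uniformly.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page splits its current rank equally among its outbound links.
        for source, targets in links.items():
            if targets:
                share = damping * ranks[source] / len(targets)
                for target in targets:
                    new_ranks[target] += share

        delta = sum(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if delta < tol:
            break

    return ranks


# Hypothetical link graph: "c" receives links from three pages, "d" from none.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(graph)
top_page = max(ranks, key=ranks.get)
```

Here "c" ends up with the highest rank because it collects links from every other page, and "a" ranks well above "d" because its single inbound link comes from the high-ranked "c". That is the authority-propagation effect a content-only ranker cannot see.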