links
PageRank, by Larry Page and Sergey Brin, is the foundational algorithm Google was built on - it ranks web pages by treating hyperlinks as votes, where a link from a high-authority page passes more "link juice" than one from a low-authority page. The model calculates a probability-based score reflecting how often a random web surfer would land on any given page by following links. This explains why backlink quality and site authority matter in SEO, and why links from authoritative sources carry disproportionate ranking value.
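The random-surfer score can be sketched with a short power iteration. This is a minimal illustration, not Google's implementation: the function name `pagerank`, the toy graph, the damping value of 0.85, the fixed iteration count, and the even redistribution of rank from pages without outlinks are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to (toy input format)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Assumption: rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if not targets:
                continue
            # Each outlink passes on an equal share of the source page's rank.
            share = damping * rank[src] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked from three pages, "d" from none.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
```

In this toy graph "c" ends up with the highest score and "d" with the lowest, and "a" outranks "b" because its single inbound link comes from the highest-scoring page.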
HITS (Hyperlink-Induced Topic Search) defines a mutually reinforcing, iterative computation over directed hyperlink graphs that assigns each web page two scores: a hub score (high for pages linking to many quality resources) and an authority score (high for pages linked to by many quality hubs). It addresses the problem of identifying high-quality topical resources from link structure alone, without relying on content analysis. The core mechanism runs repeated matrix-vector multiplications on the adjacency matrix of a query-specific subgraph (the "base set", built by expanding a query's root set with pages that link to or from it) and converges to principal eigenvectors that give the hub and authority weights, amplifying pages that receive links from well-connected hubs. This eigenvector-based, query-dependent link analysis informs search ranking by demonstrating that in-link count alone is insufficient - link source quality propagates authority transitively. It provides a theoretical foundation for trust-weighted, graph-theoretic ranking signals, complementing PageRank's global, query-independent approach from the same period, and underlies modern link equity models in crawl prioritisation and index scoring.
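The mutually reinforcing update can be sketched in a few lines. This is an illustrative version under stated assumptions: the name `hits`, the toy subgraph, the fixed iteration count, and the choice of Euclidean normalisation are made up for the example, and building the query-specific base set is left out entirely.

```python
import math


def hits(links, iterations=50):
    """links maps each page in the (already built) subgraph to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of every page linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub update: sum the authority scores of every page linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalise both vectors so the scores stay bounded between rounds.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical subgraph: three hubs pointing at "x" and "y", plus an isolated pair.
subgraph = {"h1": ["x", "y"], "h2": ["x", "y"], "h3": ["y"], "lone": ["z"]}
hub_scores, auth_scores = hits(subgraph)
```

Here "y" becomes the strongest authority because all three hubs point at it, while "z" has the same kind of in-link as any other page but scores close to zero because its only source is a weak hub.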
Google's Reasonable Surfer model represents a shift from a purely topological "Random Surfer" PageRank to a behaviourally-informed weighting system that assigns non-uniform probability to links based on their visual and structural attributes. By analysing feature data - including link position (main content vs. footer), font size, colour contrast, and anchor text length - the algorithm determines a "click-weighted" influence for each citation, so that prominent, contextually relevant links pass more equity than obscured or boilerplate elements. This implies that ranking durability is no longer a simple function of link quantity, but rather a result of probabilistic engagement signals, where the value of a backlink is tied to its navigational salience and the likelihood of it being selected by a "reasonable" human user.
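The difference from a uniform random surfer can be illustrated by turning link features into per-link follow probabilities. Everything specific here is hypothetical: the feature names, the multipliers in `HYPOTHETICAL_WEIGHTS`, and the multiplicative way of combining them are invented for the sketch and are not taken from any published Google weighting.

```python
def click_probabilities(page_links, feature_weights):
    """Return target -> probability that a 'reasonable' surfer follows that link."""
    raw = {}
    for link in page_links:
        weight = 1.0
        # Multiply in the (made-up) weight of every feature the link has.
        for feature, multiplier in feature_weights.items():
            if link.get(feature):
                weight *= multiplier
        raw[link["target"]] = raw.get(link["target"], 0.0) + weight
    total = sum(raw.values())
    if total == 0:
        return {}
    # Normalise so the outgoing probabilities of the page sum to 1.
    return {target: weight / total for target, weight in raw.items()}


# Hypothetical multipliers: prominent links count for more, footer links for less.
HYPOTHETICAL_WEIGHTS = {"in_main_content": 3.0, "in_footer": 0.2, "large_font": 1.5}

page_links = [
    {"target": "guide", "in_main_content": True, "large_font": True},
    {"target": "terms", "in_footer": True},
    {"target": "related", "in_main_content": True},
]
probs = click_probabilities(page_links, HYPOTHETICAL_WEIGHTS)
```

Under a uniform random surfer each of the three links would receive one third of the page's outgoing weight; with these illustrative multipliers the footer link to "terms" receives far less and the prominent in-content link to "guide" receives the most.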
Google's original 1998 paper introduces a large-scale hypertext web search engine architecture that combines a distributed crawling system with two ranking signals: a link-based algorithm called PageRank and the anchor text of inbound links, which is indexed against the page the link points to. PageRank computes a page's importance by recursively weighting inbound hyperlinks from high-authority sources, operationalising the citation-graph model of academic literature into a numeric importance score calculated across the entire crawled web graph (the familiar 0–10 scale comes from the later Google Toolbar display, not from the paper). This anchor-text-plus-PageRank coupling directly challenges pure TF-IDF retrieval models by injecting external link topology into ranking decisions, meaning search systems that index content in isolation, without modelling inter-document authority signals, will tend to mis-rank high-quality pages against keyword-stuffed low-quality ones.
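For reference, the recursive weighting is stated in the 1998 paper roughly as follows, where T_1 ... T_n are the pages linking to page A, C(T) is the number of outbound links on page T, and d is a damping factor that the paper suggests is usually set to 0.85:

```latex
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
```

Each linking page divides its own score equally among its outbound links, so a link from a high-scoring page with few outlinks contributes more than a link from a low-scoring page or from a page with many outlinks.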
TrustRank is a semi-automatic spam-fighting framework that propagates trust scores from a small, manually curated seed set of high-quality pages through the hyperlink graph to assign legitimacy scores to all crawled documents. The system exploits the observation that good pages rarely link to spam, so trust decays with link distance from the seeds, and link-spam clusters that accumulate inbound links from each other receive little or no propagated trust. Search engines applying TrustRank can suppress or demote low-trust pages during ranking, reduce crawler resources wasted on spam-dense host neighbourhoods, and prioritise indexing of nodes with non-trivial trust scores - making large-scale link manipulation considerably more costly unless the spammer can obtain links from pages close to the trusted seeds.
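Trust propagation can be sketched as a PageRank-style iteration whose teleport step returns only to the seed pages. This is a simplified illustration: the name `trustrank`, the toy graph, the damping value, and the decision to drop trust at pages with no outlinks are assumptions for the example, and the manual seed selection step is represented only by the `seeds` argument.

```python
def trustrank(links, seeds, damping=0.85, iterations=50):
    """links maps each page to the pages it links to; seeds is the trusted set."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets} | set(seeds))
    # All teleport mass goes to the seeds, split evenly among them.
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)
    for _ in range(iterations):
        new_trust = {p: (1 - damping) * seed_share[p] for p in pages}
        for src, targets in links.items():
            if not targets:
                continue  # Assumption: trust at pages without outlinks is simply dropped.
            # Trust is split across outlinks and damped at every hop.
            share = damping * trust[src] / len(targets)
            for t in targets:
                new_trust[t] += share
        trust = new_trust
    return trust


# Hypothetical web: a trusted chain, and a spam cluster that only links to itself.
web = {
    "seed": ["good1"],
    "good1": ["good2"],
    "good2": [],
    "spam1": ["spam2", "spam3"],
    "spam2": ["spam1"],
    "spam3": ["spam1"],
}
trust = trustrank(web, seeds={"seed"})
```

In this toy graph trust falls off with each hop from the seed, and the spam cluster keeps a score of zero however densely its pages link to one another, because no trusted page links into it.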