links
PageRank, developed by Larry Page and Sergey Brin, is the foundational algorithm Google was built on. It ranks web pages by treating hyperlinks as votes, where a link from a high-authority page passes more "link juice" than one from a low-authority page. The model calculates a probability-based score reflecting how often a random web surfer would land on any given page by following links. This explains why backlink quality and site authority matter in SEO, and why links from authoritative sources carry disproportionate ranking value.
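The random-surfer score can be sketched as a simple power iteration. This is a minimal illustration, not Google's production method: the function name `pagerank`, the damping value 0.85, the fixed iteration count, and the choice to spread the rank of pages without outlinks evenly over all pages are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution: the surfer is equally likely anywhere.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly (an assumption).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page splits its damped rank equally among the pages it links to.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

In this toy graph, page "c" collects links from three pages and ends up with the highest score, while "d" has no inbound links and keeps only the baseline share every page receives from random jumps.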
Bigtable is a distributed storage system that organizes data as a sparse, persistent, sorted multi-dimensional map indexed by row key, column key, and timestamp, enabling flexible schema evolution across petabyte-scale datasets on commodity hardware. The system achieves high performance through tablet-based range partitioning, a log-structured merge-tree write path via GFS-backed SSTables and a shared commit log, and a three-tier location hierarchy that resolves tablet addresses in at most three network round trips from a cold client cache, while supporting thousands of concurrent clients across 500+ commodity servers. Bigtable directly underpins web crawl storage, index serving, and per-URL metadata management, enabling Google to version crawled documents by timestamp, perform selective column reads during indexing pipelines, and scale ranking feature stores horizontally without schema-level migrations or relational join overhead.
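The data model described above (a sparse, sorted map from row key, column key, and timestamp to a value) can be illustrated with a small in-memory sketch. The class name `SparseTable` and its methods are hypothetical and are not Bigtable's API; persistence, tablets, and distribution are left out entirely. The row and column names in the usage lines are illustrative only.

```python
import bisect


class SparseTable:
    """Toy in-memory map: (row key, column key, timestamp) -> value."""

    def __init__(self):
        self._rows = {}      # row key -> {column key -> [(timestamp, value), ...]}
        self._row_keys = []  # row keys kept sorted so range scans are cheap

    def put(self, row, column, timestamp, value):
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        # Keep versions newest first so the latest read is the first entry.
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row, column, at=None):
        """Return the newest value at or before `at` (the latest if `at` is None)."""
        for timestamp, value in self._rows.get(row, {}).get(column, []):
            if at is None or timestamp <= at:
                return value
        return None

    def scan(self, start_row, end_row, columns=None):
        """Yield (row, {column: latest value}) for start_row <= row < end_row."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, end_row)
        for row in self._row_keys[lo:hi]:
            cells = self._rows[row]
            yield row, {
                column: versions[0][1]
                for column, versions in cells.items()
                if columns is None or column in columns
            }


if __name__ == "__main__":
    table = SparseTable()
    table.put("com.cnn.www", "contents:", 1, "<html>v1")
    table.put("com.cnn.www", "contents:", 3, "<html>v3")
    table.put("com.cnn.www", "anchor:cnnsi.com", 2, "CNN")
    print(table.get("com.cnn.www", "contents:", at=2))
    print(list(table.scan("com.cnn", "com.d", columns={"contents:"})))
```

Rows store only the columns that were actually written, which is what makes the map sparse. Reading with `at=2` returns the older crawl version, and the scan with a column filter mirrors a selective column read over a contiguous row range.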
TrustRank is a semi-automatic spam-fighting framework that propagates trust scores from a small, manually curated seed set of high-quality pages through the hyperlink graph to assign legitimacy scores to all crawled documents. The system exploits the observation that good pages rarely link to spam, enabling trust to decay with link distance from seeds while isolating link-spam clusters that accumulate inbound links without receiving trust propagation. Search engines applying TrustRank can suppress or demote low-trust pages during ranking, reduce crawler resources wasted on spam-dense host neighbourhoods, and prioritise indexing of nodes with non-trivial trust scores, effectively making large-scale link manipulation economically unviable without proximity to authoritative seed pages.
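Trust propagation from a seed set can be sketched as a PageRank-style iteration in which the random jump always returns to a seed page. This is a simplified illustration under stated assumptions: the function name `trustrank`, the damping value 0.85, and the fixed iteration count are chosen for the example, seed selection is taken as given, and trust reaching pages without outlinks is simply dropped.

```python
def trustrank(links, seeds, damping=0.85, iterations=50):
    """Propagate trust from `seeds` along outgoing links.

    `links` maps each page to the list of pages it links to.
    `seeds` is the manually chosen collection of trusted pages.
    """
    seeds = set(seeds)
    pages = set(links) | seeds
    for targets in links.values():
        pages.update(targets)

    # All initial trust sits on the seed pages, split evenly among them.
    seed_weight = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_weight)

    for _ in range(iterations):
        # The "jump" term goes only to seeds, never to arbitrary pages.
        new_trust = {p: (1.0 - damping) * seed_weight[p] for p in pages}

        # Each page splits its damped trust equally among its outlinks,
        # so trust attenuates with every additional hop from a seed.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * trust[page] / len(targets)
            for target in targets:
                new_trust[target] += share
        trust = new_trust
    return trust


if __name__ == "__main__":
    graph = {
        "seed": ["good"],
        "good": ["far"],
        "far": [],
        "spam1": ["spam2"],
        "spam2": ["spam1"],
    }
    for page, score in sorted(trustrank(graph, {"seed"}).items()):
        print(page, round(score, 4))
```

In this toy graph the two spam pages link heavily to each other but receive no link from any trusted page, so their trust stays at zero, while trust on the legitimate chain shrinks with each hop away from the seed.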