jeff-dean + sanjay-ghemawat · static.googleusercontent.com

Spanner: Google’s Globally-Distributed Database static.googleusercontent.com pdf

Bigtable: A Distributed Storage System for Structured Data static.googleusercontent.com pdf

Bigtable implements a distributed storage system that organizes data as a sparse, persistent, sorted multi-dimensional map indexed by row key, column key, and timestamp. New columns can be added within a column family without a schema change, and the system is designed to scale to petabytes of data across thousands of commodity servers. Tables are range-partitioned by row key into tablets. Writes follow a log-structured path: each mutation is appended to a commit log shared by all tablets on a tablet server and applied to an in-memory memtable, which is periodically flushed to immutable SSTable files stored in GFS. Clients locate tablets through a three-level hierarchy (a Chubby file, the root tablet, and METADATA tablets), so a client with an empty location cache needs three network round-trips to find a tablet. Many Google projects store data in Bigtable, including web indexing, Google Earth, and Google Finance. In the paper's Webtable example, rows are keyed by reversed URL and each crawled page is kept in several timestamped versions, so indexing jobs can read only the column families they need and the table grows horizontally without schema migrations.
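The data model and write path summarized above can be sketched as a toy, single-process Python class. Everything here is an assumption for illustration: `ToyTablet`, its methods, and `memtable_limit` are hypothetical names, not Bigtable's API, and plain Python lists stand in for the GFS-backed commit log and SSTables. The sketch shows cells keyed by (row, column, timestamp), writes appended to a log and buffered in a memtable that is flushed into immutable sorted runs, and reads that merge every run and return the newest version at or before a requested timestamp.

```python
from typing import Dict, List, Optional, Set, Tuple

# A cell is addressed by (row key, column key, timestamp); column keys use
# the "family:qualifier" form. Everything here is an illustrative toy.
Key = Tuple[str, str, int]


class ToyTablet:
    """Hypothetical single-process sketch of Bigtable's data model and
    log-structured write path. Not Bigtable's API: no GFS, Chubby,
    tablet splitting, or merging/major compactions."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.commit_log: List[Tuple[Key, str]] = []      # stands in for the commit log
        self.memtable: Dict[Key, str] = {}               # recent writes, in memory
        self.sstables: List[List[Tuple[Key, str]]] = []  # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, row: str, column: str, ts: int, value: str) -> None:
        key = (row, column, ts)
        self.commit_log.append((key, value))  # append to the log first ...
        self.memtable[key] = value            # ... then apply to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Minor compaction: freeze the memtable into a sorted, immutable run."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[str]:
        """Newest value at or before ts (newest overall when ts is None)."""
        versions: Dict[int, str] = {}
        # Visit sources oldest to newest so newer writes win on equal timestamps.
        for run in self.sstables + [sorted(self.memtable.items())]:
            for (r, c, t), v in run:
                if r == row and c == column:
                    versions[t] = v
        eligible = [t for t in versions if ts is None or t <= ts]
        return versions[max(eligible)] if eligible else None

    def scan(self, start_row: str, end_row: str) -> List[Key]:
        """Cell keys with start_row <= row < end_row, ordered by row, then
        column, then descending timestamp (newest version first)."""
        keys: Set[Key] = set(self.memtable)
        for run in self.sstables:
            keys.update(k for k, _ in run)
        selected = [k for k in keys if start_row <= k[0] < end_row]
        return sorted(selected, key=lambda k: (k[0], k[1], -k[2]))


# Usage, loosely modeled on the Webtable example (values are made up).
t = ToyTablet(memtable_limit=2)
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")    # memtable full -> flush
t.put("com.cnn.www", "contents:", 6, "<html>v6")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")  # memtable full -> flush
t.put("org.example", "contents:", 1, "<html>ex")    # stays in the memtable

print(t.get("com.cnn.www", "contents:"))        # <html>v6
print(t.get("com.cnn.www", "contents:", ts=5))  # <html>v5
print(t.scan("com.", "com.z"))
```

Returning a cell's versions newest first mirrors the paper's choice to store versions in decreasing timestamp order so the most recent one is read first. A real tablet server would also merge runs during compaction instead of scanning every run on each read, as this toy does.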

MapReduce: Simplified Data Processing on Large Clusters static.googleusercontent.com pdf
