links
PageRank, developed by Larry Page and Sergey Brin, is the foundational algorithm Google was built on. It ranks web pages by treating hyperlinks as votes, where a link from a high-authority page passes more "link juice" than one from a low-authority page. The model computes a probability-based score reflecting how often a random web surfer would land on a given page by following links. This explains why backlink quality and site authority matter in SEO, and why links from authoritative sources carry disproportionate ranking value.
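The random-surfer score can be approximated by power iteration. The following is a minimal sketch, not Google's implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping is the assumed probability that the surfer follows a link
    instead of jumping to a random page.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each outlink passes on an equal share of the page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" collects links from "a", "b" and "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph "a" ends up well ahead of "b" even though each has a single inbound link, because the link to "a" comes from the highest-ranked page and carries all of its outgoing share.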
HITS (Hyperlink-Induced Topic Search) defines a mutually reinforcing, iterative computation over a directed hyperlink graph that assigns each page two scores: a hub score (high for pages linking to many quality resources) and an authority score (high for pages linked to by many quality hubs). It identifies high-quality topical resources from link structure alone, without relying on content analysis. The core mechanism runs repeated matrix-vector multiplications on the adjacency matrix of a query-specific subgraph (the "base set", built by expanding a root set of query results with the pages that link to or are linked from it). The iteration converges to the principal eigenvectors, producing hub and authority weights that amplify pages receiving links from well-connected hubs. This eigenvector-based, query-dependent link analysis informs search ranking by showing that in-link count alone is insufficient: link source quality propagates authority transitively. The same principle underlies PageRank's global, query-independent formulation and modern link equity models used in crawl prioritisation and index scoring.
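The mutual reinforcement between hubs and authorities can be sketched as two alternating updates followed by normalisation. This is an illustrative sketch only: the function name `hits`, the fixed iteration count, the L2 normalisation, and the toy edge list are assumptions, and construction of the query-specific subgraph is left out.

```python
import math


def hits(edges, iterations=50):
    """Iterate hub and authority scores over (source, target) edges."""
    nodes = {node for edge in edges for node in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        auth = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub update: sum the authority scores of pages linked to.
        hub = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalise both vectors so the scores stay bounded.
        for table in (hub, auth):
            norm = math.sqrt(sum(v * v for v in table.values())) or 1.0
            for node in table:
                table[node] /= norm
    return hub, auth


# Hypothetical subgraph: three hub-like pages pointing at two resources.
toy_edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a2")]
hub_scores, auth_scores = hits(toy_edges)
print(max(auth_scores, key=auth_scores.get))
```

Repeating the two updates is equivalent to power iteration on the products of the adjacency matrix with its transpose, which is why the scores settle on principal eigenvectors.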
Hilltop constructs a query-specific authority graph by restricting link-based scoring to "expert documents": pages on a topic that link to many non-affiliated pages on that topic. This isolates genuine editorial endorsement from self-serving or incidental citation networks. Standard PageRank-style algorithms do not distinguish links that reflect deliberate expert judgment from links that reflect co-location, reciprocity, or structural spam, so their authority scores can reward link acquisition rather than topical relevance. The implication is that ranking durability depends on qualifying sources before weighting their links: a page's authority signal weakens when its linking pages lack demonstrable topical expertise, which makes expert-filtered link graphs harder to manipulate at scale.
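The expert-filtering step can be illustrated with a heavily simplified sketch. It assumes that two pages are affiliated when they share a hostname, which is a stand-in for Hilltop's own affiliation test, and it omits topic matching entirely. The function names, the `min_hosts` threshold, and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def host(url):
    """Hostname of a URL, used here as a crude affiliation key (assumption)."""
    return urlparse(url).hostname or ""


def find_experts(outlinks, min_hosts=3):
    """Pages that link to at least min_hosts hosts other than their own."""
    experts = set()
    for page, targets in outlinks.items():
        foreign_hosts = {host(t) for t in targets} - {host(page)}
        if len(foreign_hosts) >= min_hosts:
            experts.add(page)
    return experts


def expert_votes(outlinks, experts):
    """For each target, count the distinct expert hosts linking to it from elsewhere."""
    voters = {}
    for page in experts:
        for target in outlinks[page]:
            if host(target) != host(page):
                voters.setdefault(target, set()).add(host(page))
    return {target: len(hosts) for target, hosts in voters.items()}


# Hypothetical link data: two list-style pages and one self-linking site.
outlinks = {
    "https://guide.example/list": [
        "https://a.example/", "https://b.example/", "https://c.example/",
    ],
    "https://review.example/top": [
        "https://a.example/", "https://b.example/", "https://d.example/",
    ],
    "https://a.example/": ["https://a.example/about", "https://b.example/"],
}
experts = find_experts(outlinks)
votes = expert_votes(outlinks, experts)
print(sorted(votes.items()))
```

Only links from qualified experts are counted, so the internal link from `a.example` to its own "about" page and its single outbound link contribute nothing to the scores.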
This work proposes modifications to the HITS algorithm that address link-spam vulnerabilities and topic drift by incorporating content similarity analysis and anchor text weighting into hub-authority score propagation. Experiments demonstrate that filtering semantically irrelevant links before the iterative score computation reduces noise amplification, producing authority scores that reflect topical relevance more accurately than raw link popularity. These refinements matter for crawl prioritisation and authority-based ranking systems because they make hub-authority scores more resistant to manipulated link structures and improve the signal quality of link graph analysis for determining topical authority.
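One way to picture link filtering and weighting before score propagation is the sketch below. It is not the method from the work summarised above: Jaccard overlap between anchor-text tokens and a set of topic terms is a swapped-in stand-in for content similarity, and the function names, the 0.2 threshold, and the sample edges are assumptions.

```python
import math


def jaccard(a, b):
    """Token-set overlap, used here as a stand-in similarity measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def weighted_hits(edges, topic_terms, threshold=0.2, iterations=50):
    """HITS over (source, target, anchor_text) edges, weighted by relevance."""
    weighted = []
    for src, dst, anchor in edges:
        weight = jaccard(anchor.lower().split(), topic_terms)
        if weight >= threshold:  # drop links judged off-topic
            weighted.append((src, dst, weight))
    nodes = {n for src, dst, _ in weighted for n in (src, dst)}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Same updates as plain HITS, with each edge scaled by its weight.
        auth = dict.fromkeys(nodes, 0.0)
        for src, dst, weight in weighted:
            auth[dst] += weight * hub[src]
        hub = dict.fromkeys(nodes, 0.0)
        for src, dst, weight in weighted:
            hub[src] += weight * auth[dst]
        for table in (hub, auth):
            norm = math.sqrt(sum(v * v for v in table.values())) or 1.0
            for node in table:
                table[node] /= norm
    return hub, auth


# Hypothetical edges; the "cheap pills" link is off-topic and gets filtered out.
sample_edges = [
    ("h1", "jaguar-cars", "jaguar cars"),
    ("h1", "spam", "cheap pills"),
    ("h2", "jaguar-cars", "jaguar car reviews"),
    ("h2", "dealer", "jaguar dealer"),
]
hub_w, auth_w = weighted_hits(sample_edges, {"jaguar", "cars"})
print(sorted(auth_w, key=auth_w.get, reverse=True))
```

Because the off-topic edge is removed before iteration begins, the page it points to never enters the graph and cannot accumulate authority from the hub that links to it.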