links
HITS (Hyperlink-Induced Topic Search) defines a mutually reinforcing, iterative computation over directed hyperlink graphs. It assigns every web page scores for two distinct roles: hubs (pages that link to many good authorities) and authorities (pages linked to by many good hubs). This addresses the problem of identifying high-quality topical resources from link structure alone, without relying on content analysis. The core mechanism runs repeated matrix-vector multiplications on the adjacency matrix A of a query-specific subgraph: a "root set" of text-search results is expanded into a "base set" by adding a bounded sample of pages that link to, or are linked from, the root pages. The iteration converges toward the principal eigenvectors of AᵀA (authorities) and AAᵀ (hubs), producing hub and authority weights that amplify pages receiving links from well-connected hub pages. This eigenvector-based, query-dependent link analysis shows that in-link count alone is insufficient for ranking, because the quality of a link's source propagates authority transitively. It helped establish the theoretical foundation for trust-weighted, graph-theoretic ranking signals, alongside the concurrently developed, global and query-independent PageRank, and for later link equity models used in crawl prioritisation and index scoring.
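A minimal Python sketch of the iterative computation, assuming the base set has already been built and is given as a list of (source, target) link pairs. The function name `hits`, the iteration cap and the tolerance are illustrative choices, not taken from the original paper. Each round recomputes authority scores from hub scores, then hub scores from the new authority scores, and L2-normalises both; repeating this is power iteration, which converges toward the principal eigenvectors of AᵀA and AAᵀ.

```python
import math


def _normalise(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all scores are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {n: v / norm for n, v in scores.items()}


def hits(edges, iterations=100, tol=1e-9):
    """Hub and authority scores for a directed link graph via power iteration.

    edges: iterable of (source, target) pairs, e.g. the links inside a base set.
    Returns (hub, auth): two dicts mapping each page to its L2-normalised score.
    """
    edges = list(edges)
    nodes = sorted({page for edge in edges for page in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority update: a(p) = sum of h(q) over all pages q that link to p.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        new_auth = _normalise(new_auth)

        # Hub update: h(p) = sum of a(q) over all pages q that p links to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        new_hub = _normalise(new_hub)

        # Stop once the largest per-page change (authority plus hub) falls below tol.
        delta = max(
            (abs(new_auth[n] - auth[n]) + abs(new_hub[n] - hub[n]) for n in nodes),
            default=0.0,
        )
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


# Example: three hubs pointing at two candidate authorities.
links = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
hub, auth = hits(links)
print(sorted(auth.items(), key=lambda kv: -kv[1]))
print(sorted(hub.items(), key=lambda kv: -kv[1]))
```

In this toy graph a1 ends with the highest authority score because every hub links to it, while h1 and h2 tie as the strongest hubs because each points at both authorities. A production system would use sparse matrices rather than dictionaries, but the update rule is the same.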
Proposes modifications to the HITS algorithm that address link-spam vulnerabilities and topic drift by incorporating content-similarity analysis and anchor-text weighting into hub-authority score propagation. Experiments show that filtering out semantically irrelevant links before the iterative score computation reduces noise amplification, producing authority scores that reflect genuine topical relevance more accurately than raw link popularity does. These refinements bear directly on crawl prioritisation and authority-based ranking: they make hub-authority scores more resistant to manipulated link structures and improve the signal quality of link-graph analysis for determining topical authority.
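A minimal sketch of the filter-then-weight idea, assuming each link already carries a relevance weight in [0, 1] (for example, a similarity between its anchor text or the linking page's content and the query). How such weights are computed, the `min_weight` threshold and the function name `weighted_hits` are hypothetical; this is a generic weighted HITS variant, not the specific method of the work summarised here. Links below the threshold are discarded before any propagation, and the surviving weights scale each link's contribution in both update steps.

```python
import math


def _normalise(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all scores are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {n: v / norm for n, v in scores.items()}


def weighted_hits(weighted_edges, min_weight=0.2, iterations=100, tol=1e-9):
    """HITS where each link contributes in proportion to a relevance weight.

    weighted_edges: dict mapping (source, target) -> weight in [0, 1].
    The weights and the min_weight cut-off are hypothetical inputs.
    """
    # Drop links judged semantically irrelevant before any score propagation.
    kept = {e: w for e, w in weighted_edges.items() if w >= min_weight}
    nodes = sorted({page for edge in kept for page in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Weighted authority update: a(p) = sum of w(q, p) * h(q).
        new_auth = {n: 0.0 for n in nodes}
        for (src, dst), w in kept.items():
            new_auth[dst] += w * hub[src]
        new_auth = _normalise(new_auth)

        # Weighted hub update: h(p) = sum of w(p, q) * a(q).
        new_hub = {n: 0.0 for n in nodes}
        for (src, dst), w in kept.items():
            new_hub[src] += w * new_auth[dst]
        new_hub = _normalise(new_hub)

        delta = max(
            (abs(new_auth[n] - auth[n]) + abs(new_hub[n] - hub[n]) for n in nodes),
            default=0.0,
        )
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


# Hypothetical example: two relevant links to "on_topic", three low-relevance links to "spam".
edges = {
    ("h1", "on_topic"): 0.9, ("h2", "on_topic"): 0.9,
    ("h1", "spam"): 0.1, ("h2", "spam"): 0.1, ("h3", "spam"): 0.1,
}
_, raw_auth = weighted_hits({e: 1.0 for e in edges}, min_weight=0.0)  # plain HITS
_, soft_auth = weighted_hits(edges, min_weight=0.0)  # weighting only
_, filtered_auth = weighted_hits(edges)  # weighting plus filtering
print(raw_auth, soft_auth, filtered_auth, sep="\n")
```

With every weight set to 1.0 (plain HITS), "spam" outranks "on_topic" simply because it has more in-links. Applying the weights reverses that order, and the default threshold removes "spam" from the scored graph entirely, which is the noise-reduction effect the summary describes.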