links

A statistical interpretation of term specificity and its application in retrieval staff.city.ac.uk pdf

Establishes a statistical weighting framework for term specificity in document retrieval by formalising the inverse relationship between a term's collection frequency and its retrieval value. The paper introduces the weighting now known as Inverse Document Frequency (IDF), calculated as the log of the total number of documents divided by the number of documents containing a term, and argues that rare terms carry disproportionately higher discriminatory power for separating relevant documents from noise. Search ranking systems that apply IDF-weighted term scoring achieve higher precision than raw term-frequency matching, and IDF forms the mathematical foundation for the TF-IDF signals used in content relevance scoring, anchor text evaluation, and keyword targeting models.
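
As a rough illustration of the calculation described above, the following is a minimal Python sketch, not the paper's exact formulation. It assumes the natural logarithm, lowercase whitespace tokenisation, and a tiny made-up corpus; the function names `inverse_document_frequency` and `tf_idf_score` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Map each term to log(N / n_t), where N is the number of documents
    and n_t is the number of documents containing the term.

    Assumption: natural log and naive whitespace tokenisation.
    """
    total = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Use a set so each term counts at most once per document.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(total / count) for term, count in doc_freq.items()}


def tf_idf_score(query, document, idf):
    """Sum of (term count in document * IDF) over the query terms.

    Terms missing from the collection contribute nothing.
    """
    term_counts = Counter(document.lower().split())
    return sum(term_counts[t] * idf.get(t, 0.0) for t in query.lower().split())


# Illustrative corpus (invented for this sketch).
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the quantum computer solved the problem",
]

idf = inverse_document_frequency(corpus)
scores = [tf_idf_score("the quantum", doc, idf) for doc in corpus]
```

In this toy corpus "the" appears in every document, so its weight is log(3/3) = 0 and it contributes nothing to any score, while "quantum" appears in one document and receives the largest weight. Only the third document gets a non-zero score for the query "the quantum", which is the discriminating behaviour the summary attributes to rare terms.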