links

Learning Deep Structured Semantic Models for Web Search using Clickthrough Data (microsoft.com, PDF)

DSSM (Deep Structured Semantic Model) uses a deep neural network with a word hashing layer to project queries and documents into a shared low-dimensional semantic space, where relevance is scored by cosine similarity. The model is trained end-to-end on clickthrough data by maximising the posterior probability of clicked documents given queries, computed as a softmax over query-document similarities. Letter-trigram word hashing reduces the input dimensionality from a vocabulary of 500K+ terms to roughly 30K features. On a web search ranking task the model achieves statistically significant NDCG gains over BM25, LSA, and PLSA baselines. Because matching happens in the learned semantic space rather than on surface terms, the model mitigates lexical mismatch between queries and documents, surfacing relevant results even where they share no keywords. Its similarity score can feed the relevance-scoring stage of a learning-to-rank pipeline directly, without manual feature engineering or a separate query expansion module.
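
The word hashing step and the click posterior can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes the learned non-linear layers are skipped entirely, so cosine similarity is computed on raw letter-trigram count vectors rather than on learned low-dimensional semantic vectors. The function names, the example strings, and the smoothing factor `gamma=10.0` are all hypothetical choices for illustration.

```python
import math
from collections import Counter


def letter_trigrams(word: str) -> list[str]:
    """DSSM-style word hashing: wrap the word in '#' boundary marks,
    then slide a 3-character window over it ("good" -> #go, goo, ood, od#)."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def hash_text(text: str) -> Counter:
    """Sparse bag-of-trigrams vector for a query or document (trigram -> count)."""
    vec: Counter = Counter()
    for word in text.split():
        vec.update(letter_trigrams(word))
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def click_posterior(query: str, docs: list[str], gamma: float = 10.0) -> list[float]:
    """Softmax over gamma-scaled cosine scores, i.e. P(D | Q) across the candidate docs.
    ASSUMPTION: no learned layers here; cosine is taken on raw trigram vectors.
    gamma=10.0 is an arbitrary illustrative smoothing factor."""
    q = hash_text(query)
    scores = [gamma * cosine(q, hash_text(d)) for d in docs]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def training_loss(query: str, clicked: str, unclicked: list[str], gamma: float = 10.0) -> float:
    """Negative log-likelihood of the clicked doc among {clicked} plus unclicked docs."""
    probs = click_posterior(query, [clicked] + unclicked, gamma)
    return -math.log(probs[0])


if __name__ == "__main__":
    print(letter_trigrams("good"))  # ['#go', 'goo', 'ood', 'od#']
    docs = ["cheapest flight deals", "luxury hotel rooms"]  # hypothetical candidates
    print(click_posterior("cheap flights", docs))
    print(training_loss("cheap flights", docs[0], docs[1:]))
```

Trigram hashing lets "cheap flights" overlap with "cheapest flight deals" even though no whole word matches exactly. In the real model, training would minimise this loss over the network weights, with the candidate set typically approximated by the clicked document plus a few randomly sampled unclicked ones.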

Accurately Interpreting Clickthrough Data as Implicit Feedback (cs.cornell.edu, PDF)

Proposes interpreting clickthrough data as relative preferences between examined results rather than as absolute relevance signals, using eye-tracking and controlled experiments that manipulate the result ranking to calibrate how user clicks should be read. The study finds that absolute click rates carry strong presentation bias: users favour higher-ranked results (trust bias), and their clicking also depends on snippet quality and the overall quality of the result set. Relative click patterns are much more reliable. For example, the "Click > Skip Above" rule treats a clicked result as preferred over every non-clicked result ranked above it, and such pairwise preferences are largely robust to trust bias and ranking artefacts. This lets search systems extract high-quality implicit feedback for learning-to-rank algorithms by mining pairwise preference constraints from click logs, rather than treating raw click frequency as a direct relevance proxy.
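
A minimal sketch of the "Click > Skip Above" extraction in Python, plus a helper that counts how many extracted pairs a scoring function gets wrong, which is the quantity a pairwise learner (for example a ranking SVM) would try to reduce. The function names, the toy ranking, the clicks, and the scores are hypothetical; this does not reproduce the paper's other extraction strategies or its experimental setup.

```python
def click_skip_above(ranking: list[str], clicked: set[str]) -> list[tuple[str, str]]:
    """Click > Skip Above rule: each clicked result is preferred over every
    non-clicked result ranked above it (presumably examined and skipped).
    Returns (preferred, less_preferred) pairs in ranking order."""
    pairs: list[tuple[str, str]] = []
    for rank, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        for skipped in ranking[:rank]:
            if skipped not in clicked:
                pairs.append((doc, skipped))
    return pairs


def count_violations(pairs: list[tuple[str, str]], scores: dict[str, float]) -> int:
    """Number of preference pairs a scoring function orders wrongly (ties count as wrong)."""
    return sum(1 for better, worse in pairs if scores[better] <= scores[worse])


if __name__ == "__main__":
    ranking = ["a", "b", "c", "d", "e"]  # hypothetical result list, top first
    clicked = {"b", "d"}                  # hypothetical clicks from one session
    pairs = click_skip_above(ranking, clicked)
    print(pairs)  # [('b', 'a'), ('d', 'a'), ('d', 'c')]
    current_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5}
    print(count_violations(pairs, current_scores))  # 3
```

Note that no pair is emitted for "e": results below the last click may never have been examined, so the rule deliberately says nothing about them.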