T5 unifies all NLP tasks - classification, summarisation, and QA - into a single text-to-text format, allowing one encoder-decoder transformer to handle diverse tasks with the same objective, loss, and decoding procedure. By introducing C4 (the Colossal Clean Crawled Corpus), the authors established a widely adopted recipe for web-scale data cleaning (deduplication plus quality heuristics). Most significantly, the paper provides a systematic empirical comparison of pre-training objectives, architectures, and model scale, demonstrating that diverse language tasks can be mastered through unified transfer learning rather than task-specific engineering.
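A minimal sketch of the text-to-text framing, using the Hugging Face `transformers` library (an assumption for illustration; not the paper's own codebase). The same checkpoint and decoding loop serve translation, summarisation, and classification - only the task prefix in the input string changes. The prefixes below are the ones used in the paper.

```python
# Text-to-text framing: every task is "prefix: input text" -> output text.
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: T5 casts every NLP task as mapping input text to output text.",
    "cola sentence: The books is on the table.",  # acceptability classification
]

# One model, one generation loop, three different tasks.
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```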
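The C4 cleaning pipeline can be paraphrased in a few lines. The sketch below is a simplified illustration of some of the paper's heuristics (minimum line length, terminal punctuation, boilerplate filters) plus exact-match deduplication via hashing; the real pipeline deduplicates three-sentence spans across the corpus, so this line-level version is an approximation.

```python
# Simplified C4-style page cleaning: heuristic line filters + exact dedup.
import hashlib

TERMINAL = (".", "!", "?", '"')

def clean_page(text: str, seen_hashes: set) -> str:
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if len(line.split()) < 5:           # drop very short lines
            continue
        if not line.endswith(TERMINAL):     # keep only sentence-like lines
            continue
        if "lorem ipsum" in line.lower() or "{" in line:  # placeholder text / code
            continue
        digest = hashlib.sha1(line.encode()).hexdigest()
        if digest in seen_hashes:           # corpus-level exact dedup
            continue
        seen_hashes.add(digest)
        kept.append(line)
    return "\n".join(kept)

seen: set = set()
print(clean_page("Buy now!\nThis is a complete, well-formed sentence about NLP.", seen))
```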
BEIR established a standardised benchmark of 18 diverse datasets (spanning fact-checking, QA, news retrieval, and more) to measure zero-shot generalisation in Information Retrieval (IR). Its core finding is a "generalisation gap": while dense retrieval models (such as DPR) excel in-domain, they frequently underperform BM25 on out-of-domain tasks, exposing a critical brittleness in neural IR. This explains the continued necessity of lexical matching (keyword overlap) as a robust signal that complements semantic matching in diverse or "long-tail" query environments.
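Because lexical and dense signals fail in different places, a common practical response to the generalisation gap is hybrid scoring. Below is a minimal sketch assuming the `rank_bm25` and `sentence-transformers` packages are installed; the corpus, model name, and 0.5 interpolation weight are illustrative choices, not prescriptions from the BEIR paper.

```python
# Hybrid retrieval: interpolate normalised BM25 and dense similarity scores.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "BM25 ranks documents by term frequency and inverse document frequency.",
    "Dense retrievers embed queries and documents into a shared vector space.",
    "Zero-shot evaluation tests a model on domains unseen during training.",
]
query = "why do lexical keyword signals still matter for retrieval?"

# Lexical scores: exact term overlap, robust to out-of-domain vocabulary.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
lexical = np.array(bm25.get_scores(query.lower().split()))

# Dense scores: cosine similarity between normalised embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = encoder.encode(corpus, normalize_embeddings=True)
query_emb = encoder.encode(query, normalize_embeddings=True)
dense = doc_emb @ query_emb

def minmax(x):
    # Rescale each signal to [0, 1] so neither score scale dominates.
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

hybrid = 0.5 * minmax(lexical) + 0.5 * minmax(dense)
print(corpus[int(hybrid.argmax())])
```

In practice the interpolation weight is tuned per deployment; the point is simply that keeping the lexical signal in the mix guards against the out-of-domain failures BEIR documents.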