Improving language models by retrieving from trillions of tokens
by Community
Publications — Google DeepMind
OSS
Added 1 June 2026
Overview
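A framework (the Retrieval-Enhanced Transformer, RETRO) that conditions language model predictions on text chunks retrieved from a massive corpus (trillions of tokens; the paper uses a database of about 2 trillion tokens). The input is split into fixed-size chunks. For each chunk, nearest-neighbour chunks are retrieved from the database with a frozen BERT retriever, and the model attends to them through chunked cross-attention in its forward pass. This lets it draw on stored knowledge during generation.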
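The forward-pass integration described above can be illustrated with a simplified, single-head chunked cross-attention in NumPy. This is a sketch under stated assumptions, not the paper's implementation. There are no learned query/key/value projections. The neighbour encodings are taken as given, although RETRO produces them with a separate encoder. The function name `chunked_cross_attention` is hypothetical. To keep generation causal, tokens in chunk u+1 attend only to neighbours retrieved with chunk u. The paper's exact alignment differs slightly, so treat this offset as a simplification.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def chunked_cross_attention(hidden, neighbours, chunk_size):
    """Hypothetical, simplified single-head chunked cross-attention.

    hidden:     (seq_len, d) decoder activations; seq_len is a multiple of chunk_size.
    neighbours: (num_chunks, n_tokens, d) encoded retrieved neighbours, where
                neighbours[u] were retrieved using input chunk u (assumed given).
    Tokens in chunk u+1 attend to neighbours[u], so no token sees neighbours
    retrieved with text that comes after it. Chunk 0 gets no retrieval update,
    and neighbours[-1] is never used.
    """
    seq_len, d = hidden.shape
    num_chunks = seq_len // chunk_size
    out = hidden.copy()
    for u in range(num_chunks - 1):
        start = (u + 1) * chunk_size
        queries = hidden[start:start + chunk_size]       # (chunk_size, d)
        keys = values = neighbours[u]                    # (n_tokens, d), no projections
        scores = queries @ keys.T / np.sqrt(d)           # scaled dot-product scores
        out[start:start + chunk_size] += softmax(scores) @ values  # residual update
    return out
```

The update is added residually, so with all-zero neighbour encodings the output equals the input activations. Chunk 0 is left unchanged because no earlier chunk exists to retrieve with.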
Best for
Researchers and developers building retrieval-augmented language models that demand very large external knowledge stores.
Use cases
- Improving factual accuracy in open-domain question answering
- Enhancing long-form text generation with up-to-date information
- Reducing hallucination in knowledge-intensive NLU tasks
Notes
Indexed from awesome-llm and enriched against its public facts.
Pros
- Grants access to substantially more external knowledge than parametric memory alone
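- Can reduce model size while maintaining strong performance; the paper reports results comparable to GPT-3 and Jurassic-1 on the Pile with 25× fewer parameters
- Uses a frozen retriever, so embeddings for the whole database can be precomputed once and searched with an approximate nearest-neighbour index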
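The precomputed-index point above can be sketched as a toy index that embeds every database chunk once, offline, and then embeds only the query chunk at inference time. All of the following are assumptions. `FrozenChunkIndex` and `chunk_tokens` are hypothetical names. A fixed random embedding table averaged over each chunk stands in for the frozen BERT encoder. Exact cosine search with `np.argsort` stands in for the approximate nearest-neighbour search that a trillion-token index would need.

```python
import numpy as np


def chunk_tokens(token_ids, chunk_size):
    """Split a token sequence into consecutive, non-overlapping chunks (drops a trailing partial chunk)."""
    n = len(token_ids) // chunk_size
    return [token_ids[i * chunk_size:(i + 1) * chunk_size] for i in range(n)]


class FrozenChunkIndex:
    """Hypothetical toy index: chunk embeddings are computed once and reused for every query."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for a frozen encoder: a fixed table that is never updated.
        self.table = rng.standard_normal((vocab_size, dim))
        self.keys = None
        self.chunks = []

    def embed(self, chunk):
        # Average token embeddings over the chunk, then L2-normalise for cosine similarity.
        v = self.table[np.asarray(chunk)].mean(axis=0)
        return v / np.linalg.norm(v)

    def build(self, corpus_tokens, chunk_size):
        # Offline step: embed every database chunk once and store the key matrix.
        self.chunks = chunk_tokens(corpus_tokens, chunk_size)
        self.keys = np.stack([self.embed(c) for c in self.chunks])

    def query(self, chunk, k):
        # Online step: only the query chunk is embedded; exact top-k by cosine similarity.
        sims = self.keys @ self.embed(chunk)
        top = np.argsort(-sims)[:k]
        return [self.chunks[i] for i in top]
```

Because the encoder is frozen, the key matrix does not need recomputing when the language model is retrained; it changes only when the corpus changes.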
Cons
- Adds retrieval latency and computational overhead during inference
- Requires careful index management and periodic corpus updates
- Retrieval quality depends heavily on corpus coverage and embedding quality
Pairs with
Other entries in the index that connect to this one.
Milvus
Community
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Chroma
Community
Search infrastructure for AI
Qdrant
Community
Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
LangChain
Community
The agent engineering platform.
Embedchain
Community
Universal memory layer for AI Agents
vLLM
Community
A high-throughput and memory-efficient inference and serving engine for LLMs