RAG
Retrieval-Augmented Generation — fetch relevant documents at query time and feed them to the LLM as context.
Definition
Retrieval-Augmented Generation is the pattern of fetching relevant chunks of external knowledge (via embedding search, keyword search, or both) at query time and injecting them into the LLM prompt. Because the knowledge lives outside the model, RAG keeps answers grounded in current data without retraining: updating the document store is enough. Many production knowledge-base assistants, internal chatbots, and citation-aware search products are RAG systems.
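The retrieve-then-inject flow can be sketched in a few lines. This is a minimal illustration, not a production design: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the names `embed`, `cosine`, `retrieve`, `build_prompt`, and `CHUNKS` are all hypothetical. The sketch stops at building the prompt; sending it to an LLM is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a sparse term-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks, k=2):
    """Inject the retrieved chunks into the prompt as numbered context."""
    hits = retrieve(query, chunks, k)
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(hits))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical knowledge-base chunks, made up for illustration.
CHUNKS = [
    "Refunds are processed within 5 business days.",
    "The office is closed on public holidays.",
    "Refunds require the original receipt.",
]

prompt = build_prompt("how are refunds processed", CHUNKS, k=2)
```

Here the two refund-related chunks are selected and the unrelated one is left out of the prompt. In a real system the chunk vectors would typically be computed once and stored in a vector index rather than re-embedded on every query.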
When to use
See also
- embeddings — Dense vector representations of text — the numeric substrate for semantic search and retrieval-augmented generation.
- LLM — Large Language Model — a transformer-based model trained on internet-scale text to generate and reason.