RAG

Retrieval-Augmented Generation — fetch relevant documents at query time and feed them to the LLM as context.

Definition

Retrieval-Augmented Generation is the pattern of fetching relevant chunks of external knowledge (via embedding search, keyword search, or both) at query time and injecting them into the LLM prompt. The knowledge lives in an external index rather than in the model's weights, so RAG keeps answers grounded in current data. Updating what the model can draw on means re-indexing documents, not retraining the model. Many production knowledge-base assistants, internal chatbots, and citation-aware search products are built as RAG systems.
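The retrieve-then-inject loop can be sketched without external dependencies. This is a minimal illustration under several assumptions, not a reference implementation:

- A sparse term-frequency vector stands in for a learned embedding model.
- A count of distinct shared terms stands in for keyword search. Real systems typically use something like BM25.
- The two rankings are merged with reciprocal rank fusion (RRF). RRF is one common way to combine results from both kinds of search, but this entry does not prescribe it.

The corpus, the function names, and the prompt wording are all hypothetical. The actual LLM call is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical corpus standing in for chunks of an external knowledge base.
CHUNKS = [
    "Refunds are issued within 14 days of a return being received.",
    "Our office is closed on public holidays.",
    "To request a refund, open a ticket with your order number.",
    "Shipping to Canada takes five to seven business days.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Stand-in for a learned embedding model: a sparse term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_rank(query: str, chunks: list[str]) -> list[int]:
    """'Embedding search': chunk indices ordered by similarity to the query."""
    q = embed(query)
    scores = [cosine(q, embed(c)) for c in chunks]
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)


def keyword_rank(query: str, chunks: list[str]) -> list[int]:
    """'Keyword search': chunk indices ordered by number of distinct query terms matched."""
    q_terms = set(tokenize(query))
    scores = [len(q_terms & set(tokenize(c))) for c in chunks]
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge rankings: each list adds 1 / (k + rank) to every item it contains."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, idx in enumerate(ranking, start=1):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Hybrid retrieval: fuse the embedding and keyword rankings, keep the top_k chunks."""
    order = reciprocal_rank_fusion([vector_rank(query, chunks), keyword_rank(query, chunks)])
    return [chunks[i] for i in order[:top_k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Inject retrieved chunks into the prompt, numbered so the model can cite them."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the sources below. Cite sources by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    question = "How do I get a refund?"
    prompt = build_prompt(question, retrieve(question, CHUNKS))
    print(prompt)  # this string would then be sent to the LLM (call not shown)
```

For brevity, the sketch recomputes every chunk vector per query. A real system would embed chunks once at indexing time and store them in a vector index. The exact-match tokenizer also fails to match "refunds" to "refund", which is the kind of gap learned embeddings or stemming are meant to close. The `k=60` default in the fusion step is a commonly used value, assumed here rather than tuned.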

When to use

See also