prompt caching

Provider-side cache of repeated prompt prefixes — dramatically lowers cost and latency for long, stable system prompts.

Definition

Prompt caching is the LLM provider's optimisation of reusing computation across requests that share a long, stable prefix. Anthropic and OpenAI both offer it. The cache hit returns at a fraction of the input-token price and lower latency. Designing prompts so the cacheable prefix (system prompt, tool defs, RAG docs) comes first is the key lever for cost on production agents.

When to use

See also