prompt caching
Provider-side cache of repeated prompt prefixes — dramatically lowers cost and latency for long, stable system prompts.
Definition
Prompt caching is the LLM provider's optimisation of reusing computation across requests that share a long, stable prefix. Anthropic and OpenAI both offer it. The cache hit returns at a fraction of the input-token price and lower latency. Designing prompts so the cacheable prefix (system prompt, tool defs, RAG docs) comes first is the key lever for cost on production agents.