context window

The maximum number of tokens an LLM can consider at once: the input prompt and the generated output combined.

Definition

An LLM's context window is the hard limit on the number of tokens, prompt plus completion, that it can process in a single request. Sonnet 4.5 and similar 2025-era models offer windows of 200K to 1M tokens; Gemini stretches to 2M. Larger windows allow retrieval-augmented generation (RAG) over more documents and longer agent transcripts without compaction, at the cost of higher per-call latency and price.
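Because the prompt and the completion share one budget, a request can be checked before it is sent. The following is a minimal sketch of that arithmetic, not any provider's API: the function names are hypothetical, and the token counts are assumed to come from whatever tokenizer or counting endpoint the model in question provides.

```python
def fits_in_window(prompt_tokens: int, output_tokens: int, context_window: int) -> bool:
    """Return True if prompt plus requested output stays within the window."""
    if prompt_tokens < 0 or output_tokens < 0 or context_window <= 0:
        raise ValueError("token counts must be non-negative and the window positive")
    return prompt_tokens + output_tokens <= context_window


def max_completion_tokens(prompt_tokens: int, context_window: int) -> int:
    """Return how many tokens remain for the completion after the prompt."""
    if prompt_tokens < 0 or context_window <= 0:
        raise ValueError("token counts must be non-negative and the window positive")
    # A prompt larger than the window leaves no room at all, never a negative budget.
    return max(context_window - prompt_tokens, 0)


if __name__ == "__main__":
    window = 200_000   # illustrative 200K-token window
    prompt = 150_000   # illustrative prompt size, counted elsewhere
    print(fits_in_window(prompt, 64_000, window))   # 214,000 tokens requested in total
    print(max_completion_tokens(prompt, window))    # room left for the completion
```

With these illustrative numbers, a 150,000-token prompt leaves 50,000 tokens for output in a 200K window, so a request for 64,000 output tokens does not fit. The prompt would have to be trimmed or compacted first, or the output cap lowered.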

When to use

See also