Context Window
Definition
The maximum amount of text an AI model can process at once, measured in tokens. For example, GPT-4o has a 128K-token context window.
Why It Matters
Everything you put in the prompt (instructions, documents, chat history, retrieved snippets) has to fit in the context window, and in most APIs the model's reply shares the same budget. Overflow it and you either truncate (losing information) or chunk and retrieve (adding complexity). The context window is the single number that bounds how much background a model can hold in mind at once.
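A minimal sketch of the "truncate" option, assuming the rough 4-characters-per-token heuristic from the Key Points instead of a real tokenizer. The names `estimate_tokens` and `trim_history` and the `reserve_for_output` parameter are hypothetical: the function keeps the system prompt, leaves room for the reply, and drops the oldest chat turns first.

```python
# Assumption: ~4 characters of English per token (a heuristic, not a real tokenizer).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count by ceiling-dividing the character count."""
    return max(1, -(-len(text) // CHARS_PER_TOKEN))


def trim_history(system: str, turns: list[str], window: int,
                 reserve_for_output: int) -> list[str]:
    """Drop the oldest turns until system prompt + kept turns + reply budget fit.

    Returns the kept turns in their original (oldest-first) order.
    """
    budget = window - reserve_for_output - estimate_tokens(system)
    if budget < 0:
        raise ValueError("system prompt alone exceeds the window")
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the most recent turns survive.
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break  # stop here so the surviving history stays contiguous
        kept.append(turn)
        used += cost
    kept.reverse()
    return kept


# Three turns of ~100 tokens each, a 10-token system prompt, a 300-token window.
history = ["a" * 400, "b" * 400, "c" * 400]
kept = trim_history("x" * 40, history, window=300, reserve_for_output=50)
print([t[0] for t in kept])  # → ['b', 'c']
```

Stopping at the first turn that doesn't fit (rather than skipping it and continuing) means the model never sees a reply without the question that prompted it.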
Key Points
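- 1 token ≈ 4 characters of English text. Code and non-English scripts usually need more tokens for the same amount of text; Chinese, Japanese and Korean often run at roughly one token per character or more. Exact ratios depend on the tokenizer.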
- The KV cache grows linearly with context length: a 200K-token context on a 70B model can require 40–80 GB of VRAM for the cache alone.
- Most models show degraded recall for information placed in the middle of very long contexts: the "lost in the middle" phenomenon documented in 2023.
- GPT-4o: 128K tokens. Claude 3.5 Sonnet: 200K. Qwen 2.5 72B: 128K. Gemini 1.5 Pro: 1M (experimental).
- Prompt-caching APIs (Anthropic, Google) bill repeated prefix tokens at a steep discount; Anthropic's cache reads cost about 10% of the normal input price. This is cost-critical for long system prompts sent on every call.
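The KV-cache figure in the Key Points comes from simple arithmetic: every token stores one key vector and one value vector per layer, per KV head. A back-of-the-envelope sketch, assuming a Llama-style 70B shape (80 layers, 8 KV heads via grouped-query attention, head dimension 128) and a 16-bit cache; the function name is hypothetical, and real serving stacks add overhead on top.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2, batch: int = 1) -> int:
    """KV cache size: K and V (factor 2) per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens * batch


# Assumed Llama-style 70B shape with a 16-bit (2-byte) cache.
size = kv_cache_bytes(tokens=200_000, layers=80, kv_heads=8, head_dim=128)
print(f"{size / 1e9:.1f} GB")  # → 65.5 GB
```

The result scales linearly with `tokens`, as the bullet says. Halving `bytes_per_value` (an 8-bit cache) halves the total, while the same shape without grouped-query attention (64 KV heads) would need eight times as much.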
Example
GPT-4o has 128K tokens of context (~96K words / ~300 pages). Claude 3.5 Sonnet has 200K. Qwen 2.5 7B has 32K. Reading a novel one chapter at a time fits comfortably in 32K; analysing the whole novel in one shot needs 100K+ tokens.
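The conversions in this example can be reproduced with a few ratios. This sketch assumes the ratios implied by the figures above (128K tokens ≈ 96K words ≈ 300 pages, i.e. 0.75 words per token and 320 words per page), a hypothetical 4,000-token reserve for the reply, and illustrative word counts of 5,000 for a chapter and 90,000 for a novel. Real token counts depend on the tokenizer.

```python
# Ratios implied by this page's figures (assumptions, not tokenizer output).
WORDS_PER_TOKEN = 0.75   # 128K tokens ≈ 96K words
WORDS_PER_PAGE = 320     # 96K words ≈ 300 pages

# Rounded window sizes quoted on this page.
WINDOWS = {"GPT-4o": 128_000, "Claude 3.5 Sonnet": 200_000, "Qwen 2.5 7B": 32_000}


def tokens_for_words(words: int) -> int:
    return round(words / WORDS_PER_TOKEN)


def pages_for_tokens(tokens: int) -> float:
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


def fits(words: int, window: int, reserve_for_output: int = 4_000) -> bool:
    """True if the text plus a reply budget (hypothetical default) fits the window."""
    return tokens_for_words(words) + reserve_for_output <= window


print(pages_for_tokens(128_000))  # → 300.0
for name, window in WINDOWS.items():
    # Columns: 5,000-word chapter fits?, 90,000-word novel fits?
    print(name, fits(5_000, window), fits(90_000, window))
# GPT-4o True True
# Claude 3.5 Sonnet True True
# Qwen 2.5 7B True False
```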
Common Misconception
Context window size is not the same as reliably usable context. Most models exhibit substantially degraded fact-retrieval accuracy for content placed in the middle 60–80% of a very long context. If precise recall of a specific passage matters, use RAG rather than relying on long-context stuffing.
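A toy version of the retrieve step: split the document into chunks and send only the best-matching ones instead of the whole text. This sketch swaps in plain word-overlap scoring where a real RAG pipeline would typically use embeddings and a vector index; `chunk`, `retrieve` and the sample text are hypothetical.

```python
import re
from collections import Counter


def chunk(text: str, words_per_chunk: int = 200) -> list[str]:
    """Split text into consecutive chunks of at most `words_per_chunk` words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]


def bag_of_words(s: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks sharing the most words with the query (ties keep input order)."""
    q = bag_of_words(query)
    # Counter & Counter keeps the minimum count of each shared word.
    return sorted(chunks, key=lambda c: sum((bag_of_words(c) & q).values()),
                  reverse=True)[:k]


doc = ("invoice total was 420 euros meeting moved to next friday "
       "payment terms net thirty days")
pieces = chunk(doc, words_per_chunk=5)
print(retrieve("what was the invoice total", pieces, k=1))
# → ['invoice total was 420 euros']
```

Only the retrieved chunks go into the prompt, so the passage you need sits near the start of a short context instead of somewhere in the middle of a long one.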
Related Terms
- Token: The basic unit of text processing in AI models. Roughly 1 token = 4 characters of English text. Used for billing and context limits.
- LLM (Large Language Model): A neural network trained on massive text datasets that can generate, understand and manipulate human language. Examples: GPT-4, Qwen, Claude.
- RAG (Retrieval-Augmented Generation): A technique where AI retrieves relevant documents before generating a response, improving accuracy.
Context Window on Rewind.ai
Every model on Rewind.ai shows its context window in the picker. The chat UI auto-trims older turns when you approach the limit; RAG is the alternative when "just paste it" isn't viable.
Quick Facts
| Term | Context Window |
| Related | Token, LLM (Large Language Model), RAG (Retrieval-Augmented Generation) |