The maximum amount of text an LLM can process in a single request — including the conversation history, system prompt, and tool results.
Every LLM has a context window limit, measured in tokens (a token is roughly ¾ of a word). Claude Sonnet has a 200K-token window; GPT-4 has 128K. This window holds everything the model sees: the system prompt, conversation history, tool results, and the current question.
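Using that rough ¾-word-per-token heuristic, you can sanity-check whether a prompt plausibly fits a given window before sending it. A back-of-the-envelope sketch (the function names and the 10% headroom are illustrative; real token counts come from the provider's tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: a token is ~3/4 of a word, so tokens ~= words / 0.75."""
    words = len(text.split())
    return int(words / 0.75)

def fits_in_window(text: str, window_tokens: int = 200_000) -> bool:
    """Check the estimate against the window, leaving 10% headroom for the response."""
    return estimate_tokens(text) <= window_tokens * 0.9
```

For exact counts, use the tokenizer your provider exposes; the heuristic is only for quick budgeting.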
When the context window fills up, the oldest messages are dropped. This is why long conversations can lose context: the agent literally can no longer see the earlier messages.
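That dropping behavior can be sketched as a simple trimming loop: keep the system prompt, then evict history oldest-first until the total fits the budget. A minimal sketch (the message shape and `count_tokens` callable are assumptions, not a specific provider's API):

```python
def trim_to_window(messages, max_tokens, count_tokens):
    """Drop the oldest history messages until the conversation fits max_tokens.

    messages: list of {"role": ..., "content": ...} dicts; messages[0] is the
    system prompt and is always kept. count_tokens: callable that returns the
    token count of a single message.
    """
    system, history = messages[0], messages[1:]
    total = count_tokens(system) + sum(count_tokens(m) for m in history)
    while history and total > max_tokens:
        dropped = history.pop(0)  # evict the oldest message first
        total -= count_tokens(dropped)
    return [system] + history
```

Production agents often do something smarter than plain eviction, such as summarizing the dropped messages, but the budget check is the same.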
Managing context window usage is critical for production agents. Wasteful prompts eat into the space available for useful content.
Context window size determines how much information your agent can reason about at once. Too small, and the agent forgets important context; too large, and costs rise, since you pay per token. Efficient context management balances the two.
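Because you pay per token, context size translates directly into per-request cost. A sketch of the arithmetic, using hypothetical per-million-token rates (actual prices vary by provider and change over time):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float = 3.0,
                 out_price_per_m: float = 15.0) -> float:
    """Estimate the USD cost of one request.

    in_price_per_m / out_price_per_m are illustrative dollars-per-million-token
    rates, not any specific provider's pricing.
    """
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m
```

At these illustrative rates, a request that fills 100K tokens of context costs roughly 10x one that fills 10K, which is why trimming wasteful context pays off at scale.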
Clawctl manages agent context efficiently. System prompts are optimized. Tool results are summarized when appropriate. You can monitor context usage through the dashboard.
Older messages are dropped from context. The agent continues with the most recent information.
As of 2026, Claude models offer up to 200K tokens. Gemini offers up to 1M. Check current provider specs.
Not always. Larger windows cost more per request. Many tasks work perfectly with smaller windows. Match window size to your use case.
Agent Memory
The ability of an AI agent to remember information across conversations and sessions, building knowledge over time.
Model Routing
Directing different agent tasks to different LLM models based on complexity, cost, or speed requirements.
Cost Optimization
Strategies for reducing LLM and infrastructure costs when running AI agents without sacrificing quality or reliability.
Human-in-the-Loop
A design pattern where an AI agent pauses before taking risky actions and waits for a human to approve or reject the action.