Claude Code Context Window Optimization
Guide by @aakashgupta: Context is the scarcest resource in Claude Code. Before the first user message is sent, 10%–16% of the context window is already consumed by system prompts, MCP servers, and custom agents.
Overhead breakdown
| Component | Approximate overhead |
|---|---|
| System prompt | ~2% |
| Each MCP server | ~8%+ |
| Custom agent | ~4% |
| Conversation history | grows continuously |
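As a rough illustration of how the table adds up, the sketch below sums the approximate per-component figures. The function name and constants are hypothetical, and the MCP figure is a lower bound ("~8%+"), so treat the result as a floor rather than a measurement.

```python
# Approximate overhead per component, in percent of the context window,
# copied from the table above. These are rough figures, not measurements.
SYSTEM_PROMPT_PCT = 2.0
MCP_SERVER_PCT = 8.0      # the table says "~8%+", so this is a lower bound
CUSTOM_AGENT_PCT = 4.0


def estimate_baseline_overhead(mcp_servers: int, custom_agents: int) -> float:
    """Estimate the share of context consumed before the first user message."""
    if mcp_servers < 0 or custom_agents < 0:
        raise ValueError("counts must be non-negative")
    return (SYSTEM_PROMPT_PCT
            + mcp_servers * MCP_SERVER_PCT
            + custom_agents * CUSTOM_AGENT_PCT)


# One MCP server plus one custom agent: 2 + 8 + 4 = 14 percent.
print(estimate_baseline_overhead(mcp_servers=1, custom_agents=1))
```

With one MCP server and no custom agents the estimate is 10%, which matches the low end of the 10%–16% range quoted above.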
Cost hierarchy (lowest → highest overhead)
- CLI — zero overhead (e.g., GitHub CLI, Vercel CLI, Firecrawl CLI)
- API — moderate overhead
- MCP — highest overhead
Andrej Karpathy independently confirmed this CLI > API > MCP ordering.
Practical tactics
Monitor usage
- Run /statusline to see real-time context consumption.
- Set color-coded thresholds:
  - Green: < 50%
  - Orange: 50–80%
  - Red: > 80%
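The thresholds above can be expressed as a small classifier. This is a minimal sketch: the function name is hypothetical, and treating exactly 50% and exactly 80% as orange is an assumption, since the list does not say which band the boundaries belong to.

```python
def context_status(used_pct: float) -> str:
    """Map a context-usage percentage to the color bands listed above."""
    if not 0 <= used_pct <= 100:
        raise ValueError("used_pct must be between 0 and 100")
    if used_pct < 50:
        return "green"
    if used_pct <= 80:   # assumption: both boundaries count as orange
        return "orange"
    return "red"


for pct in (12, 50, 80, 91):
    print(pct, context_status(pct))
```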
Prefer CLI over MCP
- Before adding an MCP server, ask whether a CLI tool can achieve the same result.
- Periodically audit existing MCP integrations for CLI replacements.
Escape double-tap
- If Claude starts drifting off-topic, press Escape twice.
- This rolls back to before the erroneous prompt and physically removes the subsequent content from context.
Pawel Huryn's Four Root Causes (2026-04-26)
Pawel Huryn documented the four root causes of excessive Claude Code token consumption after Anthropic fixed three platform bugs (v2.1.116+):
1. Cache misses
The prompt cache is the single biggest lever. Cache read costs 0.1x input price; cache write costs 1.25x (5-min TTL) or 2x (1-hour TTL). Every hit on a cached prefix resets its TTL at no extra cost.
Key rules:
- Lock tools at session start. Adding or removing a tool mid-session invalidates the cached prefix.
- Lock the model at session start. Switching models mid-session blows the cache.
- ~90% hit rate is healthy on the 5-minute default. On 1-hour TTL (API-only), it climbs to ~97-99%.
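To see why the hit rate matters so much, the sketch below blends the read and write multipliers quoted above into one effective price per prefix token. It assumes, as a simplification, that every miss is billed as a cache write at the chosen TTL. The function name is hypothetical.

```python
# Multipliers relative to the normal input-token price, as quoted above.
CACHE_READ_MULT = 0.1
CACHE_WRITE_MULT = {"5m": 1.25, "1h": 2.0}


def effective_input_multiplier(hit_rate: float, ttl: str = "5m") -> float:
    """Blended cost of a cached-prefix token relative to the uncached price.

    Simplifying assumption: every cache miss is billed as a cache write.
    """
    if not 0 <= hit_rate <= 1:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1 - hit_rate
    return hit_rate * CACHE_READ_MULT + miss_rate * CACHE_WRITE_MULT[ttl]


for rate in (0.50, 0.90, 0.97):
    print(f"hit rate {rate:.0%}: {effective_input_multiplier(rate):.3f}x")
```

Under these assumptions, a 90% hit rate on the 5-minute TTL works out to about 0.215x the uncached input price, while a 50% hit rate costs about 0.675x.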
2. Context bloat
For Opus 4.7, 1M context is the default but expensive. Long sessions sprawl and auto-compact fires later than optimal.
Recommended settings:
{
  "env": {
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "80"
  }
}
This disables 1M context (fallback to 200K) and pins auto-compact at 80%.
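If a settings file already has an "env" block, the two entries need to be merged in rather than pasted over it. The sketch below does that merge on an in-memory dictionary. The helper name and the sample keys in `existing` are hypothetical, and reading or writing the actual settings file is left out on purpose.

```python
import json

# The two environment entries recommended above.
RECOMMENDED_ENV = {
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "80",
}


def merge_recommended_env(settings: dict) -> dict:
    """Return a copy of settings with the recommended env entries added.

    Existing env entries are preserved; the input dict is not mutated.
    """
    merged = dict(settings)
    env = dict(merged.get("env", {}))
    env.update(RECOMMENDED_ENV)
    merged["env"] = env
    return merged


# Hypothetical existing settings, used only to show that other keys survive.
existing = {"env": {"EXAMPLE_VAR": "kept"}}
print(json.dumps(merge_recommended_env(existing), indent=2))
```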
Session moves:
- /compact at 50% or after every task. Do not wait for auto-compact.
- /clear between unrelated work. New session = fresh prefix.
- /rewind when a turn went sideways. Cheaper than re-prompting around bad context.
- Subagents for anything that does not need the parent's reasoning.
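The session moves can be read as a small decision rule. The sketch below is one possible encoding: the function name, the flags, and in particular the priority order (rewind first, then clear, then compact) are assumptions, not something the guide specifies.

```python
def suggest_session_move(used_pct: float,
                         task_finished: bool = False,
                         switching_topic: bool = False,
                         last_turn_went_sideways: bool = False) -> str:
    """Suggest a slash command based on the session moves listed above.

    The priority order is an assumption made for this sketch.
    """
    if last_turn_went_sideways:
        return "/rewind"      # undo the bad turn instead of prompting around it
    if switching_topic:
        return "/clear"       # unrelated work gets a fresh prefix
    if task_finished or used_pct >= 50:
        return "/compact"     # compact early rather than waiting for auto-compact
    return "continue"


print(suggest_session_move(62))
print(suggest_session_move(20, switching_topic=True))
```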
3. Wrong model or effort
Three separate dials that burn tokens fast if misconfigured:
| Effort level | Use case |
|---|---|
| low | quick fixes, mechanical tasks |
| medium | most prompts (huge savings vs default) |
| high | demanding reasoning |
| xhigh | default for agentic coding (Opus 4.7) |
| max | diminishing returns; rarely worth the ~2x xhigh cost |
Set effort per prompt, not per task or session.
Model routing in CLAUDE.md:
- Sonnet session: cheaper, good when work is in-scope for Sonnet
- Opus session + delegate: pay for Opus on parent (planning, tradeoffs), delegate rest
4. Wrong input format
Some inputs are token-expensive by default:
| Expensive | Cheaper replacement |
|---|---|
| Screenshots and Chrome scraping | agent-browser (accessibility tree, ~82% fewer tokens) |
| PDFs via Read tool | pdftotext (Read loads PDFs as images) |
| Large repo raw file reads | code-review-graph (persistent AST map, 6.8x fewer tokens on reviews) |
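The two savings figures in the table use different conventions: "~82% fewer tokens" is a percentage reduction, while "6.8x fewer" is a division factor. The sketch below applies both to a hypothetical 10,000-token input so they can be compared directly. The helper names and the sample size are illustrative only.

```python
def after_percent_reduction(tokens: int, percent_fewer: float) -> int:
    """Tokens remaining after removing a percentage (e.g. 82% fewer)."""
    return round(tokens * (1 - percent_fewer / 100))


def after_factor_reduction(tokens: int, factor: float) -> int:
    """Tokens remaining after dividing by a factor (e.g. 6.8x fewer)."""
    return round(tokens / factor)


baseline = 10_000  # hypothetical input size, for illustration only
print(after_percent_reduction(baseline, 82))   # accessibility tree vs. screenshot
print(after_factor_reduction(baseline, 6.8))   # AST map vs. raw file reads
```

On this hypothetical input, the 82% reduction leaves 1,800 tokens and the 6.8x reduction leaves about 1,471, so 6.8x corresponds to roughly an 85% reduction.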
Additional tools:
- rtk-ai/rtk: CLI proxy that strips redundant whitespace and compresses tool output (~20-30% token reduction)
- juliusbrussee/caveman: terse-output skill that drops conversational filler without affecting reasoning
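Ronin's Five Context Traps and Four-Tier Model Routing (2026-05-13)
Ronin cut monthly spend from $4,200 to $312, mainly by identifying and eliminating wasted context.
Five traps
| Trap | Mechanism | Cost impact |
|---|---|---|
| Resending the whole codebase every turn | Cursor/Claude Code auto-context attaches 30-50 unchanged files to every request | 50 files ≈ 80K tokens per turn; 50 turns per day = $60/day |
| Tool-call loops | Every "let me check" step by the agent pays full price for the entire context again | The same 50K-token context may be paid for 5 times |
| Overpowered model | Using Opus to fix typos or format JSON | A cheaper model handles these tasks fine |
| Wrong streaming/batch choice | Streaming output invalidates the prompt cache; streaming is used where batching would work | Wasted twice over |
| Context bloat | Including every file you are unsure about | A one-line bug fix balloons to 80K tokens |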
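The cost figures in the first two rows of the table can be reproduced with simple arithmetic. The sketch below assumes an input price of $15 per million tokens. That price is not stated in the source; it is chosen here because it reproduces the $60/day figure. Function names are hypothetical.

```python
# Assumption: $15 per million input tokens (not stated in the source;
# chosen because it reproduces the table's $60/day figure).
PRICE_PER_MILLION_INPUT_TOKENS = 15.0


def daily_resend_cost(tokens_per_turn: int, turns_per_day: int,
                      price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS) -> float:
    """Daily cost in dollars of resending the same context on every turn."""
    return tokens_per_turn * turns_per_day * price_per_million / 1_000_000


def tool_loop_tokens(context_tokens: int, tool_calls: int) -> int:
    """Input tokens billed when each tool call re-sends the whole context."""
    return context_tokens * tool_calls


print(daily_resend_cost(80_000, 50))   # 80K tokens/turn, 50 turns/day
print(tool_loop_tokens(50_000, 5))     # 50K context paid for 5 times
```

At the assumed price, 80K tokens over 50 turns is 4M input tokens, or $60 per day, and a five-step tool loop over a 50K-token context bills 250K input tokens before any caching discount.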
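Four-tier model routing
| Tier | Model | Task type |
|---|---|---|
| Premium | Opus / frontier | Critical architecture decisions, complex reasoning |
| Standard | Sonnet / mid-tier | Day-to-day coding, most implementation tasks |
| Fast | Haiku / fast | Formatting, renaming, simple refactors |
| Batch | Batch API | Large volumes of homogeneous tasks, background processing |
Key principle: do not use one model for everything. Even the compromise of running everything on Sonnet can push costs to 6x what the work actually requires.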
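The four-tier table can be sketched as a lookup-based router. The task-type labels, the `bulk_background` flag, and the choice to default unknown tasks to the standard tier are all assumptions made for illustration. The source only gives the tiers and example task types.

```python
from enum import Enum


class Tier(Enum):
    PREMIUM = "Opus / frontier"
    STANDARD = "Sonnet / mid-tier"
    FAST = "Haiku / fast"
    BATCH = "Batch API"


# Hypothetical task-type labels mapped to the tiers from the table above.
TASK_TIERS = {
    "architecture": Tier.PREMIUM,
    "complex_reasoning": Tier.PREMIUM,
    "implementation": Tier.STANDARD,
    "formatting": Tier.FAST,
    "rename": Tier.FAST,
    "simple_refactor": Tier.FAST,
}


def route(task_type: str, bulk_background: bool = False) -> Tier:
    """Pick a tier for a task; unknown task types fall back to STANDARD."""
    if bulk_background:
        return Tier.BATCH   # many similar tasks that can run in the background
    return TASK_TIERS.get(task_type, Tier.STANDARD)


print(route("rename").value)
print(route("architecture").value)
print(route("formatting", bulk_background=True).value)
```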
Counterpoints & Gaps
- The exact percentages (2%, 8%, 4%) are approximate and may vary by model version and prompt length.
- Some workflows genuinely require MCP-level integration; CLI substitution is not always feature-complete.
- There is no documented guarantee that Escape double-tap permanently erases tokens from the billing context.
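- Ronin's reduction (roughly 93%, going by the $4,200 and $312 monthly figures) is based on a specific project structure and is not universally representative. The qualitative diagnosis of the five traps is nevertheless consistent with Pawel Huryn's framework.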