Claude Code Context Window Optimization

Guide by @aakashgupta: Context is the scarcest resource in Claude Code. Before the first user message is sent, 10%–16% of the context window is already consumed by system prompts, MCP servers, and custom agents.

Overhead breakdown

Component	Approximate overhead
System prompt	~2%
Each MCP server	~8%+
Custom agent	~4%
Conversation history	grows continuously
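
The table above can be turned into a rough estimate of how much context is gone before the first message. The sketch below is illustrative only: the function name and parameters are hypothetical, the percentages are the approximate figures from the table, and the "~8%+" per MCP server is treated as a lower bound.

```python
def startup_overhead_pct(
    mcp_servers: int,
    custom_agents: int,
    system_prompt_pct: float = 2.0,  # "~2%" from the table
    mcp_pct: float = 8.0,            # "~8%+" per server, used as a lower bound
    agent_pct: float = 4.0,          # "~4%" per custom agent
) -> float:
    """Estimate the share of the context window used before the first message."""
    if mcp_servers < 0 or custom_agents < 0:
        raise ValueError("counts must be non-negative")
    return system_prompt_pct + mcp_servers * mcp_pct + custom_agents * agent_pct


if __name__ == "__main__":
    # One MCP server plus one custom agent lands inside the 10%-16% range.
    print(startup_overhead_pct(mcp_servers=1, custom_agents=1))
```

With one MCP server and one custom agent the estimate is 14%, which sits inside the 10%–16% range quoted at the top; each additional MCP server adds at least another 8 points.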

Cost hierarchy (lowest → highest overhead)

CLI — zero overhead (e.g., GitHub CLI, Vercel CLI, Firecrawl CLI)
API — moderate overhead
MCP — highest overhead

Andrej Karpathy independently confirmed this CLI > API > MCP ordering.

Practical tactics

Monitor usage

Run /statusline to set up a status line that shows real-time context consumption.
Set color-coded thresholds:
- Green: < 50%
- Orange: 50–80%
- Red: > 80%
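
As a minimal sketch, the thresholds above map to a color like this. The function name is hypothetical, and how the usage percentage reaches a status line script is not covered here; only the boundary logic is shown.

```python
def context_color(used_pct: float) -> str:
    """Map context usage (0-100) to the color bands listed above."""
    if not 0.0 <= used_pct <= 100.0:
        raise ValueError("used_pct must be between 0 and 100")
    if used_pct < 50.0:
        return "green"
    if used_pct <= 80.0:
        return "orange"  # 50-80% inclusive
    return "red"         # strictly above 80%


if __name__ == "__main__":
    for pct in (12.0, 50.0, 80.0, 93.5):
        print(f"{pct:5.1f}% -> {context_color(pct)}")
```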

Prefer CLI over MCP

Before adding an MCP server, ask whether a CLI tool can achieve the same result.
Periodically audit existing MCP integrations for CLI replacements.

Escape double-tap

If Claude starts drifting off-topic, press Escape twice.
This rolls back to before the erroneous prompt and physically removes the subsequent content from context.

Pawel Huryn's Four Root Causes (2026-04-26)

After Anthropic fixed three platform bugs (v2.1.116+), Pawel Huryn documented four root causes of excessive Claude Code token consumption:

1. Cache misses

The prompt cache is the single biggest lever. Cache read costs 0.1x input price; cache write costs 1.25x (5-min TTL) or 2x (1-hour TTL). Every hit on a cached prefix resets its TTL at no extra cost.

Key rules:

Lock tools at session start. Adding or removing a tool mid-session invalidates the cached prefix.
Lock the model at session start. Switching models mid-session blows the cache.
~90% hit rate is healthy on the 5-minute default. On 1-hour TTL (API-only), it climbs to ~97-99%.
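
To see why the hit rate matters so much, the multipliers above can be combined into an average price multiplier for the cached prefix. This is a simplified model, not a billing formula: it assumes each request either reads the whole prefix from cache or rewrites the whole prefix, and the function name is hypothetical.

```python
READ_MULTIPLIER = 0.1         # cache read, relative to base input price
WRITE_MULTIPLIER_5MIN = 1.25  # cache write, 5-minute TTL
WRITE_MULTIPLIER_1H = 2.0     # cache write, 1-hour TTL


def effective_prefix_multiplier(hit_rate: float, write_multiplier: float) -> float:
    """Average multiplier paid per request for the cached prefix.

    Assumption: a hit reads the full prefix at READ_MULTIPLIER and a miss
    rewrites the full prefix at write_multiplier.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * READ_MULTIPLIER + (1.0 - hit_rate) * write_multiplier


if __name__ == "__main__":
    print(round(effective_prefix_multiplier(0.90, WRITE_MULTIPLIER_5MIN), 3))
    print(round(effective_prefix_multiplier(0.98, WRITE_MULTIPLIER_1H), 3))
    # A session that never hits the cache pays the write price every time.
    print(round(effective_prefix_multiplier(0.0, WRITE_MULTIPLIER_5MIN), 3))
```

Under this model a 90% hit rate on the 5-minute TTL pays about 0.215x the base input price for the prefix, while a session that keeps invalidating the cache pays 1.25x, which is worse than not caching at all.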

2. Context bloat

For Opus 4.7, 1M context is the default but expensive. Long sessions sprawl and auto-compact fires later than optimal.

Recommended settings:

{
  "env": {
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "80"
  }
}

This disables 1M context (fallback to 200K) and pins auto-compact at 80%.
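
If a settings file already exists, the two variables should be merged into its "env" block rather than overwriting the file. The sketch below shows one way to do that. The helper names are hypothetical, and the file location is an assumption (the source does not say where the settings live), so the path is passed in explicitly.

```python
import json
from pathlib import Path

# The two variables recommended above.
RECOMMENDED_ENV = {
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "80",
}


def merge_env(settings: dict, env: dict) -> dict:
    """Return a copy of settings with env merged in; other keys are untouched."""
    merged = dict(settings)
    merged["env"] = {**settings.get("env", {}), **env}
    return merged


def update_settings_file(path: Path, env: dict = RECOMMENDED_ENV) -> None:
    """Read the JSON file at path (if present), merge env, and write it back."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(merge_env(current, env), indent=2) + "\n")


if __name__ == "__main__":
    # Dry run on an in-memory dict; no file is touched.
    existing = {"env": {"EXAMPLE_VAR": "x"}, "other": True}
    print(json.dumps(merge_env(existing, RECOMMENDED_ENV), indent=2))
```

For scale, with the 200K fallback an 80% auto-compact threshold corresponds to roughly 160K tokens.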

Session moves:

/compact at 50% or after every task. Do not wait for auto.
/clear between unrelated work. New session = fresh prefix.
/rewind when a turn went sideways. Cheaper than re-prompting around bad context.
Subagents for anything that does not need the parent's reasoning.

3. Wrong model or effort

Three separate dials that burn tokens fast if misconfigured:

Effort level	Use case
low	quick fixes, mechanical tasks
medium	most prompts (huge savings vs default)
high	demanding reasoning
xhigh	default for agentic coding (Opus 4.7)
max	diminishing returns; rarely worth the ~2x xhigh cost

Set effort per prompt, not per task or session.

Model routing in CLAUDE.md:

Sonnet session: cheaper, good when work is in-scope for Sonnet
Opus session + delegate: pay for Opus on parent (planning, tradeoffs), delegate rest

4. Wrong input format

Some inputs are token-expensive by default:

Expensive	Cheaper replacement
Screenshots and Chrome scraping	agent-browser (accessibility tree, ~82% fewer tokens)
PDFs via Read tool	pdftotext (Read loads PDFs as images)
Large repo raw file reads	code-review-graph (persistent AST map, 6.8x fewer tokens on reviews)
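
For the PDF row, a minimal sketch of the pdftotext route is shown below. It assumes the pdftotext binary (shipped with Poppler or Xpdf) is installed and on PATH; the helper names are hypothetical. The `-layout` flag preserves the page layout, and `-` as the output file sends the text to stdout.

```python
import subprocess


def build_pdftotext_cmd(pdf_path: str, keep_layout: bool = True) -> list[str]:
    """Build the pdftotext command line; '-' writes the text to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path: str) -> str:
    """Extract plain text from a PDF. Requires pdftotext on PATH."""
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if extraction fails
    )
    return result.stdout


if __name__ == "__main__":
    print(build_pdftotext_cmd("spec.pdf"))
```

The extracted text can then be saved to a file and read as plain text instead of handing the PDF to the Read tool.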

Additional tools:

rtk-ai/rtk: CLI proxy that strips redundant whitespace and compresses tool output (~20-30% token reduction)
juliusbrussee/caveman: terse-output skill that drops conversational filler without affecting reasoning

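Ronin's Five Context Traps and Four-Tier Model Routing (2026-05-13)

Source: Ronin, "How to Cut Your AI Coding Bill by 80%"

Ronin cut monthly spend from $4,200 to $312, mainly by identifying and eliminating wasted context.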

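Five traps

Trap	Mechanism	Cost impact
Resending the whole codebase every turn	Cursor/Claude Code auto-context attaches 30-50 unchanged files each time	50 files ≈ 80K tokens/turn; 50 turns per day = $60/day
Tool-call loops	Every "let me check" step by the agent pays for the full context again	The same 50K-token context may be paid for 5 times
Overpowered model	Using Opus to fix typos or format JSON	A cheaper model handles these tasks fine
Wrong streaming/batch choice	Streaming output invalidates the prompt cache; streaming is used where batching would work	Double waste
Context bloat	Attaching every file you are unsure about	A one-line bug fix balloons to 80K tokens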
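
The cost figures in the first two rows can be checked with simple arithmetic. The sketch below assumes a flat input price of $15 per million tokens. That price is not stated in the source; it is the rate implied by 80K tokens per turn and 50 turns per day adding up to $60 per day. Function names are hypothetical.

```python
def daily_input_cost(tokens_per_turn: int, turns_per_day: int, usd_per_mtok: float) -> float:
    """Daily input cost in USD when the same context is resent every turn."""
    return tokens_per_turn * turns_per_day * usd_per_mtok / 1_000_000


def loop_tokens(context_tokens: int, tool_calls: int) -> int:
    """Input tokens billed when each tool call resends the full context."""
    return context_tokens * tool_calls


if __name__ == "__main__":
    # Assumed price: $15 per million input tokens (implied, not from the source).
    print(daily_input_cost(80_000, 50, usd_per_mtok=15.0))
    print(loop_tokens(50_000, 5))
```

Under that assumption, 80K tokens resent over 50 turns costs $60 per day, and a 50K-token context carried through 5 tool calls bills 250K input tokens before any caching discount.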

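Four-tier model routing

Tier	Model	Task type
Premium tier	Opus / frontier	Critical architecture decisions, complex reasoning
Standard tier	Sonnet / mid-tier	Everyday coding, most implementation tasks
Fast tier	Haiku / fast	Formatting, renaming, simple refactors
Batch tier	Batch API	Large volumes of homogeneous tasks, background processing

Key principle: do not use one model for everything. An all-Sonnet compromise can make costs up to 6x higher than actually needed.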
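
The four tiers can be sketched as a small routing function. This is an illustration, not Ronin's implementation: the task-kind labels and the function name are hypothetical, and a real router would need its own way of classifying tasks.

```python
# Hypothetical task labels, grouped by the tiers in the table above.
PREMIUM_TASKS = {"architecture", "complex_reasoning"}
FAST_TASKS = {"formatting", "renaming", "simple_refactor"}


def route_task(task_kind: str, batchable: bool = False) -> str:
    """Pick a tier for a task; anything unrecognized falls to the standard tier."""
    if batchable:
        return "batch"     # Batch API: homogeneous or background work
    if task_kind in PREMIUM_TASKS:
        return "premium"   # Opus / frontier
    if task_kind in FAST_TASKS:
        return "fast"      # Haiku / fast
    return "standard"      # Sonnet / mid-tier: everyday coding


if __name__ == "__main__":
    for kind, batch in [("architecture", False), ("renaming", False),
                        ("feature_work", False), ("formatting", True)]:
        print(kind, batch, "->", route_task(kind, batchable=batch))
```

Checking batchability first reflects the table's intent that bulk, non-urgent work goes to the Batch API regardless of task type; defaulting to the standard tier keeps unclassified work off the most expensive model.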

Counterpoints & Gaps

The exact percentages (2%, 8%, 4%) are approximate and may vary by model version and prompt length.
Some workflows genuinely require MCP-level integration; CLI substitution is not always feature-complete.
There is no documented guarantee that Escape double-tap permanently erases tokens from the billing context.
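Ronin's 80% reduction depends on a specific project structure and is not broadly representative. Even so, the qualitative diagnosis behind the five traps is consistent with Pawel Huryn's framework.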
