Claude Code Context Window Optimization

Updated 2026-05-13

Guide by @aakashgupta: Context is the scarcest resource in Claude Code. Before the first user message is sent, 10%–16% of the context window is already consumed by system prompts, MCP servers, and custom agents.

Overhead breakdown

Component | Approximate overhead
System prompt | ~2%
Each MCP server | ~8%+
Custom agent | ~4%
Conversation history | grows continuously
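
As a rough illustration of how these figures add up, here is a minimal sketch that sums the approximate percentages for a given setup. The function names are hypothetical, the 200,000-token window is an assumed example size, and treating "~8%+" as exactly 8% and scaling agents linearly are simplifying assumptions rather than measured behaviour.

```python
# Approximate per-component overhead, taken from the breakdown above.
SYSTEM_PROMPT_PCT = 2.0  # ~2%
MCP_SERVER_PCT = 8.0     # ~8%+ per server; treated as exactly 8% (assumption)
CUSTOM_AGENT_PCT = 4.0   # ~4%; linear scaling per agent is an assumption


def baseline_overhead_pct(mcp_servers: int, custom_agents: int) -> float:
    """Percent of the context window used before the first user message."""
    return (
        SYSTEM_PROMPT_PCT
        + MCP_SERVER_PCT * mcp_servers
        + CUSTOM_AGENT_PCT * custom_agents
    )


def baseline_overhead_tokens(
    window_tokens: int, mcp_servers: int, custom_agents: int
) -> int:
    """Same estimate expressed in tokens for a given window size."""
    pct = baseline_overhead_pct(mcp_servers, custom_agents)
    return round(window_tokens * pct / 100)


if __name__ == "__main__":
    # One MCP server and one custom agent on an assumed 200K window.
    print(baseline_overhead_pct(1, 1))
    print(baseline_overhead_tokens(200_000, 1, 1))
```

With one MCP server and one custom agent the estimate is 14%, which sits inside the 10%–16% range quoted in the introduction.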

Cost hierarchy (lowest → highest overhead)

  1. CLI — zero overhead (e.g., GitHub CLI, Vercel CLI, Firecrawl CLI)
  2. API — moderate overhead
  3. MCP — highest overhead

Andrej Karpathy independently confirmed this order of preference: CLI first, then API, then MCP.

Practical tactics

Monitor usage

  • Run /statusline to configure a status line that shows real-time context consumption.
  • Set color-coded thresholds:
    • Green: < 50%
    • Orange: 50–80%
    • Red: > 80%
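
The thresholds above can be sketched as a small classifier. The function name is hypothetical, and placing exactly 50% and exactly 80% in the orange band follows the "50–80%" wording; how a real status line script obtains the usage percentage is outside this sketch.

```python
def context_color(used_pct: float) -> str:
    """Map context usage (in percent) to the color bands listed above."""
    if used_pct < 50:
        return "green"
    if used_pct <= 80:
        return "orange"
    return "red"


if __name__ == "__main__":
    for pct in (12.5, 50, 80, 93):
        print(pct, context_color(pct))
```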

Prefer CLI over MCP

  • Before adding an MCP server, ask whether a CLI tool can achieve the same result.
  • Periodically audit existing MCP integrations for CLI replacements.

Escape double-tap

  • If Claude starts drifting off-topic, press Escape twice.
  • This rolls back to before the erroneous prompt and physically removes the subsequent content from context.

Pawel Huryn's Four Root Causes (2026-04-26)

Pawel Huryn documented the four root causes of excessive Claude Code token consumption after Anthropic fixed three platform bugs (v2.1.116+):

1. Cache misses

The prompt cache is the single biggest lever. Cache reads cost 0.1x the base input price; cache writes cost 1.25x (5-minute TTL) or 2x (1-hour TTL). Every hit on a cached prefix resets its TTL at no extra cost.

Key rules:

  • Lock tools at session start. Adding or removing a tool mid-session invalidates the cached prefix.
  • Lock the model at session start. Switching models mid-session blows the cache.
  • ~90% hit rate is healthy on the 5-minute default. On 1-hour TTL (API-only), it climbs to ~97-99%.
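
To see why the hit rate matters so much, here is a minimal sketch of a blended input-price multiplier built from the read and write multipliers quoted above. It assumes, as a simplification, that every prefix token is either read from the cache or written to it; the function name is hypothetical.

```python
CACHE_READ_MULT = 0.1                        # cache read vs. base input price
CACHE_WRITE_MULT = {"5m": 1.25, "1h": 2.0}   # cache write, by TTL


def blended_input_multiplier(hit_rate: float, ttl: str = "5m") -> float:
    """Average price multiplier per prefix token at a given cache hit rate.

    Simplifying assumption: a token that misses the cache is billed as a
    cache write at the multiplier for the chosen TTL.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    write_mult = CACHE_WRITE_MULT[ttl]
    return hit_rate * CACHE_READ_MULT + (1.0 - hit_rate) * write_mult


if __name__ == "__main__":
    print(round(blended_input_multiplier(0.90, "5m"), 3))
    print(round(blended_input_multiplier(0.97, "1h"), 3))
    print(round(blended_input_multiplier(0.50, "5m"), 3))
```

Under these assumptions a 90% hit rate on the 5-minute TTL costs roughly a fifth of the uncached input price, while a 50% hit rate costs about two thirds of it.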

2. Context bloat

For Opus 4.7, 1M context is the default but expensive. Long sessions sprawl and auto-compact fires later than optimal.

Recommended settings:

{
  "env": {
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "80"
  }
}

This disables 1M context (falling back to 200K) and pins auto-compact at 80%.
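
As a quick sanity check on what these two settings imply, the sketch below derives the token count at which auto-compact would fire. It assumes "200K" and "1M" mean 200,000 and 1,000,000 tokens and that the override is a percentage of the active window; the helper name is hypothetical and is not part of Claude Code.

```python
import json

# The recommended settings fragment, embedded as a string for this example.
SETTINGS_JSON = """
{
  "env": {
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "80"
  }
}
"""


def autocompact_trigger_tokens(settings: dict) -> int:
    """Tokens at which auto-compact fires, under the assumptions above."""
    env = settings.get("env", {})
    one_m_disabled = env.get("CLAUDE_CODE_DISABLE_1M_CONTEXT") == "1"
    window = 200_000 if one_m_disabled else 1_000_000  # assumed sizes
    pct = int(env["CLAUDE_AUTOCOMPACT_PCT_OVERRIDE"])
    return window * pct // 100


if __name__ == "__main__":
    print(autocompact_trigger_tokens(json.loads(SETTINGS_JSON)))
```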

Session moves:

  • /compact at 50% or after every task. Do not wait for auto.
  • /clear between unrelated work. New session = fresh prefix.
  • /rewind when a turn went sideways. Cheaper than re-prompting around bad context.
  • Subagents for anything that does not need the parent's reasoning.
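
The first three session moves can be read as a small decision rule. The sketch below is one possible encoding: the function and parameter names are hypothetical, and the priority order (rewind, then clear, then compact) is an assumption, not something the guide specifies.

```python
def next_session_move(
    used_pct: float,
    task_finished: bool = False,
    switching_to_unrelated_work: bool = False,
    last_turn_went_sideways: bool = False,
) -> str:
    """Suggest a slash command based on the session moves listed above."""
    if last_turn_went_sideways:
        return "/rewind"   # cheaper than re-prompting around bad context
    if switching_to_unrelated_work:
        return "/clear"    # new session, fresh prefix
    if task_finished or used_pct >= 50:
        return "/compact"  # do not wait for auto-compact
    return "continue"


if __name__ == "__main__":
    print(next_session_move(35))
    print(next_session_move(55))
    print(next_session_move(20, switching_to_unrelated_work=True))
```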

3. Wrong model or effort

Three separate dials that burn tokens fast if misconfigured:

Effort level | Use case
low | quick fixes, mechanical tasks
medium | most prompts (huge savings vs default)
high | demanding reasoning
xhigh | default for agentic coding (Opus 4.7)
max | diminishing returns; rarely worth the ~2x xhigh cost

Set effort per prompt, not per task or session.

Model routing in CLAUDE.md:

  • Sonnet session: cheaper, good when work is in-scope for Sonnet
  • Opus session + delegate: pay for Opus on parent (planning, tradeoffs), delegate rest

4. Wrong input format

Some inputs are token-expensive by default:

Expensive | Cheaper replacement
Screenshots and Chrome scraping | agent-browser (accessibility tree, ~82% fewer tokens)
PDFs via Read tool | pdftotext (Read loads PDFs as images)
Large repo raw file reads | code-review-graph (persistent AST map, 6.8x fewer tokens on reviews)
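
For the PDF row, a minimal sketch of converting a PDF to plain text before handing it to the model might look like the following. It assumes the pdftotext binary from Poppler is installed and on PATH; the wrapper function names and the example file name are hypothetical.

```python
import subprocess


def pdftotext_command(pdf_path: str, keep_layout: bool = True) -> list[str]:
    """Build the pdftotext invocation; a trailing "-" writes text to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # try to preserve the page's physical layout
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path: str) -> str:
    """Run pdftotext and return the extracted text (requires Poppler)."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise if pdftotext exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # "report.pdf" is a placeholder file name.
    print(" ".join(pdftotext_command("report.pdf")))
```

The extracted text can then be saved to a file and read as ordinary text instead of passing the PDF itself to the Read tool.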

Additional tools:

  • rtk-ai/rtk: CLI proxy that strips redundant whitespace and compresses tool output (~20-30% token reduction)
  • juliusbrussee/caveman: terse-output skill that drops conversational filler without affecting reasoning

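Ronin's Five Context Traps and Four-Tier Model Routing (2026-05-13)

Source: Ronin, "How to Cut Your AI Coding Bill by 80%"

Ronin cut monthly spend from $4,200 to $312; the core method is to identify and eliminate useless context.

Five traps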

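Trap | Mechanism | Cost impact
Resending the whole codebase every turn | Cursor/Claude Code auto-context attaches 30-50 unchanged files each time | 50 files ≈ 80K tokens/turn; 50 turns per day = $60/day
Tool-call loops | Every "let me check" from the agent pays for the entire context again in full | The same 50K-token context may be paid for 5 times
Overpowered model for the task | Using Opus to fix typos or format JSON | A cheaper model is fully capable
Wrong streaming/batch choice | Streaming output invalidates the prompt cache; streaming is used where batching would do | Double waste
Context bloat | Attaching every file you are unsure about | A one-line bug fix balloons to 80K tokens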
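
The arithmetic behind the first trap can be checked with a short sketch. The per-million-token price is not stated in the source; the code only derives the price that the quoted figures imply, and the function names are hypothetical.

```python
TOKENS_PER_TURN = 80_000   # 50 files is roughly 80K tokens per turn
TURNS_PER_DAY = 50
DAILY_COST_USD = 60.0      # figure quoted in the table


def daily_tokens(tokens_per_turn: int, turns_per_day: int) -> int:
    """Total input tokens resent per day."""
    return tokens_per_turn * turns_per_day


def implied_price_per_million(daily_cost_usd: float, tokens: int) -> float:
    """Input price (USD per million tokens) implied by the quoted figures."""
    return daily_cost_usd / (tokens / 1_000_000)


if __name__ == "__main__":
    tokens = daily_tokens(TOKENS_PER_TURN, TURNS_PER_DAY)
    print(tokens)
    print(implied_price_per_million(DAILY_COST_USD, tokens))
```

The quoted numbers work out to 4 million resent tokens per day, which matches $60/day only at an input price of about $15 per million tokens, so the estimate appears to assume a premium-tier model with no cache hits.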

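Four-tier model routing

Tier | Model | Task type
Premium | Opus / frontier | Critical architecture decisions, complex reasoning
Standard | Sonnet / mid-tier | Everyday coding, most implementation tasks
Fast | Haiku / fast | Formatting, renaming, simple refactors
Batch | Batch API | Large volumes of homogeneous tasks, background processing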

Key principle: do not use one model for everything. An all-Sonnet compromise can leave costs up to 6x higher than the work actually requires.
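
One way to picture the four-tier routing is a lookup from task kind to tier. The sketch below is illustrative only: the task-kind labels, the function name, and the choice of the standard tier as the default for unlisted work are assumptions, not part of Ronin's write-up.

```python
from enum import Enum


class Tier(Enum):
    PREMIUM = "Opus / frontier"
    STANDARD = "Sonnet / mid-tier"
    FAST = "Haiku / fast"
    BATCH = "Batch API"


# Hypothetical task-kind labels mapped onto the tiers described above.
TASK_TIERS = {
    "architecture": Tier.PREMIUM,
    "complex_reasoning": Tier.PREMIUM,
    "implementation": Tier.STANDARD,
    "formatting": Tier.FAST,
    "rename": Tier.FAST,
    "simple_refactor": Tier.FAST,
}


def route(task_kind: str, bulk_background: bool = False) -> Tier:
    """Pick a tier for a task; bulk background work goes to the batch tier."""
    if bulk_background:
        return Tier.BATCH
    # Unlisted task kinds default to the standard tier (assumption).
    return TASK_TIERS.get(task_kind, Tier.STANDARD)


if __name__ == "__main__":
    print(route("rename").value)
    print(route("architecture").value)
    print(route("formatting", bulk_background=True).value)
```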

Counterpoints & Gaps

  • The exact percentages (2%, 8%, 4%) are approximate and may vary by model version and prompt length.
  • Some workflows genuinely require MCP-level integration; CLI substitution is not always feature-complete.
  • There is no documented guarantee that Escape double-tap permanently erases tokens from the billing context.
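  • Ronin's 80% reduction reflects a specific project structure and is not universally representative. However, the qualitative diagnosis of the five traps is consistent with Pawel Huryn's framework.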

Sources

Synthesized from 3 sources
  • AI Briefing 2026-04-13
  • Claude Code's Limits Are Generous. The Problem Is Your Harness.
  • Ronin, "How to Cut Your AI Coding Bill by 80%"
