Skip to content
Back/Harness Engineering

Computer Use Best Practices — Anthropic Official Guide

View in Graph
Updated 2026-05-26
3 min read
693 words

Computer Use Best Practices — Anthropic Official Guide

What it is

Anthropic 官方发布的 Claude Computer Use / Browser Use 生产部署指南,涵盖分辨率缩放、点击精度、思考力度调优、提示注入防御、上下文管理和实验性工具配置。

Why it matters

Computer use agents interact with untrusted content by design. Every screenshot and webpage could contain adversarial instructions. Without deliberate engineering around resolution scaling, context management, and safety, production deployments fail on basic click accuracy or security gaps.

Key points

Resolution and click accuracy

  • Pre-downscale screenshots before sending to API. The most common cause of poor click accuracy is sending native-resolution screenshots that exceed API limits and get silently downscaled.
  • Claude 4.6 limits: max long edge 1568px, max total pixels 1.15MP.
  • Opus 4.7 limits: max long edge 2576px, max total pixels 3.75MP.
  • Recommended default: 1280x720 for 4.6 family; 1080p for Opus 4.7.
  • Coordinate scaling is critical: scale API-returned coordinates back to native screen resolution before executing clicks.
  • Content ordering: place text instruction before the image in the messages array.
  • Model selection: Sonnet 4.6 tends to be more mechanically precise at clicking; Opus 4.7 narrows this gap with higher resolution budget.

Thinking effort tuning

  • Claude 4.6: medium is the sweet spot — close to highest success rate at roughly half the tokens of high. low is surprisingly strong for cost-sensitive workloads.
  • Claude Opus 4.7: high achieves near-maximum success rate while using roughly half the tokens of max.
  • Avoid max effort for computer use on 4.6 models — no accuracy benefit over high while increasing cost. UI tasks are perceptual, not deeply logical.

Prompt injection defense

  • Training-time robustness: RL builds resistance directly into Claude's capabilities.
  • Real-time classifiers: scan content entering context window and flag potential injection attempts.
  • Built-in classifiers run automatically when using official computer_20251124 tool type — zero additional latency, no extra cost.
  • Best practices regardless of classifier use: human-in-the-loop for high-stakes actions, scope agent permissions narrowly, monitor and log all actions, treat all web content as untrusted.

Context management for long-running agents

Screenshots accumulate fast: each consumes roughly 1,000–1,800 tokens. A 200k context window fills in well under 100 screenshots.

Three layers that compose cleanly:

  1. Cache breakpoints: one on stable prefix (system prompt), up to three on most recent tool results. Spreading breakpoints across recent positions gives graceful degradation.
  2. Cache-aware rolling buffer: keep most recent keep_n=3 screenshots; when total exceeds keep_n + interval=25, replace oldest interval screenshots with placeholders in a single pass. Prefix stays byte-identical between prunes.
  3. LLM-based compaction: summarize conversation history before discarding. Critical sections: user instructions (verbatim), task template, constraints, actions taken, errors and fixes, progress tracking, current state, next step.

Server-side compaction (beta): pass custom summarization prompt as instructions parameter in context_management. Set pause_after_compaction to attach most recent messages across events. Mirror server truncation on client to keep views aligned.

Experimental settings

  • Batch tools (computer_batch, browser_batch): execute multiple sub-actions in single tool call. Use when sub-actions are self-contained and don't depend on each other's visual outcomes. Avoid in exploratory navigation or error-recovery sequences.
  • Advisor tool: pairs executor model with higher-intelligence advisor model for strategic guidance mid-generation. Useful when most turns are mechanical but occasional planning moments need Opus-level reasoning. Cleanup orphaned advisor blocks when disabling the tool.
  • Teach Mode: record human performing a workflow (screenshots, actions, optional voice), then replay as context. Not strict replay — Claude adapts to UI changes. Supports strict, adaptive, and goal-oriented playback modes.

Open questions

  • How do batch tools affect error recovery when one sub-action in a batch misses its target?
  • Does the advisor tool pattern generalize beyond computer use to other long-horizon agent tasks?
  • What is the cost breakpoint where server-side compaction pays for itself vs rolling buffer alone?

Prompts for witness

  • What is the most frequent failure mode in your computer use integration, and which layer (resolution, thinking effort, context management, safety) would fix it?
  • If you had to set a single "taste standard" for agent-generated UI interactions, what would it be?

Sources

Synthesized from 2 sources
  • Anthropic — Best Practices for Computer and Browser Use with Claude (EN)Supporting source listed by this page.Whole pagemediumbody
  • Best practices for computer and browser use with ClaudeSupporting source listed by this page.Whole pagemediumabsorb log

Evolution

1 event
  1. absorbed

    Derived from source material

    This page is currently synthesized from 2 sources.

    From Anthropic — Best Practices for Computer and Browser Use with Claude (EN), Best practices for computer and browser use with ClaudeTo Computer Use Best Practices — Anthropic Official Guide
    Sources: raw/to-learn/best-practices-computer-browser-use-claude.md · raw/to-learn/Best practices for computer and browser use with Claude.md

Linked from