Multi-Agent Coordination Patterns
Source: Anthropic official blog, translated by 宝玉xp
Core principle: Start with the simplest pattern that works, observe where it bottlenecks, then evolve.
Pattern 1: Generator-Verifier
Best for: Output quality is critical and you can write explicit evaluation criteria.
How it works:
- Generator produces a draft
- Verifier checks against defined standards
- If failed, feedback is sent back to generator
- Loop until verifier approves or max iterations reached
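The loop above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the source: `Verdict`, `generate_with_verification`, and the two toy functions are hypothetical names, and in practice `generate` and `verify` would each wrap a model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    approved: bool
    feedback: str = ""


def generate_with_verification(
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Verdict],
    task: str,
    max_iterations: int = 3,
) -> tuple[str, bool, int]:
    """Run the generate -> verify loop; return (draft, approved, rounds used)."""
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_iterations + 1):
        draft = generate(task, feedback)   # generator sees the previous objection
        verdict = verify(draft)            # verifier applies its rubric
        if verdict.approved:
            return draft, True, round_no
        feedback = verdict.feedback        # send the objection back
    return draft, False, max_iterations    # cap reached: stop and surface the failure


# Toy stand-ins for the two model calls (hypothetical).
def toy_generate(task: str, feedback: Optional[str]) -> str:
    return f"{task} [cites source]" if feedback else task


def toy_verify(draft: str) -> Verdict:
    if "[cites source]" in draft:
        return Verdict(True)
    return Verdict(False, "add a citation")


result = generate_with_verification(toy_generate, toy_verify, "Summary of Q3")
```

Returning the `approved` flag alongside the draft lets the caller distinguish a verified result from one that merely ran out of iterations.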
Use cases: Code generation (one writes, one runs tests), fact-checking, compliance review, customer support email drafts.
Limitations:
- Quality floor depends entirely on how detailed the verifier's rubric is
- Assumes generation and verification are separable skills (not true for creative breakthroughs)
- Can get stuck in infinite loops if generator cannot resolve verifier's objection
Pattern 2: Orchestrator-Subagent
Best for: Tasks decompose cleanly with clear subtask boundaries.
How it works:
- A central orchestrator plans work, delegates to subagents, and synthesizes final results
- Each subagent works in its own context window, returning distilled results
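A rough sketch of the delegate-then-synthesize flow, assuming a fixed plan and two hypothetical subagents (`security_check`, `style_check`) that stand in for model calls running in their own context windows. Parallelism is added explicitly with a thread pool; none of these names come from the source.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


# Hypothetical subagents: each returns a distilled one-line finding.
def security_check(diff: str) -> str:
    return "security: flagged eval()" if "eval(" in diff else "security: ok"


def style_check(diff: str) -> str:
    longest = max((len(line) for line in diff.splitlines()), default=0)
    return "style: line too long" if longest > 79 else "style: ok"


SUBAGENTS: dict[str, Callable[[str], str]] = {
    "security": security_check,
    "style": style_check,
}


def orchestrate(diff: str) -> str:
    """Delegate the same diff to every subagent in parallel, then synthesize."""
    with ThreadPoolExecutor(max_workers=len(SUBAGENTS)) as pool:
        futures = {name: pool.submit(fn, diff) for name, fn in SUBAGENTS.items()}
        # Collect in a fixed order so the synthesized report is deterministic.
        findings = [futures[name].result() for name in SUBAGENTS]
    return "; ".join(findings)


report = orchestrate("x = eval(user_input)\n")
```

Only the short findings flow back to the orchestrator, which is also where detail can be lost if the summaries are too terse.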
Real example: Claude Code uses this pattern. The main agent writes code and edits files, but spawns background subagents for large-scale searches or independent investigations.
Use cases: Automated code review (security, coverage, style, architecture checks), complex analysis with distinct dimensions.
Limitations:
- Orchestrator becomes an information bottleneck
- Subagents run sequentially by default unless explicit parallelism is added
- Key details can be lost through repeated summarization
Pattern 3: Agent Teams
Best for: Parallelizable, independent subtasks that need long-running, multi-step execution.
How it works:
- A coordinator spawns teammates as independent processes
- Teammates pull tasks from a shared queue, complete multi-step work, and report back
- Key difference from Orchestrator-Subagent: teammates persist across tasks and accumulate domain context
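As an illustration of the shared-queue mechanics, the sketch below uses threads and `queue.Queue` in place of independent processes, which is a simplifying assumption. `teammate` and `run_team` are hypothetical names; the `seen_modules` list stands in for the domain context a persistent teammate accumulates.

```python
import queue
import threading


def teammate(name: str, tasks: queue.Queue, reports: queue.Queue) -> None:
    """Long-lived worker: pulls tasks until it receives the None sentinel."""
    seen_modules: list[str] = []       # context that persists across tasks
    while True:
        task = tasks.get()
        if task is None:               # sentinel: shut down
            break
        seen_modules.append(task)
        reports.put((name, task, len(seen_modules)))


def run_team(task_list: list[str], team_size: int = 2) -> list[tuple[str, str, int]]:
    tasks: queue.Queue = queue.Queue()
    reports: queue.Queue = queue.Queue()
    for task in task_list:
        tasks.put(task)
    for _ in range(team_size):
        tasks.put(None)                # one sentinel per teammate, after all tasks
    threads = [
        threading.Thread(target=teammate, args=(f"mate-{i}", tasks, reports))
        for i in range(team_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [reports.get() for _ in range(len(task_list))]


team_reports = run_team(["billing", "auth", "search", "email"])
```

Which teammate picks up which task is not deterministic, which mirrors the progress-management difficulty noted in the limitations.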
Use cases: Large codebase migration (each member owns one service module), long-running data pipelines.
Limitations:
- Independence is both a strength and weakness — teammates don't easily share intermediate progress
- Progress management is hard because completion times vary
- Resource contention (multiple agents editing same files/databases) requires conflict resolution
Pattern 4: Message Bus
Best for: Event-driven pipelines and systems that will keep growing new agents.
How it works:
- Agents communicate purely through publish and subscribe
- A router pushes relevant messages to agents based on topics
- New agents can join by subscribing to relevant topics without rewiring the system
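A minimal in-process sketch of topic-based publish/subscribe follows. `MessageBus`, the topic names, and the `dead_letters` list are assumptions made for illustration; a production bus would typically be an external broker.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class MessageBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.dead_letters: list[tuple[str, dict]] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        handlers = self._subscribers.get(topic, [])
        if not handlers:
            # Record unrouted messages so a drop is visible rather than silent.
            self.dead_letters.append((topic, message))
        for handler in handlers:
            handler(message)


bus = MessageBus()
log: list[str] = []


def triage(msg: dict) -> None:
    log.append(f"triage:{msg['id']}")
    bus.publish("alert.investigate", msg)   # hand off to the next stage


def investigate(msg: dict) -> None:
    log.append(f"investigate:{msg['id']}")


bus.subscribe("alert.new", triage)
bus.subscribe("alert.investigate", investigate)
bus.publish("alert.new", {"id": 7})
bus.publish("alert.unknown", {"id": 8})     # nobody subscribed to this topic
```

Adding a new agent is one more `subscribe` call; the existing agents do not change.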
Use cases: Security operations (alerts flow from triage → investigation → response), extensible automation pipelines.
Limitations:
- Debugging is painful — tracing a cascade across five agents requires careful log correlation
- Router accuracy is critical; a misclassified or dropped message causes silent failure
- LLM-based routers add their own failure modes (hallucination, misinterpretation)
Pattern 5: Shared State
Best for: Highly collaborative tasks where agents need to build on each other's discoveries in real time.
How it works:
- No central commander. Agents read from and write to a persistent shared store (database, filesystem, document)
- Like a shared blackboard: agents pick up clues, work on them, and write findings back
- System stops when time runs out, convergence threshold is reached, or a designated arbiter declares the answer good enough
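The blackboard idea and its stop conditions can be sketched as below. This is a toy under stated assumptions: the shared store is an in-memory `set`, convergence means "a full round added nothing new", and the agent functions and fact strings are hypothetical.

```python
import time


def run_blackboard(agents, board: set, time_budget_s: float = 1.0, max_rounds: int = 50):
    """Let agents read and write a shared set until a stop condition fires."""
    deadline = time.monotonic() + time_budget_s
    for round_no in range(1, max_rounds + 1):
        before = len(board)
        for agent in agents:
            # Each agent reads a snapshot and writes its findings back.
            board |= agent(frozenset(board))
        if len(board) == before:
            return "converged", round_no       # nothing new this round
        if time.monotonic() >= deadline:
            return "time_budget", round_no     # hard stop on elapsed time
    return "max_rounds", max_rounds


# Hypothetical agents: each turns one kind of clue into a new lead.
def paper_agent(facts):
    return {"patent:P1"} if "paper:A" in facts else set()


def patent_agent(facts):
    return {"news:N1"} if "patent:P1" in facts else set()


board = {"paper:A"}
reason, rounds = run_blackboard([patent_agent, paper_agent], board)
```

The function always returns the reason it stopped, so a run that hit the budget is never mistaken for one that converged.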
Use cases: Cross-domain research (papers, patents, news, reports — each agent's finding becomes another's lead).
Limitations:
- Without central control, agents may duplicate work or diverge
- Reactive loops: Agent A writes, Agent B responds, Agent A responds back... burning tokens without conclusion
- Must design hard stop conditions (time budget, convergence threshold, arbiter agent)
Emerging direction: Latent-space coordination (RecursiveMAS)
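Source: AI 简报 2026-05-02 Morning, AINews 2026-05-02
All five patterns above assume that agents exchange intermediate results as natural-language text. That assumption carries a fundamental overhead: at every handoff, model A's internal computation (numeric vectors) must be decoded into tokens and then re-encoded into numeric vectors by model B. This repeated "translate and re-read" cycle consumes substantial compute and adds latency.
RecursiveMAS (a paper relayed by @vista8 / @omarsar0) tries to bypass this translation layer:
- Mechanism: agents pass the model's internal latent-space vectors to each other directly instead of natural-language text; only the final round is decoded into text output
- Training cost: the base model is left entirely untouched; only a connector module is trained (lightweight, lighter than LoRA)
- Reported gains (averaged over 9 benchmarks):
  - Accuracy: +8.3%
  - End-to-end speed: 1.2x–2.4x
  - Token reduction: 34.6%–75.6%
  - AIME math competition: 13-18 percentage points above the strongest baseline, with the advantage growing as the number of recursion rounds increases
Why this matters for multi-agent design:
- If inter-agent communication cost (token economics, serial latency) becomes the dominant bottleneck, as the practice in both Anthropic Long-Running and Cursor planners/workers/judges suggests it will, latent-space coordination may be the research direction most worth tracking on this front
- There is one obvious cost: observability is sacrificed. The five existing patterns can all be debugged from transcripts, whereas latent-vector passing is a black box and needs a new observability layer
- It may land first inside the Generator-Verifier inner loop (the most frequent small interactions, with the lowest observability requirement) and then extend to Orchestrator-Subagent
Tension: RecursiveMAS runs in exactly the opposite direction from the approach in Agent Memory vs Context Substrate, which externalizes work products onto a substrate for agents to collaborate on. The former pulls all collaboration back inside the model; the latter pushes all collaboration onto an external substrate. Each has its place, and the end state is most likely a hybrid architecture.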
How to choose
| Question | Choose |
|---|---|
| Need explicit quality gate? | Generator-Verifier |
| Short, well-defined subtasks? | Orchestrator-Subagent |
| Long-running independent tasks? | Agent Teams |
| Event-driven, ever-growing pipeline? | Message Bus |
| Need real-time collaboration on shared discoveries? | Shared State |
| Must eliminate single point of failure? | Shared State |
Orchestrator-Subagent vs Agent Teams
Ask: Do workers need to remember state across multiple tasks?
- No → Orchestrator-Subagent
- Yes → Agent Teams
Orchestrator-Subagent vs Message Bus
Ask: Is the workflow predictable?
- Yes → Orchestrator-Subagent
- No (event-driven, may change direction) → Message Bus
Agent Teams vs Shared State
Ask: Do agents need to reference each other's work in progress?
- No → Agent Teams
- Yes → Shared State
Message Bus vs Shared State
Ask: Is the goal processing a stream of events, or accumulating a knowledge base?
- Stream of events → Message Bus
- Accumulating knowledge → Shared State
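The questions above can be folded into a small decision helper. This is only a sketch: the `Workload` fields are hypothetical names for the questions, and the order in which they are checked is an assumption, since the source does not rank them and real systems often combine patterns.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    needs_quality_gate: bool = False
    agents_share_work_in_progress: bool = False
    event_driven: bool = False
    workers_keep_state_across_tasks: bool = False


def choose_pattern(w: Workload) -> str:
    """Return a starting pattern; the check order is an illustrative assumption."""
    if w.needs_quality_gate:
        return "Generator-Verifier"
    if w.agents_share_work_in_progress:
        return "Shared State"
    if w.event_driven:
        return "Message Bus"
    if w.workers_keep_state_across_tasks:
        return "Agent Teams"
    return "Orchestrator-Subagent"   # simplest default for clean decomposition


suggestion = choose_pattern(Workload(workers_keep_state_across_tasks=True))
```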
Real-world architectures
These patterns are building blocks, not mutually exclusive. Common hybrids:
- Orchestrator-Subagent for the main workflow, with a Shared State subtask for collaborative research
- Message Bus for event distribution, with Agent Teams handling each event type
Real-world case: Cursor planners / workers / judges (2026-05-02)
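Source: Addy Osmani, Long-running Agents
In "Scaling long-running autonomous coding", Cursor described three architecture iterations:
| Iteration | Design | Problem |
|---|---|---|
| v1 | Equal-status agents writing to shared files, with locks | Bottleneck; agents became risk-averse; churn |
| v2 | Optimistic concurrency control instead of locks | Removed the bottleneck but did not solve the coordination problem |
| v3 | Planners + Workers + Judges | Current production architecture |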
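v3 role definitions:
- Planners: continuously explore the codebase, emit tasks, and can recursively spawn sub-planners
- Workers: focus on execution; they do not coordinate with each other or concern themselves with the big picture
- Judges: decide when an iteration is complete and when to restart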
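The division of responsibility among the three roles can be illustrated with a heavily simplified, sequential sketch. It is not Cursor's implementation: `run_cycle` and the toy functions are hypothetical, planning here is a single call rather than continuous exploration with sub-planners, and the restart cap is an added assumption.

```python
from typing import Callable


def run_cycle(
    plan: Callable[[str], list[str]],
    work: Callable[[str], str],
    judge: Callable[[list[str]], bool],
    goal: str,
    max_restarts: int = 3,
) -> tuple[list[str], int]:
    """Plan -> work -> judge; restart the cycle whenever the judge rejects."""
    results: list[str] = []
    for attempt in range(1, max_restarts + 1):
        tasks = plan(goal)                         # planner emits tasks
        results = [work(task) for task in tasks]   # workers never coordinate
        if judge(results):                         # judge: done, or restart
            return results, attempt
    return results, max_restarts


# Toy stand-ins for the three roles (hypothetical).
def toy_plan(goal: str) -> list[str]:
    return goal.split(" and ")


def toy_work(task: str) -> str:
    return f"done: {task}"


attempts_seen: list[int] = []


def toy_judge(results: list[str]) -> bool:
    attempts_seen.append(len(results))
    return len(attempts_seen) >= 2   # reject the first attempt, accept the second


results, attempts = run_cycle(toy_plan, toy_work, toy_judge, "rename module and update imports")
```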
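Key findings:
- "a surprising amount of the system's behavior comes down to how we prompt the agents", which matters more than the harness or the model itself
- Different models suit different roles: GPT models are better suited than Opus to extended autonomous work, because Opus tends to stop early and take shortcuts
Cursor 3 companion features:
- Composer 2 (an in-house frontier coding model)
- Background cloud agents: 8-hour refactors and whole-codebase migrations that keep running with the laptop lid closed
- Each agent runs in an isolated git worktree and merges back to the main branch via PR
- Local/cloud handoff is a product surface most teams have not yet solved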
Market demand signal
The Cursor 3 user feedback (431 replies) strongly validates these patterns as real market demand, not academic abstractions. Users explicitly asked for role-based agent teams (planner → implementer → reviewer → QA), task-tree views, and user-as-orchestrator workflows. See Cursor 3 User Feedback for the full breakdown.
Real-world case: Gas City and the 100-agent software factory (2026-05-26)
Source: Inside the 100-agent Software Factory
Gas City is a software-factory experiment for coordinating many coding agents. Its current implementation is not yet a clean product surface, but it adds three useful coordination ideas:
- Dark factory / light factory: let many agents run in the background, then expose only the small set of outputs that need human attention.
- One pet, many cattle: keep one high-context coordinating agent close to the human while treating worker agents as disposable execution capacity.
- Multi-model review: use different models for creation, critique, and review rather than assuming one model should own every role.
Why it matters: Gas City strengthens the distinction between adding more agents and building a control plane. The useful pattern is not “100 agents” by itself; it is task routing, review, conflict management, and deciding what should surface to humans.
Counterpoint: The source explicitly treats Gas City as a glimpse of the future rather than a ready workflow. It should enrich Factory Missions and coordination pages, not become a standalone product recommendation.
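Real-world case: Attention governance, six boundaries for multi-agent systems (2026-05-28)
@ZeroZ_JQ argues that the core of a multi-agent system is not "division of labor" but "attention governance": when multiple agents run at once, what really needs managing is not who does what but how attention is allocated and switched across tasks and agents.
The six attention boundaries:
| Boundary | Question | Governance strategy |
|---|---|---|
| Task boundary | How many tasks does one agent handle at once? | Single-task focus vs multi-task parallelism; make the switching cost explicit |
| Time boundary | How long does an agent run before it needs a rest or an audit? | Checkpoint and compaction cadence for long-horizon tasks |
| Context boundary | How much can an agent remember, and how much does it forget? | Explicit memory layer vs implicit context; periodic consolidation |
| Tool boundary | Which tools can an agent call, and with what permissions? | Tool allowlists; dynamic permission escalation and de-escalation |
| Collaboration boundary | How do multiple agents share information without conflict? | Shared state vs message bus; conflict detection and resolution |
| Human-agent boundary | Which decisions must be human-in-the-loop? | Approval thresholds, exception escalation paths, rollback mechanisms |
Key insights:
- Traditional multi-agent design defines roles first (planner, worker, reviewer), but roles are static while attention is dynamic
- Once agents can self-evolve, preset role assignments quickly become outdated, whereas attention boundaries (do not modify the same file concurrently, do not run indefinitely without an audit) persist
- This explains why Cursor's Planner/Worker/Judge architecture works: not because it assigns roles, but because it governs where each agent's attention is focused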
Related
- harness-engineering/overview — Harness engineering overview
- harness-engineering/openai-frontier-symphony — OpenAI Frontier multi-agent architecture
- harness-engineering/agent-architecture-unsentimental — Unsentimental lessons on agent architecture
- harness-engineering/notion-ai-agents — Notion's 4-agent setup (Orchestrator-Subagent example)
- product-trends/cursor-3-user-feedback-431 — Cursor 3 user feedback validates multi-agent demand
Real-world case: TMA1 v2 — Cross-Agent Context Injection (2026-05-25)
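Source: TMA1 v2: Making the Coding Agent Loop Actually Run
TMA1 v2 is a local observability tool for coding agents. Its core innovation is not the observation itself but closing the loop by injecting what it observes into the agent's next-round prompt.
Cross-Agent Context Sharing via /tma1-peer
TMA1 v2 implements direct context exchange between Claude Code and Codex:
- Running `/tma1-peer cc 1` in Codex reads Claude Code's most recent session context (tool calls, modified files, build results, anomaly reports)
- This lets the review → code loop proceed without a human copying and pasting review comments
This validates the idea that, in multi-agent collaboration, cross-agent context sharing should not rely on a human relay; agents should read each other's session state directly through a structured interface.
Automatic <tma1-context> injection
TMA1 automatically injects <tma1-context> information at several hook points (UserPromptSubmit, PostToolUse, SessionStart, Stop, PreCompact):
- Current session information (duration, tool calls, tokens)
- Build command status and the last error
- External file changes (modified by a human or another agent)
- Rule-based anomaly information (for example, suggesting compaction when context exceeds 100k)
Key design point: this information is not there for the agent to glance at; it directly alerts the agent to context changes and gives action suggestions. For example, human_modified_during_session prompts the agent to re-read the file rather than assume its in-memory copy is still valid.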
Build State Attribution: Human vs Agent
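When fsnotify sees a file change, how does TMA1 decide whether a human or an agent made it?
TMA1's attribution strategy:
- Check hook events within a ±5-second window: look for an Edit/Write/MultiEdit whose file_path matches
- Check the input of Bash commands: does it contain the file's basename? (This catches operations such as `mkdir`, `rm`, and `git checkout` that have no file_path field)
- If neither pass matches, attribute the change to the human
Principle: "Better to wrongly blame yourself than to take the blame for the agent."
This exemplifies **the attribution problem in agent environment awareness**: when multiple agents and humans work in the same codebase, accurate change attribution is a precondition for avoiding conflicts and duplicated work.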
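The two-pass attribution rule can be sketched as follows. This is not TMA1's code: `HookEvent` and `attribute_change` are hypothetical, exact string equality is assumed for the `file_path` match, and applying the same ±5-second window to the Bash pass is also an assumption, since the description only states the window for the first pass.

```python
import os
from dataclasses import dataclass
from typing import Optional

WINDOW_S = 5.0
EDIT_TOOLS = {"Edit", "Write", "MultiEdit"}


@dataclass
class HookEvent:
    ts: float                        # event time in seconds
    tool: str                        # e.g. "Edit", "Bash"
    file_path: Optional[str] = None
    command: Optional[str] = None    # Bash input, if any


def attribute_change(path: str, changed_at: float, events: list[HookEvent]) -> str:
    nearby = [e for e in events if abs(e.ts - changed_at) <= WINDOW_S]
    # Pass 1: an edit-type tool call that names this exact file.
    if any(e.tool in EDIT_TOOLS and e.file_path == path for e in nearby):
        return "agent"
    # Pass 2: a Bash command whose input mentions the file's basename.
    base = os.path.basename(path)
    if any(e.tool == "Bash" and e.command and base in e.command for e in nearby):
        return "agent"
    # Neither pass matched: attribute the change to the human.
    return "human"


events = [
    HookEvent(100.0, "Edit", file_path="src/app.py"),
    HookEvent(200.0, "Bash", command="git checkout -- README.md"),
]
who = attribute_change("docs/README.md", 203.0, events)
```

Basename matching is deliberately loose, so it can over-attribute to the agent when two files share a name; the human default only applies when nothing in the window matches at all.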
Pattern Classification
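TMA1 v2's cross-agent sharing is a hybrid of Shared State and Message Bus:
- Shared state: agents query project build information, environment information, and other agents' session information through an MCP Server
- Message injection: hooks push context updates to the agent at specific moments
Difference from Pattern 5 (Shared State): rather than leaving agents to read a shared store passively, TMA1 actively injects structured context at hook time, which lowers the risk that an agent forgets to check shared state.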
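Real-world case: Vox's shared brain practice (2026-05-13)
Source: Vox, The First Step in Building My AI Native Team
Vox runs two agent platforms, openclaw and hermes, and found that the same decision was being recorded separately on each platform, with neither able to see the other's copy. She introduced gbrain as a shared brain and proposed the principle "build the shared study first, then add agents."
Key design decisions:
- The shared brain holds only team facts and long-term decisions, not sensitive data, raw chat logs, or credentials
- New agents are read-only by default; only an explicitly designated owner agent can write
- The control plane (health monitoring) is separated from the knowledge layer to avoid a single point of failure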
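The read-only-by-default rule can be illustrated with a toy store. `SharedBrain` and the agent names are hypothetical and say nothing about how gbrain is actually built; the sketch only shows write access being gated on an explicit owner list.

```python
from typing import Optional


class SharedBrain:
    """Toy store: every agent may read, only registered owners may write."""

    def __init__(self, owners: set[str]) -> None:
        self._owners = set(owners)
        self._facts: dict[str, str] = {}

    def read(self, agent: str, key: str) -> Optional[str]:
        return self._facts.get(key)           # reads are open to all agents

    def write(self, agent: str, key: str, value: str) -> None:
        if agent not in self._owners:
            raise PermissionError(f"{agent} is read-only")
        self._facts[key] = value


brain = SharedBrain(owners={"decisions-owner"})
brain.write("decisions-owner", "deploy-day", "Tuesday")
seen_by_new_agent = brain.read("new-agent", "deploy-day")

try:
    brain.write("new-agent", "deploy-day", "Friday")
    write_denied = False
except PermissionError:
    write_denied = True
```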
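Comparison with Pattern 5 (Shared State): Vox's practice is stricter than generic Shared State. Shared State lets any agent read and write any data, whereas Vox puts boundary rules ahead of connectivity. This addresses a problem not discussed in the original Anthropic patterns: a shared store without governance quickly degrades into a junk heap.
See Shared Brain First, Boundaries Second for details.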
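Real-world case: Slock.ai's organization of 7 people + 40 agents (2026-05-03)
Source: Agent Dynamics, Slock design philosophy
Slock's agent organization evolved from organic needs rather than upfront design. RC started with 1 person + 1 agent, gradually found that multiple agents were needed to handle different tasks in parallel, and eventually arrived at a complement of 40 agents.
Role differentiation: many engineers (with no frontend/backend split; "an engineer is an engineer"), one engineering lead (who tracks the other engineers' progress and produces summary reports), plus other roles such as designer, growth, and strategy.
Task claiming: RC posts tasks in the engineers' channel and agents claim them on their own. Agents that have done a certain kind of work become more inclined to keep doing it, an emergent form of specialization.
On a single all-purpose agent vs multi-agent division of labor:
- In theory, a single all-purpose agent is simpler
- In practice, when the sub-agent team spawned by the all-purpose agent goes off track, people want to correct a specific sub-agent directly
- This "need to micromanage" is the user-psychology foundation for multi-agent platforms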
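Design principles for the CLI as an agent interface
Slock's design starts from first principles:
- The CLI is for agents, not for humans, so the design logic is entirely different
- Input should be concise and unambiguous; help text should give clear examples
- Output should make clear whether the operation succeeded and what data was returned
- Prefer output that is deterministic, static, and information-dense
- Every SaaS product should be presented to agents as a CLI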
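One way to read these principles is a command that takes terse arguments, documents itself with a concrete example, and prints a single compact JSON object with an explicit success flag. The sketch below is an assumption-laden illustration, not Slock's CLI: the `tasks claim` command, its flags, and the result fields are all hypothetical.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="tasks",
        description="Claim a task by id.",
        epilog="example: tasks claim 42 --agent eng-3",   # concrete example in help
    )
    sub = parser.add_subparsers(dest="command", required=True)
    claim = sub.add_parser("claim", help="claim an open task")
    claim.add_argument("task_id", type=int)
    claim.add_argument("--agent", required=True)
    return parser


def run(argv: list[str], open_tasks: set[int]) -> str:
    """Execute one command and return a single-line JSON result."""
    args = build_parser().parse_args(argv)
    if args.task_id in open_tasks:
        open_tasks.remove(args.task_id)
        result = {"ok": True, "task_id": args.task_id, "claimed_by": args.agent}
    else:
        result = {"ok": False, "error": "task_not_open", "task_id": args.task_id}
    # Sorted keys keep the output stable from run to run.
    return json.dumps(result, sort_keys=True)


open_tasks = {42}
first = run(["claim", "42", "--agent", "eng-3"], open_tasks)
second = run(["claim", "42", "--agent", "eng-4"], open_tasks)
```

The second call fails with a machine-readable error code rather than prose, so the calling agent can branch on `ok` without parsing a sentence.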
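Real-world case: OpenClaw inbox onboarding (2026-05-17)
The OpenClaw team sends each new agent a structured Day 1 email, which serves as an onboarding primitive for autonomous multi-agent operations:
- Day 1 email contents: role, target, sources, first task, reply format
- Three-layer information architecture: docs describe "what exists", memory records "what happened", and the inbox defines "what matters today"
- Division of labor: the inbox carries today's tasks; the brain (knowledge base) carries long-term decisions
- Evolution: after a few weeks the team of AI employees can see work "beyond today" on their own, and humans step back from coordinator to rule-maker
Comparison with existing patterns:
- Unlike the real-time delegation of Orchestrator-Subagent, the inbox model is asynchronous, declarative coordination
- New agents get their context by reading static onboarding documents rather than by talking to a human, which lowers coordination cost
- There is a lesson here for both Agent Teams and Shared State: clear boundary rules (the docs/memory/inbox separation) matter more than connectivity itself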