Production Memory Systems

Deep comparison of how three production coding agents implement memory: Hermes (Nous Research), Codex CLI (OpenAI), and Claude Code (Anthropic).

来源：Agent Memory Engineering

完整分析见 Agent Memory Engineering（Nic）——包含 model-harness fit、prefix cache 问题、signal gate、验证纪律、Day 1 bootstrap、跨项目作用域、防注入等章节的详细论述。

Model-Harness-Fit

核心发现：记忆不是可移植的文件层，而是模型与 Harness 熔合后的产物。

Claude 针对 Claude Code 的记忆层 post-train：typed file taxonomy、always-loaded MEMORY.md index、age-aware <system-reminder> framing。GPT-5 针对 Codex 的记忆层 post-train：always-loaded memory_summary.md、on-demand grep into MEMORY.md、<oai-mem-citation> block format。

这意味着：把 64 条 Claude Code 记忆文件丢进 Codex 文件夹，bytes 落地但行为不同。模型不知道用同样的纪律读取、验证、引用它们。

Three Architectures

维度	Hermes	Codex CLI	Claude Code
写入时机	同步，live turn	异步，6h idle 后两阶段	同步，live turn，用户可见
写入者	live agent	Phase 1: gpt-5.4-mini 提取；Phase 2: gpt-5.4 整合	live agent
系统提示加载	~2,200 char 固定快照，会话内不变	仅 5K token summary 常驻；全手册 on-demand grep	200 行 index 常驻；body on-demand Read
存储格式	§ 分隔纯文本，无 schema	严格 YAML frontmatter + block schema	typed file taxonomy + loose YAML frontmatter
防噪音机制	硬字符上限	显式 signal gate：默认 no-op	prompt discipline + grep 去重
防过时机制	无	30 天 usage decay	每次读取动态 age reminder
安全防御	regex threat pattern 扫描	Phase 1 不 follow rollout 内指令 + 双重 secret redaction	依赖验证纪律，无显式扫描

Storage Layer

Hermes：§ (U+00A7) 作为 in-band 记录分隔符，几乎不出现在用户文本中。MEMORY.md + USER.md 两个扁平文件，无 header、无 JSON envelope。

Codex：additionalProperties: false 的严格 JSON schema。consolidation prompt 达 841 行，大量用于教模型维护 schema。

Claude Code：一记忆一文件，按 type 前缀命名（feedback_、project_、reference_、user_）。encoded cwd 路径作为多租户 key，无显式 validator，靠 prompt convention 维持。

Prefix Cache Problem

这是记忆系统设计的硬经济约束。KV cache hit 要求 turn 间前缀 byte-for-byte 相等。若系统提示每轮变化，22K tokens 的系统提示每轮按全价计费（~~$3/M tokens）而非缓存价（~~$0.30/M tokens）。10 倍成本差距。

三系统的共同解法：per-turn 动态数据放入 user message 或 tool call，绝不注入系统提示。

Hermes：冻结 session 启动快照，新写入仅在下个 session 可见
Codex：memory_summary.md 只在 consolidation 完成后更新
Claude Code：auto memory block 在会话内 byte-stable

Signal Gate

Codex 的 Phase 1 提取 prompt 以显式 no-op 为默认：

"If nothing happened in this rollout that future agents should know about, emit empty strings."

空输出记录为 succeeded_no_output，清除 watermark 不再重试。核心原则：让模型证明值得记住，奖励空输出。

Claude Code 用 prompt discipline：告知模型不要保存一次性修正、不要保存 codebase 已显见的事实、不要保存可能翻转的陈述。Hermes 用硬字符上限强制模型自行 triage。

Verification Discipline

Claude Code 每次读取记忆 body 时注入动态 freshness reminder：

"Records can become stale over time... Before answering... verify that the memory is still correct."

这使得记忆成为 hint surface 而非 authority surface。引用文件路径、函数名、系统状态的记忆，agent 必须在断言前 grep 验证。成本是每个 body read 多一次 tool call，收益是避免在演进 codebases 上静默回归旧行为。

Cross-project Scoping

三系统均未实现理想的分层设计：

Hermes：profiles 完全隔离，无全局 overlay
Codex：单一全局文件夹，靠 applies_to: cwd=<path> 标注，存在跨项目泄漏风险
Claude Code：encoded cwd 路径隔离，无继承或 fallback 机制

理想形态应为 _global/ 规则层 + <project>/ 项目层，项目层冲突时优先。

Day 1 Bootstrap

三系统均未解决冷启动问题。新用户打开 agent 时记忆目录为空，前 10 个会话 agent 仍在学习。Codex 的 INIT phase 需要真实 prior sessions 才能提取；Claude Code 靠 CLAUDE.md 承载静态"who am I"上下文作为种子。

开放问题：如何从用户已有的数字足迹（云盘、邮件、日历、聊天记录）中 seed 初始记忆。

Production Memory Systems

Production Memory Systems

Model-Harness-Fit

Three Architectures

Storage Layer

Prefix Cache Problem

Signal Gate

Verification Discipline

Cross-project Scoping

Day 1 Bootstrap

Sources

Evolution

Derived from source material

Linked from