9-Layer Production AI Architecture

Source: @techNmak Twitter thread (1521 likes), 2026-04-12

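Core idea

The fundamental difference between a production AI system and a demo is layered architecture: the real progression from a single-file demo to a nine-layer production layout. This is not complexity for its own sake; it is what operating at scale requires.

The 9 layers in detail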

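1. services/ (core services layer)

Not one file, but five:

RAG pipeline: the retrieval-augmented generation flow
Semantic cache: a semantic caching layer (most production applications skip it and end up paying for the same tokens repeatedly)
Memory management: context and long-term memory
Query rewriter: query rewriting and optimization
Router: request dispatch and load balancing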
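
The semantic cache is the least familiar of the five services, so here is a minimal sketch of the idea. It assumes a toy bag-of-words `embed` function in place of a real embedding model. The `SemanticCache` class, the `answer` helper, and the 0.8 similarity threshold are hypothetical names and values chosen for illustration; they do not come from the thread.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Returns a stored answer when a new query is similar enough to an old one."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed value; tune per application
        self.entries: List[Tuple[Counter, str]] = []

    def get(self, query: str) -> Optional[str]:
        query_vec = embed(query)
        best_score, best_answer = 0.0, None
        for vec, cached_answer in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, cached_answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query: str, result: str) -> None:
        self.entries.append((embed(query), result))


def answer(query: str, cache: SemanticCache, llm: Callable[[str], str]) -> str:
    """Check the cache first; call the model (and spend tokens) only on a miss."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    result = llm(query)
    cache.put(query, result)
    return result
```

A production version would likely swap the linear scan for a vector index and add expiry, but the control flow (look up by similarity, fall through to the model, store the result) stays the same.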

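2. agents/ (agent orchestration layer)

Document grader: scores and filters retrieved documents
Decomposer: breaks a task into subtasks
Adaptive router: routes requests adaptively
Self-correcting: a self-correction mechanism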
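
One way the document grader and the self-correcting step could fit together is sketched below. The relevance score is a toy term-overlap measure standing in for a model-based grader. `grade`, `retrieve_with_correction`, and the `retrieve` and `rewrite` callables are hypothetical names for illustration, not an API described in the thread.

```python
from typing import Callable, List


def grade(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query terms that appear in the document."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(doc.lower().split())
    return len(terms & words) / len(terms)


def retrieve_with_correction(
    query: str,
    retrieve: Callable[[str], List[str]],
    rewrite: Callable[[str], str],
    min_score: float = 0.5,   # assumed cut-off for "relevant enough"
    max_attempts: int = 3,    # assumed retry budget
) -> List[str]:
    """Retrieve, keep only documents that pass grading, and retry with a
    rewritten query when nothing passes."""
    current = query
    for _ in range(max_attempts):
        graded = [d for d in retrieve(current) if grade(current, d) >= min_score]
        if graded:
            return graded
        current = rewrite(current)  # self-correction: try a different phrasing
    return []
```

Returning an empty list after the retry budget is spent lets the caller decide whether to answer without context or to refuse, rather than hiding that decision inside the loop.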

3. api/ (API interface layer)

Standardized interface definitions and version management

4. infra/ (infrastructure layer)

Deployment, monitoring, and scaling

5. config/ (configuration management layer)

Environment configuration and feature flags
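
As a rough illustration of environment configuration with a feature flag, the sketch below reads settings from environment variables into an immutable object. The variable names (`APP_ENV`, `MODEL_NAME`, `ENABLE_SEMANTIC_CACHE`) and the `Settings` fields are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    environment: str
    model_name: str
    semantic_cache_enabled: bool  # feature flag


def _flag(value: str) -> bool:
    """Interpret common truthy strings as an enabled flag."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from an environment mapping (hypothetical variable names)."""
    return Settings(
        environment=env.get("APP_ENV", "development"),
        model_name=env.get("MODEL_NAME", "example-model"),
        semantic_cache_enabled=_flag(env.get("ENABLE_SEMANTIC_CACHE", "false")),
    )
```

Passing the mapping as a parameter keeps the loader testable without touching the real process environment.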

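6. evals/ (evaluation layer; often overlooked but critical)

Golden dataset: the baseline for regression testing
Offline eval: evaluation run before deployment
Online monitor: monitoring in production (drift detection)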
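
A golden dataset becomes useful once something runs it automatically. The sketch below shows one possible offline eval: each case states a substring the answer must contain, and the run reports a pass rate plus the failing cases. `GoldenCase`, `run_offline_eval`, and the substring check are illustrative assumptions; real suites often use model-based or semantic scoring instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class GoldenCase:
    question: str
    must_contain: str  # simplistic expectation: a substring of the answer


def run_offline_eval(
    system: Callable[[str], str],
    cases: List[GoldenCase],
) -> Tuple[float, List[GoldenCase]]:
    """Run every golden case through the system and return (pass_rate, failures)."""
    failures = [
        case
        for case in cases
        if case.must_contain.lower() not in system(case.question).lower()
    ]
    pass_rate = 1 - len(failures) / len(cases) if cases else 0.0
    return pass_rate, failures


def check_regression(pass_rate: float, baseline: float) -> bool:
    """True when the new pass rate is at least as good as the recorded baseline."""
    return pass_rate >= baseline
```

Running this in CI against a stored baseline turns the golden dataset into a regression gate: a prompt or model change that lowers the pass rate is caught before it ships.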

7. schemas/ (type definition layer)

Structured output definitions and validation rules
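
To make "structured output definitions and validation rules" concrete, here is a small standard-library sketch that parses a model's JSON output into a typed object and rejects anything malformed. The `Answer` fields and the `parse_answer` function are hypothetical; a real project might use a validation library instead.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Answer:
    text: str
    sources: List[str]
    confidence: float  # expected to lie in [0, 1]


def parse_answer(raw: str) -> Answer:
    """Validate raw model output against the Answer schema or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")

    text = data.get("text")
    sources = data.get("sources")
    confidence = data.get("confidence")

    if not isinstance(text, str) or not text:
        raise ValueError("'text' must be a non-empty string")
    if not isinstance(sources, list) or not all(isinstance(s, str) for s in sources):
        raise ValueError("'sources' must be a list of strings")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("'confidence' must be a number")
    if not 0 <= confidence <= 1:
        raise ValueError("'confidence' must be between 0 and 1")

    return Answer(text=text, sources=sources, confidence=float(confidence))
```

Failing loudly at this boundary means downstream code can rely on the shape of `Answer` rather than defensively re-checking every field.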

8. security/ (security layer)

Input validation, output filtering, and access control
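
The three concerns named above can be sketched as three small functions wrapped around a model call. Everything here is an assumption made for illustration: the length limit, the email-redaction rule, the role table, and the function names. A real deployment would need much broader checks.

```python
import re
from typing import Callable

MAX_INPUT_CHARS = 4000  # hypothetical limit
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
ROLE_PERMISSIONS = {  # hypothetical role table
    "admin": {"query", "reindex"},
    "user": {"query"},
}


def validate_input(text: str) -> str:
    """Input validation: reject empty or oversized input."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("input is empty")
    if len(cleaned) > MAX_INPUT_CHARS:
        raise ValueError("input exceeds maximum length")
    return cleaned


def filter_output(text: str) -> str:
    """Output filtering: redact anything that looks like an email address."""
    return EMAIL_PATTERN.sub("[REDACTED]", text)


def is_allowed(role: str, action: str) -> bool:
    """Access control: unknown roles get no permissions."""
    return action in ROLE_PERMISSIONS.get(role, set())


def handle(role: str, user_input: str, model: Callable[[str], str]) -> str:
    """Apply all three checks around a single model call."""
    if not is_allowed(role, "query"):
        raise PermissionError(f"role {role!r} may not query")
    return filter_output(model(validate_input(user_input)))
```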

9. utils/ (utility layer)

General-purpose helper functions

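Key insights

The gap between demo and production

Dimension	Demo	Production
File structure	Single file	9 layered directories
Role of the prompt	Carries all the weight	Shares the work across layers
Caching strategy	None	Semantic cache required
Evaluation	Manual checks	Automated evals suite
Error handling	Ignored	Layered defense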

Why versioned prompts matter

"Versioned prompts are what separate teams that can actually iterate from those stuck debugging mystery outputs."

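Community validation

"The evaluation layer is the dividing line between demo and production."
"This is the first time I have seen the demo vs production gap described as a literal difference in filesystem layers."
"The single-file demo works because the prompt is carrying all the weight."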

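How to apply

Review the current project's architecture and identify missing layers (especially evals and security)
Add a version-control mechanism for prompts
Implement a semantic cache layer to cut repeated token consumption
Build a golden dataset for regression testing
Design an outputSchema type-validation layer to prevent integration bugs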
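
Of the steps above, prompt versioning has no dedicated layer in the list, so a sketch may help. The idea is that every rendered prompt carries an identifier such as `grader@v1`, which can be logged next to the model output so a surprising result can be traced to the exact prompt text. `PromptRegistry`, `PromptVersion`, and the `name@version` format are hypothetical choices, not something the thread specifies.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict, Tuple


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    template: str  # uses string.Template placeholders such as $question

    @property
    def prompt_id(self) -> str:
        return f"{self.name}@{self.version}"


class PromptRegistry:
    """Stores prompts by (name, version); published versions are immutable."""

    def __init__(self) -> None:
        self._prompts: Dict[Tuple[str, str], PromptVersion] = {}

    def register(self, name: str, version: str, template: str) -> PromptVersion:
        key = (name, version)
        if key in self._prompts:
            # Editing in place would make old logs unreproducible.
            raise ValueError(f"{name}@{version} already exists; publish a new version")
        prompt = PromptVersion(name, version, template)
        self._prompts[key] = prompt
        return prompt

    def render(self, name: str, version: str, **variables: str) -> Tuple[str, str]:
        """Return (prompt_id, rendered_text); raises KeyError on unknown prompts
        or missing template variables."""
        prompt = self._prompts[(name, version)]
        text = Template(prompt.template).substitute(**variables)
        return prompt.prompt_id, text
```

For example, after `registry.register("grader", "v1", "Is this document relevant to: $question?")`, a call to `registry.render("grader", "v1", question="refunds")` yields the identifier and the filled-in text together, so both can be written to the same log line.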

Related

harness-engineering/overview: Harness Engineering overview
harness-engineering/components-coding-agent: core components of a coding agent
harness-engineering/six-self-improvement-paths: agent self-improvement paths
claude-code/failure-modes-config: failure-mode configuration

Sources

AI 简报 (AI Briefing) 2026-04-12, Tech with Mak thread