Judgment Engineering
What it is
Judgment Engineering is the practice of making human judgment explicit, versionable, and auditable in AI-driven workflows. It arises from the observation that AI tools do not merely automate tasks; they force humans to articulate preferences, constraints, and quality standards that were previously tacit.
The concept was articulated by ethanytzhou in a 2026 analysis drawing on James C. Scott's Seeing Like a State and Charles Goodhart's law. The core claim: when humans write specs, rules, and eval sets for AI, they are not controlling the AI; they are using the AI as a mirror to see their own implicit knowledge for the first time.
Why it matters
As AI generation speed exceeds human review speed, teams face a structural mismatch: the bottleneck shifts from "can we build it?" to "how do we know it's good?" Most organizations have never systematically written down what "good" means. Judgment Engineering addresses this gap by treating taste, constraints, and verification as first-class engineering artifacts.
Key points
- Legibility (显形, "taking visible form"): AI's need for explicit instructions forces tacit knowledge into written form. Senior engineers must articulate rules they previously communicated only through intuition and code review comments.
- Goodhart's law in AI: Once a metric becomes a target, it stops being a reliable measure, because it gets gamed. Code coverage, test pass rates, and line-count limits all distort behavior when optimized directly.
- The impossible triangle of judgment: Spec completeness, Goodhart resistance, and tacit-knowledge preservation cannot all be maximized simultaneously. Any practical system must choose which two to prioritize.
- Three stone tablets: The article proposes three engineering artifacts to operationalize judgment:
  - Acceptance as Code: Put "what done looks like" into version-controlled, machine-readable criteria.
  - Adversarial Review Network: Separate the actor from the evaluator. Use different models, roles, or systems for generation and critique.
  - Taste as Asset: Write "Project Taste" documents (preference statements rather than hard rules) to preserve team-specific aesthetic and judgment in a partially structured form.
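
As a rough illustration of the first two artifacts, the sketch below stores acceptance criteria as plain, version-controllable data and keeps the evaluator separate from whatever produced the candidate. It is not taken from the source article: the `Criterion` class, the `AC-*` ids, the three example rules, and the `evaluate` and `failed` helpers are all hypothetical names chosen for this example, and the string checks are deliberately naive.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Criterion:
    """One reviewable, versionable statement of what 'done' looks like."""
    id: str
    description: str
    check: Callable[[str], bool]  # receives the candidate source text


# Hypothetical criteria for a generated Python function (illustrative only).
CRITERIA: List[Criterion] = [
    Criterion("AC-1", "has a docstring", lambda src: '"""' in src),
    Criterion("AC-2", "no bare except clause", lambda src: "except:" not in src),
    Criterion("AC-3", "no TODO markers left", lambda src: "TODO" not in src),
]


def evaluate(candidate: str, criteria: List[Criterion] = CRITERIA) -> Dict[str, bool]:
    """Evaluator role: sees only the candidate and the criteria,
    never the prompt or the generator that produced the candidate."""
    return {c.id: c.check(candidate) for c in criteria}


def failed(report: Dict[str, bool]) -> List[str]:
    """Return the ids of the criteria that did not pass."""
    return [cid for cid, ok in report.items() if not ok]


# Stand-in for the output of a separate generator (the "actor").
CANDIDATE = (
    "def load(path):\n"
    "    try:\n"
    "        return open(path).read()\n"
    "    except:\n"
    "        return None\n"
)

if __name__ == "__main__":
    print(failed(evaluate(CANDIDATE)))  # prints ['AC-1', 'AC-2']
```

Because the criteria live in one list, a change to what "done" means shows up as a reviewable diff. Each check is still only a proxy, so the Goodhart concern above applies to this list as much as to any other metric.
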
Evidence across sources
| Source | Key Claim | Relevance |
|---|---|---|
| Harness 的尽头不是缰绳,是镜子 ("At the end of the harness is not a rein but a mirror") | AI forces tacit knowledge into explicit text; this is the third major "legibility" movement in history, and it turns the mirror on ourselves | Foundational articulation of the concept |
Open questions
- Can "taste" be partially preserved through preference statements, or does any structuring inevitably distort it?
- How do organizations balance the need for explicit standards with the risk of metric gaming?
- What happens when the next generation of AI can infer tacit knowledge without explicit prompting?
Prompts for witness
- What implicit rules do I enforce in code review that I've never written down? What would a `project.md` file for my team look like?
- If I had to hand my current project to an AI with no oral history, which three judgments would be most dangerous to omit?