Judgment Engineering

What it is

Judgment Engineering is the practice of making human judgment explicit, versionable, and auditable in AI-driven workflows. It arises from the observation that AI tools do not merely automate tasks; they force humans to articulate preferences, constraints, and quality standards that were previously tacit.

The concept was articulated by ethanytzhou in a 2026 analysis drawing on James C. Scott's Seeing Like a State and on Goodhart's law (named for the economist Charles Goodhart). The core claim: when humans write specs, rules, and eval sets for AI, they are not controlling the AI; they are using the AI as a mirror to see their own implicit knowledge for the first time.

Why it matters

As AI generation speed exceeds human review speed, teams face a structural mismatch: the bottleneck shifts from "can we build it?" to "how do we know it's good?" Most organizations have never systematically written down what "good" means. Judgment Engineering addresses this gap by treating taste, constraints, and verification as first-class engineering artifacts.

Key points

Legibility (显形, "made visible"): AI's need for explicit instructions forces tacit knowledge into written form. Senior engineers must articulate rules they previously communicated only through intuition and code review comments.
Goodhart's law in AI: When a measure becomes a target, it ceases to be a good measure. Code coverage, test pass rates, and line-count limits all distort behavior when optimized directly.
The impossible triangle of judgment: Spec completeness, Goodhart resistance, and tacit-knowledge preservation cannot all be maximized simultaneously. Any practical system must choose which two to prioritize.
Three stone tablets: The article proposes three engineering artifacts to operationalize judgment:
1. Acceptance as Code: Put "what done looks like" into version-controlled, machine-readable criteria.
2. Adversarial Review Network: Separate the actor from the evaluator. Use different models, roles, or systems for generation and critique.
3. Taste as Asset: Write "Project Taste" documents—preference statements rather than hard rules—to preserve team-specific aesthetic and judgment in a partially structured form.
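To make the first two tablets concrete, here is a minimal sketch of acceptance criteria kept as version-controllable data and checked by an evaluator that is separate from whatever produced the artifact. It is an illustration under stated assumptions, not the article's own implementation: the names Criterion, CRITERIA, and review are hypothetical, and the three checks are simple placeholders for real team standards. Soft criteria stand in for taste preferences, which are reported but do not block acceptance.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One machine-readable statement of what 'done' looks like (hypothetical type)."""
    name: str
    check: Callable[[str], bool]  # evaluator logic, kept apart from the generator
    hard: bool = True             # False = taste preference: reported, not blocking


# Hypothetical criteria for a generated Python snippet; a real team would
# keep these in version control next to the code they govern.
CRITERIA = [
    Criterion("has_docstring", lambda src: '"""' in src),
    Criterion("no_bare_except", lambda src: "except:" not in src),
    Criterion("prefers_short_output", lambda src: len(src.splitlines()) <= 40, hard=False),
]


def review(artifact: str, criteria=CRITERIA):
    """Evaluate an artifact and return (accepted, failed_hard, failed_soft)."""
    failed_hard = [c.name for c in criteria if c.hard and not c.check(artifact)]
    failed_soft = [c.name for c in criteria if not c.hard and not c.check(artifact)]
    # Only hard criteria decide acceptance; soft failures are surfaced for a human.
    return (not failed_hard, failed_hard, failed_soft)
```

Calling review(generated_source) on any generated string returns the verdict together with the names of the failed criteria, so a rejection is traceable to a written rule. Because string checks this simple are easy to satisfy without meeting their intent, they also illustrate the Goodhart risk described above: the criteria list needs periodic human review rather than being treated as the definition of quality.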

Evidence across sources

Source	Key Claim	Relevance
Harness 的尽头不是缰绳，是镜子 ("The End of the Harness Is Not the Reins, but a Mirror")	AI forces tacit knowledge into explicit text; this is the third major "legibility" movement in history, and it turns the mirror on ourselves	Foundational articulation of the concept

Open questions

Can "taste" be partially preserved through preference statements, or does any structuring inevitably distort it?
How do organizations balance the need for explicit standards with the risk of metric gaming?
What happens when the next generation of AI can infer tacit knowledge without explicit prompting?

Prompts for witness

What implicit rules do I enforce in code review that I've never written down? What would a project.md file for my team look like?
If I had to hand my current project to an AI with no oral history, which three judgments would be most dangerous to omit?
