
Fat Skills, Fat Code, Thin Harness

Updated 2026-04-13

Principle by Garry Tan (Y Combinator President & CEO): The secret to 10–100x productivity gains with AI is not the model, but what wraps it. Push fuzzy judgment into fat markdown skills, push perfect execution into fat deterministic code, and keep the harness thin.

The core principle

  • Skills (markdown): fuzzy, intelligence-driven operations that humans do. Push these into "fat skills".
  • Code: deterministic operations that must execute perfectly. Push these into "fat code".
  • Harness: orchestration and glue between skills and code. Keep this thin.

"Do the right thing at the right layer. Everything else is architecture astronomy."

Garry Tan's five definitions

1. Skill files

A skill file is a reusable markdown document that teaches the AI how to think, not what to do.

  • Like a method call, it takes parameters
  • The skill describes the judgment process; the parameters describe the world
  • Example: /investigate — six fixed steps (scope → timeline → analyze → synthesize → pros/cons → cite). With parameters TARGET, QUESTION, DATASET, the same skill becomes a medical research analyst, a forensic investigator, or a compliance auditor.

"This is not prompt engineering. This is software design — using Markdown as a programming language and human judgment as the runtime."
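To make that concrete, here is a sketch of what such a skill file could look like for the /investigate example. The section names, wording, and parameter descriptions are illustrative assumptions, not Garry Tan's actual file:

```markdown
# /investigate

## Parameters
- TARGET: the entity under investigation
- QUESTION: what the final brief must answer
- DATASET: where the evidence lives

## Process
1. Scope: restate QUESTION and state what is in and out of bounds.
2. Timeline: order every event about TARGET found in DATASET.
3. Analyze: weigh each piece of evidence against QUESTION.
4. Synthesize: merge findings into a single narrative.
5. Pros/cons: argue both sides before concluding.
6. Cite: tie every claim back to a document in DATASET.
```

Swap the parameters and the same judgment process runs against a medical corpus, a forensic case file, or a compliance archive.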

2. Harness

The harness is the program that runs the LLM. It does only four things:

  1. Run the model in a loop
  2. Read and write files
  3. Manage context
  4. Enforce safety checks

Anti-pattern: a bloated harness with 40+ tool definitions consuming half the context window, REST-wrapping every interface (3× tokens, 3× latency, 3× failure rate).

Right pattern: specialized, fast, focused tools.

  • Playwright CLI: 100ms per browser action
  • Chrome MCP: 15s for screenshot-find-click-wait-read
  • From those two figures, that's a difference of one to two orders of magnitude.

3. Resolver

The resolver is the context routing table. When task type is X, automatically load document Y.

Without a resolver: changing a prompt dumps more text into an ever-growing context pile. With a resolver: the model knows to read docs/EVALS.md before running evaluations, without the user needing to remember it.

Claude Code's built-in resolver matches user intent to skill descriptions automatically. Descriptions are the resolver.

Garry's own fix: his CLAUDE.md had ballooned to 20,000 lines, destroying model attention. The fix was ~200 lines of pointers. The resolver loads the right document on demand.
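A resolver can be as small as a lookup table. A sketch, where the task names and document paths are illustrative, not from any real repo:

```python
# Resolver: a routing table from task type to the documents worth loading.
ROUTES = {
    "evals": ["docs/EVALS.md"],
    "deploy": ["docs/DEPLOY.md", "docs/SECRETS.md"],
    "investigate": ["skills/investigate.md"],
}

def resolve(task_type: str) -> list[str]:
    """Return the docs to load into context for this task, and nothing more."""
    return ROUTES.get(task_type, [])
```

The point is what it refuses to do: nothing gets loaded unless the task type asks for it, so the context pile stops growing.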

4. Latent space vs Deterministic space

Mixing these is "the most common error in agent design."

  • Latent space: intelligence. Reading, interpreting, deciding, judging, synthesizing, pattern recognition. Good for seating 8 people with personality dynamics.
  • Deterministic space: trust. Same input, same output. Required for seating 800 people, which needs mathematical exactness.

Best systems:

  • Thinking / judgment / synthesis → latent space
  • Precise computation / reliable execution → deterministic tools
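The 800-person case belongs entirely in deterministic space. A sketch of an exact table assignment (the function name and table size are assumptions):

```python
def assign_tables(people: list[str], table_size: int) -> dict[int, list[str]]:
    """Deterministically split attendees into tables: same input, same output."""
    tables: dict[int, list[str]] = {}
    # Sort first so the result does not depend on input order.
    for i, person in enumerate(sorted(people)):
        tables.setdefault(i // table_size, []).append(person)
    return tables
```

The latent layer can still decide *who* should sit together; this layer guarantees that, given those decisions, the seating always comes out the same.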

5. Diarization

Diarization is what makes AI applicable to real knowledge work.

Process: the model reads all relevant documents on a topic, then outputs a structured one-page brief of distilled judgment. Not a mere enumeration, but an analysis that notes contradictions, changes over time, and hidden patterns, and draws insightful conclusions.

  • Database query → data and facts
  • Diarization → judgment after deep thinking

This is the difference between a SQL query and an analyst brief.
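As a sketch only, the diarization step can be framed as a pipeline in which the judgment call is stubbed out; diarize and summarize are hypothetical names, not a real API:

```python
def diarize(documents: list[str], topic: str, summarize) -> str:
    """Read everything on a topic, then emit one distilled brief.

    `summarize` stands in for the model call that does the actual judgment:
    noting contradictions, changes over time, and hidden patterns.
    """
    # Deterministic part: gather the relevant evidence.
    corpus = "\n\n".join(doc for doc in documents if topic.lower() in doc.lower())
    # Latent part: the model turns evidence into judgment.
    return summarize(f"Topic: {topic}\n\n{corpus}")
```

Retrieval stays deterministic and auditable; only the synthesis is delegated to the model.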

Three-layer architecture

Top:    Fat skills    — judgment, process, domain knowledge (90% of value)
Mid:    Thin CLI harness — ~200 lines. JSON in, text out, read-only default
Bottom: Applications  — QueryDB, ReadDoc, Search, Timeline (deterministic base)

Directional principle: push intelligence into skills, push execution into deterministic tools, keep the framework light.

When you do this, every model improvement automatically upgrades all skills, while the deterministic layer stays perfectly reliable.

Case study: Chase Center (YC Startup School)

Problem: 6,000 founders. A traditional 15-person team reading applications by hand breaks at this scale.

Solution: AI-native matching system using the five definitions.

Enrichment (/enrich-founder)

Integrates all data sources, performs value-add analysis and event segmentation, and surfaces gaps between what founders say and what they build.

Deterministic layer handles SQL, GitHub stats, browser tests, social scraping, CrustData queries.

Diarization captures what keyword search cannot:

"Maria Santos, Contrail — 'Datadog for AI agents' — but 80% of commits are on billing. She is building a FinOps tool disguised as observability."

Matching

Same skill, three different calls:

  • /match-breakout — 1,200 founders, industry clustering, 30 per room (embed + deterministic assignment)
  • /match-lunch — 600 founders, cross-industry serendipity, 8 per table, no repeats (LLM invents themes, algorithm assigns)
  • /match-live — people currently in the building, nearest-neighbor on embeddings, 200ms, 1:1 pairings, exclude past matches
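The deterministic half of /match-live can be sketched once embeddings exist. The names and vectors below are made up for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_live(target, embeddings, past):
    """Nearest neighbor to `target`, excluding past pairings."""
    best, best_score = None, -2.0
    for name, vec in embeddings.items():
        if name == target or frozenset({target, name}) in past:
            continue
        score = cosine(embeddings[target], vec)
        if score > best_score:
            best, best_score = name, score
    return best
```

The LLM's judgment ("cost attribution is not orchestration") shapes the embeddings and exclusions; the assignment itself stays fast and exact.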

Judgments clustering algorithms cannot make:

"Santos and Oram are both AI infrastructure, but not competitors — Santos is cost attribution, Oram is orchestration. Put them together."

Learning loop (/improve-skill)

After the event, the skill reads NPS surveys, focuses on the "OK" responses (not bad, just almost worked), extracts patterns, and writes new rules directly back into the skill file:

"When attendee says 'AI infrastructure' but startup is 80%+ billing code → classify as FinTech." "When two attendees already know each other → penalize proximity, prioritize new introductions."

Result: 12% "OK" ratings → 4% next event. The skill file learned what "OK" meant. No human rewrote code.

"If you want to know the most valuable loop of 2026, it's this."

Skills as permanent upgrades

Garry Tan's OpenClaw rule:

"You are not allowed to do one-off work. If you ask me to do something that should happen again: manually do it on 3–10 projects, show me output, if approved, encode it as a skill file. If it should run automatically, put it on cron. Test: if I have to ask you twice, you fail."

Every skill is a permanent upgrade:

  • Never degrades
  • Never forgets
  • Runs at 3 AM while you sleep
  • Gets better immediately when the next model drops

This is how you achieve Steve Yegge's claimed 100× productivity gain — not smarter models, but "thin harness + fat skills" plus the discipline to encode everything.

Counterpoints & Gaps

  • The boundary between "fuzzy" and "deterministic" is not always crisp; some tasks sit in a gray zone.
  • "Thin harness" can become too thin, losing observability, error handling, or security boundaries.
  • This is a high-level heuristic, not a quantitative prescription; team judgment is still required.
  • The Chase Center example assumes clean data and well-defined categories; real-world founder data is noisier.
