
Fat Skills, Fat Code, Thin Harness

Updated 2026-04-13

Principle by Garry Tan (Y Combinator President & CEO): The secret to 10–100x productivity gains with AI is not the model, but what wraps it. Push fuzzy judgment into fat markdown skills, push perfect execution into fat deterministic code, and keep the harness thin.

The core principle

  • Skills (markdown): fuzzy, intelligence-driven operations that humans do. Push these into "fat skills".
  • Code: deterministic operations that must execute perfectly. Push these into "fat code".
  • Harness: orchestration and glue between skills and code. Keep this thin.

"Do the right thing at the right layer. Everything else is architecture astronomy."

Garry Tan's five definitions

1. Skill files

A skill file is a reusable markdown document that teaches the AI how to think, not what to do.

  • Like a method call, it takes parameters
  • The skill describes the judgment process; the parameters describe the world
  • Example: /investigate — six fixed steps (scope → timeline → analyze → synthesize → pros/cons → cite). With parameters TARGET, QUESTION, DATASET, the same skill becomes a medical research analyst, a forensic investigator, or a compliance auditor.

"This is not prompt engineering. This is software design — using Markdown as a programming language and human judgment as the runtime."
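To make that concrete, here is a sketch of what such a skill file could look like for the /investigate example. The section names, wording, and parameter descriptions are illustrative assumptions, not Garry Tan's actual file:

```markdown
# /investigate

## Parameters
- TARGET: the entity under investigation
- QUESTION: what the final brief must answer
- DATASET: where the evidence lives

## Process
1. Scope: restate QUESTION and state what is in and out of bounds.
2. Timeline: order every event about TARGET found in DATASET.
3. Analyze: weigh each piece of evidence against QUESTION.
4. Synthesize: merge findings into a single narrative.
5. Pros/cons: argue both sides before concluding.
6. Cite: tie every claim back to a document in DATASET.
```

Swap the parameters and the same judgment process runs against a medical corpus, a forensic case file, or a compliance archive.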

2. Harness

The harness is the program that runs the LLM. It does only four things:

  1. Run the model in a loop
  2. Read and write files
  3. Manage context
  4. Enforce safety checks

Anti-pattern: a bloated harness with 40+ tool definitions consuming half the context window, REST-wrapping every interface (3× tokens, 3× latency, 3× failure rate).

Right pattern: specialized, fast, focused tools.

  • Playwright CLI: 100ms per browser action
  • Chrome MCP: 15s for screenshot-find-click-wait-read
  • From those two figures, that's a difference of one to two orders of magnitude.

3. Resolver

The resolver is the context routing table. When task type is X, automatically load document Y.

Without a resolver: changing a prompt dumps more text into an ever-growing context pile. With a resolver: the model knows to read docs/EVALS.md before running evaluations, without the user needing to remember it.

Claude Code's built-in resolver matches user intent to skill descriptions automatically. Descriptions are the resolver.

Garry's own fix: his CLAUDE.md had ballooned to 20,000 lines, destroying model attention. The fix was ~200 lines of pointers. The resolver loads the right document on demand.
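A resolver can be as small as a lookup table. A sketch, where the task names and document paths are illustrative, not from any real repo:

```python
# Resolver: a routing table from task type to the documents worth loading.
ROUTES = {
    "evals": ["docs/EVALS.md"],
    "deploy": ["docs/DEPLOY.md", "docs/SECRETS.md"],
    "investigate": ["skills/investigate.md"],
}

def resolve(task_type: str) -> list[str]:
    """Return the docs to load into context for this task, and nothing more."""
    return ROUTES.get(task_type, [])
```

The point is what it refuses to do: nothing gets loaded unless the task type asks for it, so the context pile stops growing.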

4. Latent space vs Deterministic space

Mixing these is "the most common error in agent design."

  • Latent space: intelligence. Reading, interpreting, deciding, judging, synthesizing, pattern recognition. Good for seating 8 people with personality dynamics.
  • Deterministic space: trust. Same input, same output. Required for seating 800 people, which needs mathematical exactness.

Best systems:

  • Thinking / judgment / synthesis → latent space
  • Precise computation / reliable execution → deterministic tools
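The 800-person case belongs entirely in deterministic space. A sketch of an exact table assignment (the function name and table size are assumptions):

```python
def assign_tables(people: list[str], table_size: int) -> dict[int, list[str]]:
    """Deterministically split attendees into tables: same input, same output."""
    tables: dict[int, list[str]] = {}
    # Sort first so the result does not depend on input order.
    for i, person in enumerate(sorted(people)):
        tables.setdefault(i // table_size, []).append(person)
    return tables
```

The latent layer can still decide *who* should sit together; this layer guarantees that, given those decisions, the seating always comes out the same.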

5. Diarization

Diarization is what makes AI applicable to real knowledge work.

Process: the model reads all relevant documents on a topic, then outputs a structured one-page brief of distilled judgment. Not a mere enumeration, but an analysis that notes contradictions, changes over time, and hidden patterns, and draws insightful conclusions.

  • Database query → data and facts
  • Diarization → judgment after deep thinking

This is the difference between a SQL query and an analyst brief.
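As a sketch only, the diarization step can be framed as a pipeline in which the judgment call is stubbed out; diarize and summarize are hypothetical names, not a real API:

```python
def diarize(documents: list[str], topic: str, summarize) -> str:
    """Read everything on a topic, then emit one distilled brief.

    `summarize` stands in for the model call that does the actual judgment:
    noting contradictions, changes over time, and hidden patterns.
    """
    # Deterministic part: gather the relevant evidence.
    corpus = "\n\n".join(doc for doc in documents if topic.lower() in doc.lower())
    # Latent part: the model turns evidence into judgment.
    return summarize(f"Topic: {topic}\n\n{corpus}")
```

Retrieval stays deterministic and auditable; only the synthesis is delegated to the model.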

Three-layer architecture

Top:    Fat skills    — judgment, process, domain knowledge (90% of value)
Mid:    Thin CLI harness — ~200 lines. JSON in, text out, read-only default
Bottom: Applications  — QueryDB, ReadDoc, Search, Timeline (deterministic base)

Directional principle: push intelligence into skills, push execution into deterministic tools, keep the framework light.

When you do this, every model improvement automatically upgrades all skills, while the deterministic layer stays perfectly reliable.

Case study: Chase Center (YC Startup School)

Problem: 6,000 founders. A traditional 15-person team reading applications by hand breaks at this scale.

Solution: AI-native matching system using the five definitions.

Enrichment (/enrich-founder)

Integrates all data sources, performs value-add analysis and event segmentation, and surfaces gaps between what founders say and what they build.

Deterministic layer handles SQL, GitHub stats, browser tests, social scraping, CrustData queries.

Diarization captures what keyword search cannot:

"Maria Santos, Contrail — 'Datadog for AI agents' — but 80% of commits are on billing. She is building a FinOps tool disguised as observability."

Matching

Same skill, three different calls:

  • /match-breakout — 1,200 founders, industry clustering, 30 per room (embed + deterministic assignment)
  • /match-lunch — 600 founders, cross-industry serendipity, 8 per table, no repeats (LLM invents themes, algorithm assigns)
  • /match-live — people currently in the building, nearest-neighbor on embeddings, 200ms, 1:1 pairings, exclude past matches
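The deterministic half of /match-live can be sketched once embeddings exist. The names and vectors below are made up for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_live(target, embeddings, past):
    """Nearest neighbor to `target`, excluding past pairings."""
    best, best_score = None, -2.0
    for name, vec in embeddings.items():
        if name == target or frozenset({target, name}) in past:
            continue
        score = cosine(embeddings[target], vec)
        if score > best_score:
            best, best_score = name, score
    return best
```

The LLM's judgment ("cost attribution is not orchestration") shapes the embeddings and exclusions; the assignment itself stays fast and exact.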

Judgments clustering algorithms cannot make:

"Santos and Oram are both AI infrastructure, but not competitors — Santos is cost attribution, Oram is orchestration. Put them together."

Learning loop (/improve-skill)

After the event, the skill reads NPS surveys, focuses on the "OK" responses (not bad, just almost worked), extracts patterns, and writes new rules directly back into the skill file:

"When attendee says 'AI infrastructure' but startup is 80%+ billing code → classify as FinTech." "When two attendees already know each other → penalize proximity, prioritize new introductions."

Result: 12% "OK" ratings → 4% next event. The skill file learned what "OK" meant. No human rewrote code.

"If you want to know the most valuable loop of 2026, it's this."

Skills as permanent upgrades

Garry Tan's OpenClaw rule:

"You are not allowed to do one-off work. If you ask me to do something that should happen again: manually do it on 3–10 projects, show me output, if approved, encode it as a skill file. If it should run automatically, put it on cron. Test: if I have to ask you twice, you fail."

Every skill is a permanent upgrade:

  • Never degrades
  • Never forgets
  • Runs at 3 AM while you sleep
  • Gets better immediately when the next model drops

This is how you achieve Steve Yegge's claimed 100× productivity gain — not smarter models, but "thin harness + fat skills" plus the discipline to encode everything.

Counterpoints & Gaps

  • The boundary between "fuzzy" and "deterministic" is not always crisp; some tasks sit in a gray zone.
  • "Thin harness" can become too thin, losing observability, error handling, or security boundaries.
  • This is a high-level heuristic, not a quantitative prescription; team judgment is still required.
  • The Chase Center example assumes clean data and well-defined categories; real-world founder data is noisier.
