Skill Engineering as Algorithm Design

What it is

Skill engineering is the practice of treating an AI agent not as a conversational partner but as a deterministic algorithm embedded in a controlled execution environment. The core insight: LLMs are probabilistic by nature, but the systems around them can be engineered for determinism, recoverability, and auditability.

This approach emerged from a practical frustration: when all rules are stuffed into a single SKILL.md, compliance degrades as the file grows. At around 200 lines, agents begin skipping steps, confusing field names across APIs, and executing without confirmation. The problem is not model intelligence but how attention is distributed: LLMs do not attend uniformly to all tokens in a long context.

Why it matters

The "agent as algorithm" framing resolves two production pain points:

Token waste from trial-and-error: An agent exploring a "create permission set" flow might take 5 conversational rounds. A pre-built CLI command reduces this to 1 round.
Path uncertainty: The same prompt can produce different outputs across models or context orderings. In production, this non-determinism is a liability.

The solution is architectural separation: the agent retains its strengths (natural language understanding, judgment, synthesis) while all deterministic operations (flow order, data formats, API calls, state management) are extracted into an external, non-LLM program that the agent cannot bypass.
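A minimal sketch of this separation, assuming a hypothetical operation registry: the agent only emits a structured request, and a plain Python function validates it, enforces confirmation, and decides whether anything runs. The names `REQUIRED_FIELDS`, `AgentRequest`, and `run_deterministic`, and the fields `name` and `description`, are illustrative assumptions, not part of any real CLI or API.

```python
from dataclasses import dataclass, field

# Hypothetical registry of operations the deterministic layer accepts.
# The operation name and field names are illustrative, not from a real API.
REQUIRED_FIELDS = {
    "create_permission_set": {"name", "description"},
}


@dataclass(frozen=True)
class AgentRequest:
    """Structured parameters the agent emits after parsing user intent."""
    operation: str
    params: dict = field(default_factory=dict)
    confirmed: bool = False


def run_deterministic(request: AgentRequest) -> dict:
    """Validate a request and execute it with no LLM involvement."""
    required = REQUIRED_FIELDS.get(request.operation)
    if required is None:
        return {"status": "rejected", "reason": "unknown operation"}

    supplied = set(request.params)
    if supplied != required:
        return {
            "status": "rejected",
            "reason": "field mismatch",
            "missing": sorted(required - supplied),
            "unexpected": sorted(supplied - required),
        }

    if not request.confirmed:
        # The agent cannot skip this gate; only the harness can execute.
        return {"status": "needs_confirmation"}

    # A real harness would invoke the CLI or API here; this sketch only echoes.
    return {"status": "executed", "operation": request.operation}
```

Because flow order and field checks live in ordinary code, a skipped confirmation or a misnamed field is rejected the same way every time, regardless of which model produced the request.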

Key points

CLI as deterministic backbone: Shell commands, scripts, and typed interfaces handle everything that does not require judgment. The agent's job is reduced to parsing intent and emitting structured parameters.
Context minimization: Instead of 200-line rule files, the agent receives only the rules relevant to the current step. This mimics how humans use checklists, not manuals.
Error taxonomy: Production failures fall into predictable categories — YAML indentation errors, field name mismatches, skipped confirmation steps. These are harness problems, not model problems.
Recoverability: When the deterministic layer fails, the system knows exactly where and why. When the LLM layer fails, the fallback is to ask for clarification rather than guess.
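The context-minimization, error-taxonomy, and recoverability points above can be sketched together. This is an illustration under stated assumptions, not a reference implementation: the step names, the rule texts, and the `UNCLEAR_INTENT` category are hypothetical, while the three harness categories come from the list above.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical per-step rules, standing in for one long rule file split by step.
RULES_BY_STEP = {
    "collect_params": ["Ask for every required field before proceeding."],
    "confirm": ["Show the full plan and wait for explicit approval."],
    "execute": ["Call the CLI only; never build API requests by hand."],
}


def rules_for(step: str) -> list:
    """Return only the rules for the current step, not the whole rule file."""
    return list(RULES_BY_STEP.get(step, []))


class FailureKind(Enum):
    YAML_INDENTATION = "yaml_indentation"
    FIELD_NAME_MISMATCH = "field_name_mismatch"
    SKIPPED_CONFIRMATION = "skipped_confirmation"
    UNCLEAR_INTENT = "unclear_intent"  # assumed label for an LLM-layer failure


# Failure kinds the harness can detect and locate without the model.
HARNESS_FAILURES = {
    FailureKind.YAML_INDENTATION,
    FailureKind.FIELD_NAME_MISMATCH,
    FailureKind.SKIPPED_CONFIRMATION,
}


@dataclass(frozen=True)
class Failure:
    step: str
    kind: FailureKind


def recover(failure: Failure) -> str:
    """Report harness failures precisely; ask rather than guess otherwise."""
    if failure.kind in HARNESS_FAILURES:
        return f"report: {failure.kind.value} at step '{failure.step}'"
    return "ask the user for clarification"
```

For example, `rules_for("confirm")` hands the agent a single rule instead of the full file, and `recover(Failure("execute", FailureKind.FIELD_NAME_MISMATCH))` names both the failing step and the category.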

Evidence across sources

Source	Key Claim	Relevance
腾讯程序员 (Tencent Programmer), "Skill 工程化设计" (Skill Engineering Design); /wiki/raw/to-learn/当我把-ai-变成一个"算法"：skill-工程化设计的心路历程.md	"Don't change the river's nature, but build it a good channel": a deterministic layer plus a probabilistic decision engine	Primary framework source
AI 简报 (AI Briefing) 2026-06-07	Self-improving skill loop: examples → trigger description → evals → memory → meta-skill cleanup	Concrete loop implementation that adds feedback and memory to deterministic skill design

Open questions

Where is the precise boundary between "should be deterministic" and "should be left to the model"? This boundary shifts as models improve.
Does excessive determinism reduce the agent's ability to handle edge cases that were not anticipated by the CLI designer?
How does this pattern scale to domains where no clean API or CLI exists?

Prompts for witness

In Jean's current skill setup, which operations are deterministic enough to be extracted into scripts or CLIs?
What would a "skill audit" look like — reviewing existing skills to identify deterministic vs. probabilistic components?
When has an agent failed because the harness was too rigid, not because the model was too weak?
