Barebones Agentic Engineering
What it is
Barebones Agentic Engineering is the practice of maximizing agent output with minimal tooling and harness complexity. The core claim: you do not need the latest agent frameworks, plugins, or memory systems to do world-class work. A basic CLI (Claude Code or Codex) plus disciplined context control outperforms elaborate setups.
The approach emerged from production experience building agentic factories for signals, infrastructure, and data pipelines, not from toy projects. After experimenting extensively with the available packages and harnesses, the practitioner returned to a near-minimal configuration and reports getting their best results from it.
Why it matters
Foundation-model companies ship new model generations quickly. Each generation of agents follows instructions more reliably than the last, which redefines what counts as "optimal." Elaborate harnesses and plugins lock you into solutions for problems that may not exist in the next agent generation. If a problem is real and a solution is good, the foundation-model companies will eventually absorb it into the base product, as happened with skills, memory, subagents, and planning.
Key points
- Context is everything: The enemy is context bloat. Agents overwhelmed with unnecessary information perform worse. Give agents only the exact information they need for the current task.
- Separate research from implementation: Do not ask an agent to "build an auth system." That forces the agent to research options inside the same context it implements in. Instead: research the implementation approach first (in a separate session if needed), then hand the chosen design to a fresh agent session with precise specs.
- Exploit sycophancy, do not fight it: Agents are trained to please. Use this by biasing them toward desired outcomes rather than expecting neutral analysis. Example: a bug-finder agent awarded +1, +5, or +10 points per finding by severity will exhaustively surface every possible issue; an adversarial agent scored for disproving bugs will aggressively challenge them; a referee agent arbitrates between the two. The findings that survive all three roles are the high-confidence ones.
- Tests and screenshots as completion milestones: Agents know how to start a task but not how to end it. Deterministic tests and visual verification provide clear, non-negotiable endpoints. A TASK_CONTRACT.md can encode the exact verification steps required before session termination.
- Rules and skills iterate in cycles: Start barebones. Add rules when an agent does something wrong. Add skills for specific recipes. Over time, rules and skills contradict each other and cause context bloat. Periodically consolidate and remove contradictions. Performance degrades, then you clean up, then it feels like magic again.
- One session per contract: Long-running 24-hour sessions force unrelated contract contexts into the same session, causing bloat. Better: an orchestration layer creates a new session for each well-specified contract.
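The "tests as completion milestones" and "one session per contract" points can be sketched together. The Python below is a minimal illustration under stated assumptions, not the source's implementation: `Contract`, `run_contract`, `failing_checks`, and the dictionary-based `Workspace` are hypothetical names, and the stub session stands in for whatever actually launches a fresh agent session (for example a CLI invocation). The checks play the role of the verification steps a TASK_CONTRACT.md would list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names here are hypothetical; nothing below is a real agent CLI or SDK call.
Workspace = Dict[str, str]                    # toy stand-in: file name -> contents
Check = Callable[[Workspace], bool]           # deterministic pass/fail verification step
Session = Callable[[str, Workspace], None]    # stand-in for one fresh agent session


@dataclass
class Contract:
    name: str
    spec: str                                 # the only context a session receives
    checks: Dict[str, Check] = field(default_factory=dict)


def failing_checks(contract: Contract, workspace: Workspace) -> List[str]:
    """Names of verification steps that do not pass yet."""
    return [name for name, check in contract.checks.items() if not check(workspace)]


def run_contract(contract: Contract, new_session: Callable[[], Session],
                 workspace: Workspace, max_attempts: int = 3) -> bool:
    """Run one contract; every attempt gets a brand-new session."""
    prompt = contract.spec
    for _ in range(max_attempts):
        session = new_session()               # no history carried over
        session(prompt, workspace)
        failed = failing_checks(contract, workspace)
        if not failed:
            return True                       # the checks, not the agent, decide "done"
        prompt = contract.spec + "\nStill failing: " + ", ".join(failed)
    return False


# Usage with a stub session that "implements" the spec by writing one file.
sessions_started: List[int] = []

def stub_session_factory() -> Session:
    sessions_started.append(1)
    def session(prompt: str, workspace: Workspace) -> None:
        workspace["auth.py"] = "def login(user, password): ..."
    return session

contract = Contract(
    name="auth-login",
    spec="Implement login() in auth.py",
    checks={"auth.py exists": lambda ws: "auth.py" in ws},
)
workspace: Workspace = {}
done = run_contract(contract, stub_session_factory, workspace)
print(done, len(sessions_started))            # prints: True 1
```

The design choice worth noting is that the prompt for each session is built only from the contract's spec plus the names of checks that still fail, so unrelated context never accumulates. In a real setup the checks would more plausibly run a test suite or compare screenshots than inspect a dictionary.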
Evidence across sources
| Source | Key Claim | Relevance |
|---|---|---|
| How To Be A World-Class Agentic Engineer | "Your enthusiasm is likely doing more harm than good" — less harness, more context discipline | Primary framework |
Open questions
- Does the barebones approach scale to teams where multiple people share the same agent configuration?
- How does the "separate research from implementation" rule apply when the agent itself is the researcher?
- When does the cost of context bloat exceed the cost of session setup overhead?
Prompts for witness
- Which of Jean's current skills or harness assumptions would still be needed if models improved by one generation?
- When has an elaborate setup actually hurt performance compared to a minimal one?
- What would a "skill audit" look like — identifying which skills are still useful vs. context pollution?
Related
- harness-engineering/overview — Harness Engineering Overview
- harness-engineering/skill-engineering-as-algorithm — Skill Engineering as Algorithm Design
- harness-engineering/self-verification-loops — Self-Verification Loops
- claude-code/what-makes-good-agents-md — What Makes a Good AGENTS.md