{"id":493851,"date":"2026-02-23T09:58:35","date_gmt":"2026-02-23T09:58:35","guid":{"rendered":"https:\/\/www.newsbeep.com\/ca\/493851\/"},"modified":"2026-02-23T09:58:35","modified_gmt":"2026-02-23T09:58:35","slug":"how-to-write-a-good-spec-for-ai-agents-oreilly","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ca\/493851\/","title":{"rendered":"How to Write a Good Spec for AI Agents \u2013 O\u2019Reilly"},"content":{"rendered":"<p>This post first appeared on Addy Osmani\u2019s <a href=\"https:\/\/addyo.substack.com\/p\/how-to-write-a-good-spec-for-ai-agents\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Elevate Substack newsletter<\/a> and is being republished here with the author\u2019s permission.<\/p>\n<p>TL;DR: Aim for a clear spec covering just enough nuance (this may include structure, style, testing, boundaries.\u00a0.\u00a0.) to guide the AI without overwhelming it. Break large tasks into smaller ones versus keeping everything in one large prompt. Plan first in read-only mode, then execute and iterate continuously.<\/p>\n<p>\u201cI\u2019ve heard a lot about writing good specs for AI agents, but haven\u2019t found a solid framework yet. I could write a spec that rivals an RFC, but at some point the context is too large and the model breaks down.\u201d<\/p>\n<p>Many developers share this frustration. Simply throwing a massive spec at an AI agent doesn\u2019t work\u2014context window limits and the model\u2019s \u201cattention budget\u201d get in the way. The key is to write smart specs: documents that guide the agent clearly, stay within practical context sizes, and evolve with the project. This guide distills best practices from my use of coding agents including Claude Code and Gemini CLI into a framework for spec-writing that keeps your AI agents focused and productive.<\/p>\n<p>We\u2019ll cover five principles for great AI agent specs, each starting with a bolded takeaway.<\/p>\n<p>1. 
Start with a High-Level Vision and Let the AI Draft the Details<\/p>\n<p>Kick off your project with a concise high-level spec, then have the AI expand it into a detailed plan.<\/p>\n<p>Instead of overengineering upfront, begin with a clear goal statement and a few core requirements. Treat this as a \u201cproduct brief\u201d and let the agent generate a more elaborate spec from it. This leverages the AI\u2019s strength in elaboration while you maintain control of the direction. This works well unless you already feel you have very specific technical requirements that must be met from the start.<\/p>\n<p>Why this works: LLM-based agents excel at fleshing out details when given a solid high-level directive, but they need a clear mission to avoid drifting off course. By providing a short outline or objective description and asking the AI to produce a full specification (e.g., a spec.md), you create a persistent reference for the agent. Planning in advance matters even more with an agent: You can iterate on the plan first, then hand it off to the agent to write the code. The spec becomes the first artifact you and the AI build together.<\/p>\n<p>Practical approach: Start a new coding session by prompting\u00a0<\/p>\n<p>You are an AI software engineer. Draft a detailed specification for <br \/>[project X] covering objectives, features, constraints, and a step-by-step plan.<\/p>\n<p>Keep your initial prompt high-level: e.g., \u201cBuild a web app where users can <br \/>track tasks (to-do list), with user accounts, a database, and a simple UI.\u201d<\/p>\n<p>The agent might respond with a structured draft spec: an overview, feature list, tech stack suggestions, data model, and so on. This spec then becomes the \u201csource of truth\u201d that both you and the agent can refer back to. 
GitHub\u2019s AI team promotes <a href=\"https:\/\/github.blog\/ai-and-ml\/generative-ai\/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">spec-driven development<\/a> where \u201cspecs become the shared source of truth\u2026living, executable artifacts that evolve with the project.\u201d Before writing any code, review and refine the AI\u2019s spec. Make sure it aligns with your vision and correct any hallucinations or off-target details.<\/p>\n<p>Use Plan Mode to enforce planning-first: Tools like Claude Code offer a <a href=\"https:\/\/code.claude.com\/docs\/en\/common-workflows#use-plan-mode-for-safe-code-analysis\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Plan Mode<\/a> that restricts the agent to read-only operations\u2014it can analyze your codebase and create detailed plans but won\u2019t write any code until you\u2019re ready. This is ideal for the planning phase: Start in Plan Mode (Shift+Tab in Claude Code), describe what you want to build, and let the agent draft a spec while exploring your existing code. Ask it to clarify ambiguities by questioning you about the plan. Have it review the plan for architecture, best practices, security risks, and testing strategy. The goal is to refine the plan until there\u2019s no room for misinterpretation. Only then do you exit Plan Mode and let the agent execute. This workflow prevents the common trap of jumping straight into code generation before the spec is solid.<\/p>\n<p>Use the spec as context: Once approved, save this spec (e.g., as SPEC.md) and feed relevant sections into the agent as needed. Many developers using a strong model do exactly this. The spec file persists between sessions, anchoring the AI whenever work resumes on the project. This mitigates the forgetfulness that can happen when the conversation history gets too long or when you have to restart an agent. 
It\u2019s akin to how one would use a product requirements document (PRD) in a team: a reference that everyone (human or AI) can consult to stay on track. Experienced folks often \u201c<a href=\"https:\/\/simonwillison.net\/2025\/Oct\/7\/vibe-engineering\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">write good documentation first<\/a> and the model may be able to build the matching implementation from that input alone\u201d as one engineer observed. The spec is that documentation.<\/p>\n<p>Keep it goal oriented: A high-level spec for an AI agent should focus on what and why more than the nitty-gritty how (at least initially). Think of it like the user story and acceptance criteria: Who is the user? What do they need? What does success look like? (For example, \u201cUser can add, edit, complete tasks; data is saved persistently; the app is responsive and secure.\u201d) This keeps the AI\u2019s detailed spec grounded in user needs and outcome, not just technical to-dos. As the <a href=\"https:\/\/github.blog\/ai-and-ml\/generative-ai\/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">GitHub Spec Kit docs<\/a> put it, provide a high-level description of what you\u2019re building and why, and let the coding agent generate a detailed specification focusing on user experience and success criteria. Starting with this big-picture vision prevents the agent from losing sight of the forest for the trees when it later gets into coding.<\/p>\n<p>2. Structure the Spec Like a Professional PRD (or SRS)<\/p>\n<p>Treat your AI spec as a structured document (PRD) with clear sections, not a loose pile of notes.<\/p>\n<p>Many developers treat specs for agents much like traditional product requirement documents (PRDs) or system design docs: comprehensive, well-organized, and easy for a \u201cliteral-minded\u201d AI to parse. 
This formal approach gives the agent a blueprint to follow and reduces ambiguity.<\/p>\n<p>The six core areas<\/p>\n<p>GitHub\u2019s analysis of <a href=\"https:\/\/github.blog\/ai-and-ml\/github-copilot\/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">over 2,500 agent configuration files<\/a> revealed a clear pattern: The most effective specs cover six areas. Use this as a checklist for completeness:<\/p>\n<ul>\n<li>Commands: Put executable commands early\u2014not just tool names but full commands with flags: npm test, pytest -v, npm run build. The agent will reference these constantly.<\/li>\n<li>Testing: How to run tests, what framework you use, where test files live, and what coverage expectations exist.<\/li>\n<li>Project structure: Where source code lives, where tests go, where docs belong. Be explicit: \u201csrc\/ for application code, tests\/ for unit tests, docs\/ for documentation.\u201d<\/li>\n<li>Code style: One real code snippet showing your style beats three paragraphs describing it. Include naming conventions, formatting rules, and examples of good output.<\/li>\n<li>Git workflow: Branch naming, commit message format, PR requirements. The agent can follow these if you spell them out.<\/li>\n<li>Boundaries: What the agent should never touch\u2014secrets, vendor directories, production configs, specific folders. \u201cNever commit secrets\u201d was the single most common helpful constraint in the GitHub study.<\/li>\n<\/ul>\n<p>Be specific about your stack: Say \u201cReact 18 with TypeScript, Vite, and Tailwind CSS,\u201d not \u201cReact project.\u201d Include versions and key dependencies. Vague specs produce vague code.<\/p>\n<p>Use a consistent format: Clarity is king. Many devs use Markdown headings or even XML-like tags in the spec to delineate sections because AI models handle well-structured text better than free-form prose. 
For example, you might structure the spec as:<\/p>\n<p># Project Spec: My team&#8217;s tasks app<\/p>\n<p>## Objective<br \/>\n&#8211; Build a web app for small teams to manage tasks&#8230;<\/p>\n<p>## Tech Stack<br \/>\n&#8211; React 18+, TypeScript, Vite, Tailwind CSS<br \/>\n&#8211; Node.js\/Express backend, PostgreSQL, Prisma ORM<\/p>\n<p>## Commands<br \/>\n&#8211; Build: `npm run build` (compiles TypeScript, outputs to dist\/)<br \/>\n&#8211; Test: `npm test` (runs Jest, must pass before commits)<br \/>\n&#8211; Lint: `npm run lint --fix` (auto-fixes ESLint errors)<\/p>\n<p>## Project Structure<br \/>\n&#8211; `src\/` \u2013 Application source code<br \/>\n&#8211; `tests\/` \u2013 Unit and integration tests<br \/>\n&#8211; `docs\/` \u2013 Documentation<\/p>\n<p>## Boundaries<br \/>\n&#8211; \u2705 Always: Run tests before commits, follow naming conventions<br \/>\n&#8211; \u26a0\ufe0f Ask first: Database schema changes, adding dependencies<br \/>\n&#8211; \ud83d\udeab Never: Commit secrets, edit node_modules\/, modify CI config<\/p>\n<p>This level of organization not only helps you think clearly but also helps the AI find information. Anthropic engineers recommend <a href=\"https:\/\/www.anthropic.com\/engineering\/effective-context-engineering-for-ai-agents\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">organizing prompts into distinct sections<\/a> (like &lt;background_information&gt;, &lt;instructions&gt;, etc.) for exactly this reason: It gives the model strong cues about which info is which. And remember, \u201cminimal does not necessarily mean short\u201d\u2014don\u2019t shy away from detail in the spec if it matters, but keep it focused.<\/p>\n<p>Integrate specs into your toolchain: Treat specs as \u201cexecutable artifacts\u201d tied to version control and CI\/CD. 
The <a href=\"https:\/\/github.blog\/ai-and-ml\/generative-ai\/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">GitHub Spec Kit<\/a> uses a four-phase gated workflow that makes your specification the center of your engineering process. Instead of writing a spec and setting it aside, the spec drives the implementation, checklists, and task breakdowns. Your primary role is to steer; the coding agent does the bulk of the writing. Each phase has a specific job, and you don\u2019t move to the next one until the current task is fully validated:<\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" width=\"1600\" height=\"883\" src=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2026\/02\/Current-task-is-validated-1600x883.jpg\" alt=\"Task validation\" class=\"wp-image-18098\"  \/><\/p>\n<p>1. Specify: You provide a high-level description of what you\u2019re building and why, and the coding agent generates a detailed specification. This isn\u2019t about technical stacks or app design\u2014it\u2019s about user journeys, experiences, and what success looks like. Who will use this? What problem does it solve? How will they interact with it? Think of it as mapping the user experience you want to create, and letting the coding agent flesh out the details. This becomes a living artifact that evolves as you learn more.<\/p>\n<p>2. Plan: Now you get technical. You provide your desired stack, architecture, and constraints, and the coding agent generates a comprehensive technical plan. If your company standardizes on certain technologies, this is where you say so. If you\u2019re integrating with legacy systems or have compliance requirements, all of that goes here. You can ask for multiple plan variations to compare approaches. If you make internal docs available, the agent can integrate your architectural patterns directly into the plan.<\/p>\n<p>3. 
Tasks: The coding agent takes the spec and plan and breaks them into actual work\u2014small, reviewable chunks that each solve a specific piece of the puzzle. Each task should be something you can implement and test in isolation, almost like test-driven development for your AI agent. Instead of \u201cbuild authentication,\u201d you get concrete tasks like \u201ccreate a user registration endpoint that validates email format.\u201d<\/p>\n<p>4. Implement: Your coding agent tackles tasks one by one (or in parallel). Instead of reviewing thousand-line code dumps, you review focused changes that solve specific problems. The agent knows what to build (specification), how to build it (plan), and what to work on (task). Crucially, your role is to verify at each phase: Does the spec capture what you want? Does the plan account for constraints? Are there edge cases the AI missed? The process builds in checkpoints for you to critique, spot gaps, and course-correct before moving forward.<\/p>\n<p>This gated workflow prevents what Willison calls \u201chouse of cards code\u201d: fragile AI outputs that collapse under scrutiny. Anthropic\u2019s Skills system offers a similar pattern, letting you define reusable Markdown-based behaviors that agents invoke. By embedding your spec in these workflows, you ensure the agent can\u2019t proceed until the spec is validated, and changes propagate automatically to task breakdowns and tests.<\/p>\n<p>Consider agents.md for specialized personas: For tools like GitHub Copilot, you can create <a href=\"https:\/\/github.blog\/ai-and-ml\/github-copilot\/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">agents.md files<\/a> that define specialized agent personas\u2014a @docs-agent for technical writing, a @test-agent for QA, a @security-agent for code review. Each file acts as a focused spec for that persona\u2019s behavior, commands, and boundaries. 
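<\/p>\n<p>As a sketch (the file name and section headings here are illustrative, not a fixed schema), an agents.md persona for a test-focused agent might look like:<\/p>\n<p># @test-agent<\/p>\n<p>## Role<br \/>\n&#8211; Write and maintain unit tests; never modify application code<\/p>\n<p>## Commands<br \/>\n&#8211; Run tests: `npm test`<br \/>\n&#8211; Coverage: `npm test -- --coverage`<\/p>\n<p>## Boundaries<br \/>\n&#8211; \u2705 Always: Keep tests deterministic and isolated<br \/>\n&#8211; \ud83d\udeab Never: Delete a failing test, edit src\/<\/p>\n<p>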
This is particularly useful when you want different agents for different tasks rather than one general-purpose assistant.<\/p>\n<p>Design for agent experience (AX): Just as we design APIs for developer experience (DX), consider designing specs for \u201cagent experience.\u201d This means clean, parseable formats: OpenAPI schemas for any APIs the agent will consume, llms.txt files that summarize documentation for LLM consumption, and explicit type definitions. The Agentic AI Foundation (AAIF) is standardizing protocols like MCP (Model Context Protocol) for tool integration. Specs that follow these patterns are easier for agents to consume and act on reliably.<\/p>\n<p>PRD versus SRS mindset: It helps to borrow from established documentation practices. For AI agent specs, you\u2019ll often blend these into one document (as illustrated above), but covering both angles serves you well. Writing it like a PRD ensures you include user-centric context (\u201cthe why behind each feature\u201d) so the AI doesn\u2019t optimize for the wrong thing. Expanding it like an SRS ensures you nail down the specifics the AI will need to actually generate correct code (like what database or API to use). Developers have found that this extra upfront effort pays off by drastically reducing miscommunications with the agent later.<\/p>\n<p>Make the spec a \u201cliving document\u201d: Don\u2019t write it and forget it. Update the spec as you and the agent make decisions or discover new info. If the AI had to change the data model or you decided to cut a feature, reflect that in the spec so it remains the ground truth. Think of it as version-controlled documentation. 
In <a href=\"https:\/\/github.blog\/ai-and-ml\/generative-ai\/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">spec-driven workflows<\/a>, the spec drives implementation, tests, and task breakdowns, and you don\u2019t move to coding until the spec is validated. This habit keeps the project coherent, especially if you or the agent step away and come back later. Remember, the spec isn\u2019t just for the AI\u2014it helps you as the developer maintain oversight and ensure the AI\u2019s work meets the real requirements.<\/p>\n<p>3. Break Tasks into Modular Prompts and Context, Not One Big Prompt<\/p>\n<p>Divide and conquer: Give the AI one focused task at a time rather than a monolithic prompt with everything at once.<\/p>\n<p>Experienced AI engineers have learned that trying to stuff the entire project (all requirements, all code, all instructions) into a single prompt or agent message is a recipe for confusion. Not only do you risk hitting token limits; you also risk the model losing focus due to the \u201c<a href=\"https:\/\/maxpool.dev\/research-papers\/curse_of_instructions_report.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">curse of instructions<\/a>\u201d\u2014too many directives causing it to follow none of them well. 
The solution is to design your spec and workflow in a modular way, tackling one piece at a time and pulling in only the context needed for that piece.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1600\" height=\"883\" src=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2026\/02\/Modular-prompts-1600x883.jpg\" alt=\"Modular prompts\" class=\"wp-image-18099\"  \/><\/p>\n<p>The curse of too much context\/instructions: Research has confirmed what many devs anecdotally saw: as you pile on more instructions or data into the prompt, the model\u2019s performance in adhering to each one <a href=\"https:\/\/openreview.net\/pdf\/848f1332e941771aa491f036f6350af2effe0513.pdf\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">drops significantly<\/a>. One study dubbed this the \u201ccurse of instructions\u201d, showing that even GPT-4 and Claude struggle when asked to satisfy many requirements simultaneously. In practical terms, if you present 10 bullet points of detailed rules, the AI might obey the first few and start overlooking others. The better strategy is iterative focus. <a href=\"https:\/\/maxpool.dev\/research-papers\/curse_of_instructions_report.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Guidelines from industry<\/a> suggest decomposing complex requirements into sequential, simple instructions as a best practice. Focus the AI on one subproblem at a time, get that done, then move on. This keeps the quality high and errors manageable.<\/p>\n<p>Divide the spec into phases or components: If your spec document is very long or covers a lot of ground, consider splitting it into parts (either physically separate files or clearly separate sections). For example, you might have a section for \u201cbackend API spec\u201d and another for \u201cfrontend UI spec.\u201d You don\u2019t need to always feed the frontend spec to the AI when it\u2019s working on the backend, and vice versa. 
Many devs using multi-agent setups even create separate agents or subprocesses for each part (e.g., one agent works on database\/schema, another on API logic, another on frontend\u2014each with the relevant slice of the spec). Even if you use a single agent, you can emulate this by copying only the relevant spec section into the prompt for that task. Avoid context overload: Don\u2019t mix authentication tasks with database schema changes in one go, as the <a href=\"https:\/\/docs.digitalocean.com\/products\/gradient-ai-platform\/concepts\/context-management\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">DigitalOcean AI guide<\/a> warns. Keep each prompt tightly scoped to the current goal.<\/p>\n<p>Extended TOC\/summaries for large specs: One clever technique is to have the agent build an extended table of contents with summaries for the spec. This is essentially a \u201cspec summary\u201d that condenses each section into a few key points or keywords, and references where details can be found. For example, if your full spec has a section on security requirements spanning 500 words, you might have the agent summarize it to: \u201cSecurity: Use HTTPS, protect API keys, implement input validation (see full spec \u00a74.2).\u201d By creating a hierarchical summary in the planning phase, you get a bird\u2019s-eye view that can stay in the prompt, while the fine details remain offloaded unless needed. This extended TOC acts as an index: The agent can consult it and say, \u201cAha, there\u2019s a security section I should look at,\u201d and you can then provide that section on demand. 
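<\/p>\n<p>To make the idea concrete, an extended TOC for a spec (the section contents here are invented for illustration) might condense to:<\/p>\n<p>## Spec index<br \/>\n&#8211; \u00a71 Overview: Team task app; small teams; web only<br \/>\n&#8211; \u00a72 Data model: users, tasks, projects; a task belongs to one project (see \u00a72.3)<br \/>\n&#8211; \u00a73 API: REST, JSON; auth via session cookies (see \u00a73.1)<br \/>\n&#8211; \u00a74 Security: HTTPS, protect API keys, input validation (see \u00a74.2)<\/p>\n<p>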
It\u2019s similar to how a human developer skims an outline and then flips to the relevant page of a spec document when working on a specific part.<\/p>\n<p>To implement this, you can prompt the agent after writing the spec: \u201cSummarize the spec above into a very concise outline with each section\u2019s key points and a reference tag.\u201d The result might be a list of sections with one or two sentence summaries. That summary can be kept in the system or assistant message to guide the agent\u2019s focus without eating up too many tokens. This <a href=\"https:\/\/addyo.substack.com\/p\/context-engineering-bringing-engineering\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">hierarchical summarization approach<\/a> is known to help LLMs maintain long-term context by focusing on the high-level structure. The agent carries a \u201cmental map\u201d of the spec.<\/p>\n<p>Utilize subagents or \u201cskills\u201d for different spec parts: Another advanced approach is using multiple specialized agents (what Anthropic calls subagents or what you might call \u201cskills\u201d). Each subagent is configured for a specific area of expertise and given the portion of the spec relevant to that area. For instance, you might have a database designer subagent that only knows about the data model section of the spec, and an API coder subagent that knows the API endpoints spec. The main agent (or an orchestrator) can route tasks to the appropriate subagent automatically.<\/p>\n<p>The benefit is each agent has a smaller context window to deal with and a more focused role, which can <a href=\"https:\/\/10xdevelopers.dev\/structured\/claude-code-with-subagents\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">boost accuracy and allow parallel work<\/a> on independent tasks. Anthropic\u2019s Claude Code supports this by letting you define subagents with their own system prompts and tools. 
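<\/p>\n<p>In Claude Code, for example, a subagent is a Markdown file with YAML frontmatter placed under .claude\/agents\/\u2014roughly as below, though check the current docs for the exact field names:<\/p>\n<p>---<br \/>\nname: db-designer<br \/>\ndescription: Designs and reviews database schemas and migrations<br \/>\ntools: Read, Grep, Glob<br \/>\n---<br \/>\nYou are a database design specialist. Follow the data model section of SPEC.md.<br \/>\nPropose schema changes as migrations; never edit application code.<\/p>\n<p>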
\u201cEach subagent has a specific purpose and expertise area, uses its own context window separate from the main conversation, and has a custom system prompt guiding its behavior,\u201d as their docs describe. When a task comes up that matches a subagent\u2019s domain, Claude can delegate that task to it, with the subagent returning results independently.<\/p>\n<p>Parallel agents for throughput: Running multiple agents simultaneously is emerging as \u201cthe next big thing\u201d for developer productivity. Rather than waiting for one agent to finish before starting another task, you can spin up parallel agents for non-overlapping work. Willison describes this as \u201c<a href=\"https:\/\/simonwillison.net\/2025\/Oct\/7\/vibe-engineering\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">embracing parallel coding agents<\/a>\u201d and notes it\u2019s \u201csurprisingly effective, if mentally exhausting.\u201d The key is scoping tasks so agents don\u2019t step on each other: One agent codes a feature while another writes tests, or separate components get built concurrently. 
Orchestration frameworks like LangGraph or OpenAI Swarm can help coordinate these agents, and shared memory via vector databases (like Chroma) lets them access common context without redundant prompting.<\/p>\n<p>Single versus multi-agent: When to use each<\/p>\n<table>\n<thead>\n<tr><th><\/th><th>Single agent (parallel)<\/th><th>Multi-agent<\/th><\/tr>\n<\/thead>\n<tbody>\n<tr><td>Strengths<\/td><td>Simpler setup; lower overhead; easier to debug and follow<\/td><td>Higher throughput; handles complex interdependencies; specialists per domain<\/td><\/tr>\n<tr><td>Challenges<\/td><td>Context overload on big projects; slower iteration; single point of failure<\/td><td>Coordination overhead; potential conflicts; needs shared memory (e.g., vector DBs)<\/td><\/tr>\n<tr><td>Best for<\/td><td>Isolated modules; small-to-medium projects; early prototyping<\/td><td>Large codebases; one codes + one tests + one reviews; independent features<\/td><\/tr>\n<tr><td>Tips<\/td><td>Use spec summaries; refresh context per task; start fresh sessions often<\/td><td>Limit to 2\u20133 agents initially; use MCP for tool sharing; define clear boundaries<\/td><\/tr>\n<\/tbody>\n<\/table>\n<p>In practice, using subagents or skill-specific prompts might look like: You maintain multiple spec files (or prompt templates)\u2014e.g., SPEC_backend.md, SPEC_frontend.md\u2014and you tell the AI, \u201cFor backend tasks, refer to SPEC_backend; for frontend tasks refer to SPEC_frontend.\u201d Or in a tool like Cursor\/Claude, you actually spin up a subagent for each. This is certainly more complex to set up than a single-agent loop, but it mimics what human developers do: We mentally compartmentalize a large spec into relevant chunks. (You don\u2019t keep the whole 50-page spec in your head at once; you recall the part you need for the task at hand, and have a general sense of the overall architecture.) The challenge, as noted, is managing interdependencies: The subagents must still coordinate. (The frontend needs to know the API contract from the backend spec, etc.) 
A central overview (or an \u201carchitect\u201d agent) can help by referencing the subspecs and ensuring consistency.<\/p>\n<p>Focus each prompt on one task\/section: Even without fancy multi-agent setups, you can manually enforce modularity. For example, after the spec is written, your next move might be: \u201cStep 1: Implement the database schema.\u201d You feed the agent the database section of the spec only, plus any global constraints from the spec (like tech stack). The agent works on that. Then for Step 2, \u201cNow implement the authentication feature\u201d, you provide the auth section of the spec and maybe the relevant parts of the schema if needed. By refreshing the context for each major task, you ensure the model isn\u2019t carrying a lot of stale or irrelevant information that could distract it. As one guide suggests: \u201c<a href=\"https:\/\/docs.digitalocean.com\/products\/gradient-ai-platform\/concepts\/context-management\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Start fresh: begin new sessions<\/a> to clear context when switching between major features.\u201d You can always remind the agent of critical global rules (from the spec\u2019s constraints section) each time, but don\u2019t shove the entire spec in if it\u2019s not all needed.<\/p>\n<p>Use in-line directives and code TODOs: Another modularity trick is to use your code or spec as an active part of the conversation. For instance, scaffold your code with \/\/ TODO comments that describe what needs to be done, and have the agent fill them one by one. Each TODO essentially acts as a mini-spec for a small task. This keeps the AI laser focused (\u201cimplement this specific function according to this spec snippet\u201d), and you can iterate in a tight loop. It\u2019s similar to giving the AI a checklist item to complete rather than the whole checklist at once.<\/p>\n<p>The bottom line: Small, focused context beats one giant prompt. 
This improves quality and keeps the AI from getting \u201coverwhelmed\u201d by too much at once. As one set of best practices sums up, provide \u201cOne Task Focus\u201d and \u201cRelevant info only\u201d to the model, and avoid dumping everything everywhere. By structuring the work into modules\u2014and using strategies like spec summaries or subspec agents\u2014you\u2019ll navigate around context size limits and the AI\u2019s short-term memory cap. Remember, a well-fed AI is like a well-fed function: Give it only the <a href=\"https:\/\/addyo.substack.com\/p\/context-engineering-bringing-engineering\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">inputs it needs for the job at hand<\/a>.<\/p>\n<p>4. Build in Self-Checks, Constraints, and Human Expertise<\/p>\n<p>Make your spec not just a to-do list for the agent but also a guide for quality control\u2014and don\u2019t be afraid to inject your own expertise.<\/p>\n<p>A good spec for an AI agent anticipates where the AI might go wrong and sets up guardrails. It also takes advantage of what you know (domain knowledge, edge cases, \u201cgotchas\u201d) so the AI doesn\u2019t operate in a vacuum. Think of the spec as both coach and referee for the AI: It should encourage the right approach and call out fouls.<\/p>\n<p>Use three-tier boundaries: <a href=\"https:\/\/github.blog\/ai-and-ml\/github-copilot\/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">GitHub\u2019s analysis of 2,500+ agent files<\/a> found that the most effective specs use a three-tier boundary system rather than a simple list of don\u2019ts. 
This gives the agent clearer guidance on when to proceed, when to pause, and when to stop:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"282\" src=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2026\/02\/Agent-proceed-pause-or-stop.jpg\" alt=\"Agent boundaries\" class=\"wp-image-18100\"  \/><\/p>\n<p>\u2705 Always do: Actions the agent should take without asking. \u201cAlways run tests before commits.\u201d \u201cAlways follow the naming conventions in the style guide.\u201d \u201cAlways log errors to the monitoring service.\u201d<\/p>\n<p>\u26a0\ufe0f Ask first: Actions that require human approval. \u201cAsk before modifying database schemas.\u201d \u201cAsk before adding new dependencies.\u201d \u201cAsk before changing CI\/CD configuration.\u201d This tier catches high-impact changes that might be fine but warrant a human check.<\/p>\n<p>\ud83d\udeab Never do: Hard stops. \u201cNever commit secrets or API keys.\u201d \u201cNever edit node_modules\/ or vendor\/.\u201d \u201cNever remove a failing test without explicit approval.\u201d \u201cNever commit secrets\u201d was the single most common helpful constraint in the study.<\/p>\n<p>This three-tier approach is more nuanced than a flat list of rules. It acknowledges that some actions are always safe, some need oversight, and some are categorically off-limits. The agent can proceed confidently on \u201cAlways\u201d items, flag \u201cAsk first\u201d items for review, and hard-stop on \u201cNever\u201d items.<\/p>\n<p>Encourage self-verification: One powerful pattern is to have the agent verify its work against the spec automatically. If your tooling allows, you can integrate checks like unit tests or linting that the AI can run after generating code. But even at the spec\/prompt level, you can instruct the AI to double-check (e.g., \u201cAfter implementing, compare the result with the spec and confirm all requirements are met. List any spec items that are not addressed.\u201d). 
This pushes the LLM to reflect on its output relative to the spec, catching omissions. It\u2019s a form of self-audit built into the process.<\/p>\n<p>For instance, you might append to a prompt: \u201c(After writing the function, review the above requirements list and ensure each is satisfied, marking any missing ones).\u201d The model will then (ideally) output the code followed by a short checklist indicating whether it met each requirement. This reduces the chance it forgets something before you even run tests. It\u2019s not foolproof, but it helps.<\/p>\n<p>LLM-as-a-Judge for subjective checks: For criteria that are hard to test automatically\u2014code style, readability, adherence to architectural patterns\u2014consider using \u201cLLM-as-a-Judge.\u201d This means having a second agent (or a separate prompt) review the first agent\u2019s output against your spec\u2019s quality guidelines. Anthropic and others have found this effective for subjective evaluation. You might prompt, \u201cReview this code for adherence to our style guide. Flag any violations.\u201d The judge agent returns feedback that either gets incorporated or triggers a revision. This adds a layer of semantic evaluation beyond syntax checks.<\/p>\n<p>Conformance testing: Simon Willison advocates building conformance suites\u2014language-independent tests (often YAML based) that any implementation must pass. These act as a contract: If you\u2019re building an API, the conformance suite specifies expected inputs\/outputs, and the agent\u2019s code must satisfy all cases. This is more rigorous than ad hoc unit tests because it\u2019s derived directly from the spec and can be reused across implementations. Include conformance criteria in your spec\u2019s success section (e.g., \u201cMust pass all cases in conformance\/api-tests.yaml\u201d).<\/p>\n<p>Leverage testing in the spec: If possible, incorporate a test plan or even actual tests in your spec and prompt flow. 
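A conformance suite needs no heavy machinery: a list of input/output cases plus a tiny runner is enough. A minimal sketch, with the cases inlined for self-containment (in practice they would live in a language-independent file and be parsed with a YAML loader) and a toy `slugify` function standing in for the agent's implementation:

```python
# Minimal conformance-runner sketch. The cases are inlined here; in a real
# project they would live in a spec-adjacent file (e.g., a YAML conformance
# suite) so any implementation, in any language, can be checked against them.

CASES = [
    {"name": "basic", "input": "Hello World", "expected": "hello-world"},
    {"name": "punctuation", "input": "Specs, fast!", "expected": "specs-fast"},
]

def slugify(text: str) -> str:
    """Toy implementation under test (stands in for the agent's code)."""
    cleaned = "".join(c if c.isalnum() or c == " " else "" for c in text)
    return "-".join(cleaned.lower().split())

def run_conformance(impl, cases):
    """Return the names of failing cases; an empty list means the contract holds."""
    return [c["name"] for c in cases if impl(c["input"]) != c["expected"]]

failures = run_conformance(slugify, CASES)
print("conformance:", "PASS" if not failures else f"FAIL {failures}")
```

Because the runner takes the implementation as a parameter, the same suite can be pointed at each new version the agent produces, and the failure list can be pasted straight back into the next prompt.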
In traditional development, we use TDD or write test cases to clarify requirements\u2014you can do the same with AI. For example, in the spec\u2019s success criteria, you might say, \u201cThese sample inputs should produce these outputs\u2026\u201d or \u201cThe following unit tests should pass.\u201d The agent can be prompted to run through those cases in its head or actually execute them if it has that capability. Willison noted that having a <a href=\"https:\/\/simonwillison.net\/2025\/Oct\/7\/vibe-engineering\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">robust test suite<\/a> is like giving the agents superpowers: They can validate and iterate quickly when tests fail. In an AI coding context, writing a bit of pseudocode for tests or expected outcomes in the spec can guide the agent\u2019s implementation. Additionally, you can use a dedicated \u201c<a href=\"https:\/\/10xdevelopers.dev\/structured\/claude-code-with-subagents\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">test agent<\/a>\u201d in a subagent setup that takes the spec\u2019s criteria and continuously verifies the \u201ccode agent\u2019s\u201d output.<\/p>\n<p>Bring your domain knowledge: Your spec should reflect insights that only an experienced developer or someone with context would know. For example, if you\u2019re building an ecommerce agent and you know that \u201cproducts\u201d and \u201ccategories\u201d have a many-to-many relationship, state that clearly. (Don\u2019t assume the AI will infer it\u2014it might not.) If a certain library is notoriously tricky, mention pitfalls to avoid. Essentially, pour your mentorship into the spec. 
The spec can contain advice like \u201cIf using library X, watch out for the memory leak issue in version Y (apply workaround Z).\u201d This level of detail is what turns an average AI output into a truly robust solution, because you\u2019ve steered the AI away from common traps.<\/p>\n<p>Also, if you have preferences or style guidelines (say, \u201cuse functional components over class components in React\u201d), encode that in the spec. The AI will then emulate your style. Many engineers even include small examples in the spec (for instance, \u201cAll API responses should be JSON, e.g., {\"error\": \"message\"} for errors.\u201d). By giving a quick example, you anchor the AI to the exact format you want.<\/p>\n<p>Minimalism for simple tasks: While we advocate thorough specs, part of expertise is knowing when to keep it simple. For relatively simple, isolated tasks, an overbearing spec can actually confuse more than help. If you\u2019re asking the agent to do something straightforward (like \u201ccenter a div on the page\u201d), you might just say, \u201cMake sure to keep the solution concise and do not add extraneous markup or styles.\u201d No need for a full PRD there. Conversely, for complex tasks (like \u201cimplement an OAuth flow with token refresh and error handling\u201d), that\u2019s when you break out the detailed spec. A good rule of thumb: Adjust spec detail to task complexity. Don\u2019t underspec a hard problem (the agent will flail or go off-track), but don\u2019t overspec a trivial one (the agent might get tangled or use up context on unnecessary instructions).<\/p>\n<p>Maintain the AI\u2019s \u201cpersona\u201d if needed: Sometimes, part of your spec is defining how the agent should behave or respond, especially if the agent interacts with users. 
For example, if building a customer support agent, your spec might include guidelines like \u201cUse a friendly and professional tone\u201d and \u201cIf you don\u2019t know the answer, ask for clarification or offer to follow up rather than guessing.\u201d These kinds of rules (often included in system prompts) help keep the AI\u2019s outputs aligned with expectations. They are essentially spec items for AI behavior. Keep them consistent and remind the model of them if needed in long sessions. (LLMs can \u201cdrift\u201d in style over time if not kept on a leash.)<\/p>\n<p>You remain the exec in the loop: The spec empowers the agent, but you remain the ultimate quality filter. If the agent produces something that technically meets the spec but doesn\u2019t feel right, trust your judgement. Either refine the spec or directly adjust the output. The great thing about AI agents is they don\u2019t get offended\u2014if they deliver a design that\u2019s off, you can say, \u201cActually, that\u2019s not what I intended, let\u2019s clarify the spec and redo it.\u201d The spec is a living artifact in collaboration with the AI, not a one-time contract you can\u2019t change.<\/p>\n<p>Simon Willison humorously likened working with AI agents to \u201ca very weird form of management\u201d and even \u201cgetting good results out of a coding agent feels <a href=\"https:\/\/simonwillison.net\/2025\/Oct\/7\/vibe-engineering\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">uncomfortably close to managing a human intern<\/a>.\u201d You need to provide clear instructions (the spec), ensure they have the necessary context (the spec and relevant data), and give actionable feedback. The spec sets the stage, but monitoring and feedback during execution are key. 
If an AI is a \u201cweird digital intern who will absolutely cheat if you give them a chance,\u201d the spec and constraints you write are how you prevent that cheating and keep them on task.<\/p>\n<p>Here\u2019s the payoff: A good spec doesn\u2019t just tell the AI what to build; it also helps it self-correct and stay within safe boundaries. By baking in verification steps, constraints, and your hard-earned knowledge, you drastically increase the odds that the agent\u2019s output is correct on the first try (or at least much closer to correct). This reduces iterations and those \u201cWhy on Earth did it do that?\u201d moments.<\/p>\n<p>5. Test, Iterate, and Evolve the Spec (and Use the Right Tools)<\/p>\n<p>Think of spec writing and agent building as an iterative loop: test early, gather feedback, refine the spec, and leverage tools to automate checks.<\/p>\n<p>The initial spec is not the end\u2014it\u2019s the beginning of a cycle. The best outcomes come when you continually verify the agent\u2019s work against the spec and adjust accordingly. Also, modern AI devs use various tools to support this process (from CI pipelines to context management utilities).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"459\" src=\"https:\/\/www.newsbeep.com\/ca\/wp-content\/uploads\/2026\/02\/Initial-spec.jpg\" alt=\"Initial spec\" class=\"wp-image-18101\"  \/><\/p>\n<p>Continuous testing: Don\u2019t wait until the end to see if the agent met the spec. After each major milestone or even each function, run tests or at least do quick manual checks. If something fails, update the spec or prompt before proceeding. For example, if the spec said, \u201cPasswords must be hashed with bcrypt\u201d and you see the agent\u2019s code storing plain text, stop and correct it (and restate the rule in the spec or prompt). Automated tests shine here: If you provided tests (or write them as you go), let the agent run them. 
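That run-the-tests feedback cycle can be sketched as a small driver loop. Here `run_tests` and `call_agent` are placeholders for your actual test command (`npm test`, `pytest`, etc.) and whatever agent API you use; only the loop shape is the point:

```python
# Sketch of the code -> test -> fix -> repeat loop. run_tests and call_agent
# are injected so this stays agnostic of any particular agent framework.

def build_repair_prompt(spec_rule: str, test_output: str) -> str:
    """Turn a failing test run into actionable feedback for the agent."""
    return (
        "Your last change did not meet the spec.\n"
        f"Spec rule: {spec_rule}\n"
        f"Test output:\n{test_output}\n"
        "Fix the code so all tests pass without weakening the tests."
    )

def agent_loop(run_tests, call_agent, spec_rule: str, max_rounds: int = 3) -> bool:
    """Iterate until tests pass or the round budget is exhausted."""
    for _ in range(max_rounds):
        passed, output = run_tests()
        if passed:
            return True
        call_agent(build_repair_prompt(spec_rule, output))
    passed, _ = run_tests()
    return passed
```

In a real setup, `run_tests` might wrap `subprocess.run(["npm", "test"], capture_output=True, text=True)` and return `(proc.returncode == 0, proc.stdout)`. The fixed round budget matters: it forces an escalation to a human instead of letting the agent thrash indefinitely.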
In many coding agent setups, you can have an agent run npm test or similar after finishing a task. The results (failures) can then feed back into the next prompt, effectively telling the agent \u201cYour output didn\u2019t meet spec on X, Y, Z\u2014fix it.\u201d This kind of agentic loop (code &gt; test &gt; fix &gt; repeat) is extremely powerful and is how tools like Claude Code or Copilot Labs are evolving to handle larger tasks. Always define what \u201cdone\u201d means (via tests or criteria) and check for it.<\/p>\n<p>Iterate on the spec itself: If you discover that the spec was incomplete or unclear (maybe the agent misunderstood something or you realized you missed a requirement), update the spec document. Then explicitly resync the agent with the new spec: \u201cI have updated the spec as follows\u2026 Given the updated spec, adjust the plan or refactor the code accordingly.\u201d This way the spec remains the single source of truth. It\u2019s similar to how we handle changing requirements in normal dev, but in this case you\u2019re also the product manager for your AI agent. Keep version history if possible (even just via commit messages or notes), so you know what changed and why.<\/p>\n<p>Utilize context management and memory tools: There\u2019s a growing ecosystem of tools to help manage AI agent context and knowledge. For instance, retrieval-augmented generation (RAG) is a pattern where the agent can pull in relevant chunks of data from a knowledge base (like a vector database) on the fly. If your spec is huge, you could embed sections of it and let the agent retrieve the most relevant parts when needed, instead of always providing the whole thing. There are also frameworks implementing the Model Context Protocol (MCP), which automates feeding the right context to the model based on the current task. 
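A rudimentary version of that retrieval needs no vector database at all; even word overlap between the task description and your spec's sections can pick the right chunk to feed the agent. A sketch (the section names and contents are illustrative):

```python
# Rudimentary spec retrieval: score each spec section by word overlap with
# the current task and surface only the best match. A real setup would use
# embeddings plus a vector store, but the shape of the idea is the same.

SPEC_SECTIONS = {
    "auth": "Users log in with email and password; passwords hashed with bcrypt.",
    "payments": "Charge via the payment gateway; retry failed charges twice.",
    "ui": "Simple task-list UI; functional React components only.",
}

def relevant_section(task: str, sections: dict) -> str:
    """Pick the section sharing the most words with the task description."""
    task_words = set(task.lower().split())
    def overlap(name: str) -> int:
        body_words = set((sections[name] + " " + name).lower().split())
        return len(task_words & body_words)
    return max(sections, key=overlap)

print(relevant_section("implement payment retry logic", SPEC_SECTIONS))  # prints "payments"
```

Only the winning section (not the whole spec) then goes into the prompt, which is exactly the "relevant info only" discipline from earlier applied to the spec itself.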
One example is <a href=\"https:\/\/docs.digitalocean.com\/products\/gradient-ai-platform\/concepts\/context-management\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Context7<\/a> (context7.com), which can auto-fetch relevant context snippets from docs based on what you\u2019re working on. In practice, this might mean the agent notices you\u2019re working on \u201cpayment processing\u201d and it pulls the payments section of your spec or documentation into the prompt. Consider leveraging such tools or setting up a rudimentary version (even a simple search in your spec document).<\/p>\n<p>Parallelize carefully: Some developers run multiple agent instances in parallel on different tasks (as mentioned earlier with subagents). This can speed up development (e.g., one agent generates code while another simultaneously writes tests, or two features are built concurrently). If you go this route, ensure the tasks are truly independent or clearly separated to avoid conflicts. (The spec should note any dependencies.) For example, don\u2019t have two agents writing to the same file at once. One workflow is to have an agent generate code and another review it in parallel, or to have separate components built that integrate later. This is advanced usage and can be mentally taxing to manage. (As Willison admitted, running multiple agents is <a href=\"https:\/\/simonwillison.net\/2025\/Oct\/7\/vibe-engineering\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">surprisingly effective, if mentally exhausting<\/a>!) Start with at most 2\u20133 agents to keep things manageable.<\/p>\n<p>Version control and spec locks: Use Git or your version control of choice to track what the agent does. <a href=\"https:\/\/simonwillison.net\/2025\/Oct\/7\/vibe-engineering\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Good version control habits<\/a> matter even more with AI assistance. Commit the spec file itself to the repo. 
This not only preserves history, but the agent can even use git diff or blame to understand changes. (LLMs are quite capable of reading diffs.) Some advanced agent setups let the agent query the VCS history to see when something was introduced\u2014surprisingly, models can be \u201cfiercely competent at Git.\u201d By keeping your spec in the repo, you allow both you and the AI to track evolution. There are tools (like GitHub Spec Kit mentioned earlier) that integrate spec-driven development into the Git workflow\u2014for instance, gating merges on updated specs or generating checklists from spec items. While you don\u2019t need those tools to succeed, the takeaway is to treat the spec like code: Maintain it diligently.<\/p>\n<p>Cost and speed considerations: Working with large models and long contexts can be slow and expensive. A practical tip is to use model selection and batching smartly. Perhaps use a cheaper\/faster model for initial drafts or repetitions, and reserve the most capable (and expensive) model for final outputs or complex reasoning. Some developers use GPT-4 or Claude for planning and critical steps, but offload simpler expansions or refactors to a local model or a smaller API model. If using multiple agents, maybe not all need to be top tier; a test-running agent or a linter agent could be a smaller model. Also consider throttling context size: Don\u2019t feed 20K tokens if 5K will do. As we discussed, <a href=\"https:\/\/www.anthropic.com\/engineering\/effective-context-engineering-for-ai-agents\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">more tokens can mean diminishing returns<\/a>.<\/p>\n<p>Monitor and log everything: In complex agent workflows, logging the agent\u2019s actions and outputs is essential. Check the logs to see if the agent is deviating or encountering errors. Many frameworks provide trace logs or allow printing the agent\u2019s chain of thought (especially if you prompt it to think step-by-step). 
Reviewing these logs can highlight where the spec or instructions might have been misinterpreted. It\u2019s not unlike debugging a program\u2014except the \u201cprogram\u201d is the conversation\/prompt chain. If something weird happens, go back to the spec\/instructions to see if there was ambiguity.<\/p>\n<p>Learn and improve: Finally, treat each project as a learning opportunity to refine your spec-writing skill. Maybe you\u2019ll discover that a certain phrasing consistently confuses the AI, or that organizing spec sections in a certain way yields better adherence. Incorporate those lessons into the next spec. The field of AI agents is rapidly evolving, so new best practices (and tools) emerge constantly. Stay updated via blogs (like the ones by Simon Willison, Andrej Karpathy, etc.), and don\u2019t hesitate to experiment.<\/p>\n<p>A spec for an AI agent isn\u2019t \u201cwrite once, done.\u201d It\u2019s part of a continuous cycle of instructing, verifying, and refining. The payoff for this diligence is substantial: By catching issues early and keeping the agent aligned, you avoid costly rewrites or failures later. As one AI engineer quipped, using these practices can feel like having \u201can army of interns\u201d working for you, but you have to manage them well. A good spec, continuously maintained, is your management tool.<\/p>\n<p>Avoid Common Pitfalls<\/p>\n<p>Before wrapping up, it\u2019s worth calling out antipatterns that can derail even well-intentioned spec-driven workflows. 
The <a href=\"https:\/\/github.blog\/ai-and-ml\/github-copilot\/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">GitHub study of 2,500+ agent files<\/a> revealed a stark divide: \u201cMost agent files fail because they\u2019re too vague.\u201d Here are the mistakes to avoid:<\/p>\n<p>Vague prompts: \u201cBuild me something cool\u201d or \u201cMake it work better\u201d gives the agent nothing to anchor on. As Baptiste Studer puts it: \u201cVague prompts mean wrong results.\u201d Be specific about inputs, outputs, and constraints. \u201cYou are a helpful coding assistant\u201d doesn\u2019t work. \u201cYou are a test engineer who writes tests for React components, follows these examples, and never modifies source code\u201d does.<\/p>\n<p>Overlong contexts without summarization: Dumping 50 pages of documentation into a prompt and hoping the model figures it out rarely works. Use hierarchical summaries (as discussed in principle 3) or RAG to surface only what\u2019s relevant. Context length is not a substitute for context quality.<\/p>\n<p>Skipping human review: Willison has a personal rule\u2014\u201cI won\u2019t commit code I couldn\u2019t explain to someone else.\u201d Just because the agent produced something that passes tests doesn\u2019t mean it\u2019s correct, secure, or maintainable. Always review critical code paths. The \u201chouse of cards\u201d metaphor applies: AI-generated code can look solid but collapse under edge cases you didn\u2019t test.<\/p>\n<p>Conflating vibe coding with production engineering: Rapid prototyping with AI (\u201cvibe coding\u201d) is great for exploration and throwaway projects. But shipping that code to production without rigorous specs, tests, and review is asking for trouble. I distinguish \u201cvibe coding\u201d from \u201cAI-assisted engineering\u201d\u2014the latter requires the discipline this guide describes. 
Know which mode you\u2019re in.<\/p>\n<p>Ignoring agent risk factors: Three properties make AI agents risky to run unsupervised: speed (they work faster than you can review), nondeterminism (same input, different outputs), and cost (which encourages cutting corners on verification). Your spec and review process must account for all three. Don\u2019t let speed outpace your ability to verify.<\/p>\n<p>Missing the six core areas: If your spec doesn\u2019t cover commands, testing, project structure, code style, git workflow, and boundaries, you\u2019re likely missing something the agent needs. Use the six-area checklist from section 2 as a sanity check before handing off to the agent.<\/p>\n<p>Conclusion<\/p>\n<p>Writing an effective spec for AI coding agents requires solid software engineering principles combined with adaptation to LLM quirks. Start with clarity of purpose and let the AI help expand the plan. Structure the spec like a serious design document, covering the six core areas and integrating it into your toolchain so it becomes an executable artifact, not just prose. Keep the agent\u2019s focus tight by feeding it one piece of the puzzle at a time (and consider clever tactics like summary TOCs, subagents, or parallel orchestration to handle big specs). Anticipate pitfalls by including three-tier boundaries (always\/ask first\/never), self-checks, and conformance tests\u2014essentially, teach the AI how not to fail. 
And treat the whole process as iterative: use tests and feedback to refine both the spec and the code continuously.<\/p>\n<p>Follow these guidelines and your AI agent will be far less likely to \u201cbreak down\u201d under large contexts or wander off into nonsense.<\/p>\n<p>Happy spec-writing!<\/p>\n<p>On March 26, join Addy and Tim O\u2019Reilly at AI Codecon: Software Craftsmanship in the Age of AI, where an all-star lineup of experts will go deeper into orchestration, agent coordination, and the new skills developers need to build excellent software that creates value for all participants. <a href=\"https:\/\/www.oreilly.com\/AI-Codecon\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Sign up for free here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"This post first appeared on Addy Osmani\u2019s Elevate Substack newsletter and is being republished here with the author\u2019s&hellip;\n","protected":false},"author":2,"featured_media":493852,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[62,276,277,49,48,61],"class_list":{"0":"post-493851","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-ca","12":"tag-canada","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/493851","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/comments?post=493851"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\
/ca\/wp-json\/wp\/v2\/posts\/493851\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media\/493852"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media?parent=493851"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/categories?post=493851"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/tags?post=493851"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}