
23 · Spec as Architecture: Constraints for AI

The thesis in one line: Ch. 17 says "feed your judgment to the AI." But if that judgment only lives in your head — or in a single chat message — the AI forgets it the next session, or fills in defaults on your behalf. Turning that judgment into a 'spec' the AI reads every time, and that machines can enforce, is what real guardrails look like. In the AI era, writing constraints clearly is itself the architecture work.


🤝 AI-collaborative design track, ch. 1 · What this track is about

Prerequisites: the practitioner track (18–22). You already know how to design, evolve, and migrate AI systems. The AI-collaborative track shifts the angle: not "how you build" but "how you build with AI without losing control." It skips vibe-coding tool tips and focuses on two things: how to write constraints for AI (this chapter and 25) and how to review AI output (24). It closes with a decision tree (26). The track belongs to the same product line as the architecture-copilot skill.


1. The problem: judgment not written down is judgment not there

The conclusion of Ch. 17 is: AI can produce the code, but the spec, constraints, and acceptance criteria are yours. The question is — how do you "give" them?

Beginners think "giving" means saying "remember to make it idempotent" in the chat. That does not work:

   What you think "feeding judgment" means        What actually happens
   ────────────────────────────────────           ────────────────────────────────────────
   say "refunds must be idempotent" in chat  →   this session remembers; the next session
                                                  forgets after context compaction;
                                                  a new session starts from zero;
                                                  a teammate has no idea at all

And there is the deeper problem from Ch. 17: AI defaults to the happy path — if you do not explicitly say "refunds must be idempotent, guard against double-charging," it simply will not write that, and hands you code that demos fine but explodes in production.

The core proposition: architectural constraints must be made persistent, automatically read by AI every time, and ideally machine-enforced. Verbal, one-off, memory-dependent constraints are non-existent in AI collaboration. This is the AI-era upgrade of Ch. 08's "the first thing lost is always the why" — except now the "reader" has shifted from a teammate three months later to an AI that starts fresh every conversation.


2. The spec pyramid: from "human-readable" to "machine-enforced"

Arrange constraints by enforceability into three layers. The lower the layer, the stricter it is and the harder it is to escape:

                    ▲ higher up: answers "why," relies on human understanding
        ┌───────────────────────────────────┐
        │  ADR (Architecture Decision       │  Human-read — "why we decided this, what we gave up"
        │  Records) · Ch. 08               │     AI reads it → knows boundaries & history, does not stray
        ├───────────────────────────────────┤
        │  AGENTS.md / CLAUDE.md            │  AI resident-read — "what to do / what not to do"
        │  (project-level rule book)        │     auto-loaded every conversation; the AI's "long-term memory"
        ├───────────────────────────────────┤
        │  Fitness functions / lint / CI    │  Machine-enforced — "violate and it fails; no way around it"
        │  Ch. 14                           │     no relying on conscience; the red light does the job
        └───────────────────────────────────┘
                    ▼ lower down: answers "enforce," machines verify automatically

All three layers are needed — none is optional:

   Layer                   Form                           Governs                                  Reader
   ──────────────────────  ─────────────────────────────  ───────────────────────────────────────  ───────────────────────────────
   ADR                     One-page doc (Ch. 08)          Why the decision was made, trade-offs    Humans + AI (understand intent)
   AGENTS.md / CLAUDE.md   Rule file in the project root  What to do / not to do (standing rules)  AI (every conversation)
   Fitness functions / CI  Automated tests (Ch. 14)       Hard lines (violation is blocked)        Machine

Why three layers instead of one? Because constraints come in three kinds: some can only be grasped by humans ("why we chose dual-write back then" — goes into the ADR); some can be expressed in natural language for AI to follow continuously ("refunds must be idempotent" — goes into AGENTS.md); some can be verified exactly by machine ("domain layer must not import the web layer" — becomes a CI check). If a constraint can sink one layer lower, do not leave it at the layer above — docs rely on conscience; CI relies on the red light.
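
The routing rule in the paragraph above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Constraint` fields and the `lowest_layer` name are hypothetical, and whether a constraint counts as machine-verifiable remains a human judgment call.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    ADR = "ADR"
    AGENTS_MD = "AGENTS.md / CLAUDE.md"
    CI = "fitness function / CI"


@dataclass(frozen=True)
class Constraint:
    """Hypothetical description of one architectural constraint."""
    text: str
    machine_verifiable: bool   # can a test or lint rule check it exactly?
    expressible_as_rule: bool  # can it be stated as a standing do / do-not rule?


def lowest_layer(constraint: Constraint) -> Layer:
    """Return the lowest (most enforced) layer the constraint can sink to."""
    if constraint.machine_verifiable:
        return Layer.CI
    if constraint.expressible_as_rule:
        return Layer.AGENTS_MD
    return Layer.ADR


if __name__ == "__main__":
    examples = [
        Constraint("domain layer must not import the web layer", True, True),
        Constraint("refunds must be idempotent", False, True),
        Constraint("why we chose dual-write back then", False, False),
    ]
    for example in examples:
        print(f"{example.text!r} -> {lowest_layer(example).value}")
```

The function only names the lowest layer. In practice a constraint enforced in CI usually still deserves a line in AGENTS.md and a rationale in an ADR, since all three layers are needed.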


3. How to write a good AGENTS.md (the AI resident rule book)

AGENTS.md (or CLAUDE.md) is the star of this layer: a file placed at the project root that AI coding tools load automatically at the start of every conversation (the memory files of Claude Code and Codex work this way). But many of these files are written like boilerplate that says nothing.

Bad example (might as well not write it):

- Write high-quality, maintainable, and extensible code
- Follow best practices
- Mind performance and security

AI can do nothing with these — "high quality" means what exactly? "Best practices" according to whom? These are correct-sounding platitudes.

Good example (specific + actionable + with "why" + with counterexample):

## Data & consistency
- All money-touching operations (charges / refunds) must carry an idempotency key;
  repeat requests return the previous result and are never re-executed.
  Rationale: see ADR-003.
- ❌ The model must not call the payment API directly; it may only produce a "refund
  proposal" — the deterministic refund service executes it (see Ch. 19 design).

## Resilience
- All external calls (model API / third parties) must have a timeout + retry
  with exponential back-off. Rationale: dependencies will flake (see Ch. 12).

## Boundaries
- The domain layer must not import the web layer / framework. Violations fail CI
  (see fitness test).
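
The first rule in the example above could be realized roughly as follows. This is a minimal sketch under stated assumptions: `RefundService`, `MissingIdempotencyKey`, and the in-memory dictionary are hypothetical stand-ins, not the tutorial's Ch. 19 design. A real service would persist keys in durable storage and handle concurrent requests.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class MissingIdempotencyKey(ValueError):
    """Raised when a money-touching call arrives without an idempotency key."""


@dataclass
class RefundService:
    """Hypothetical in-memory stand-in for the deterministic refund service."""
    _results: Dict[str, dict] = field(default_factory=dict)
    executions: int = 0  # counts how many refunds were actually executed

    def refund(self, order_id: str, amount_cents: int,
               idempotency_key: Optional[str]) -> dict:
        if not idempotency_key:
            raise MissingIdempotencyKey("refund requires an idempotency key")
        # Repeat request: return the previous result, never re-execute.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.executions += 1
        result = {"order_id": order_id, "amount_cents": amount_cents,
                  "status": "refunded"}
        self._results[idempotency_key] = result
        return result


def test_missing_key_is_rejected() -> None:
    service = RefundService()
    try:
        service.refund("order-1", 500, idempotency_key=None)
    except MissingIdempotencyKey:
        return
    raise AssertionError("refund without an idempotency key must fail")


def test_repeat_request_is_not_reexecuted() -> None:
    service = RefundService()
    first = service.refund("order-1", 500, idempotency_key="key-1")
    second = service.refund("order-1", 500, idempotency_key="key-1")
    assert first == second
    assert service.executions == 1
```

The two test functions are the kind of contract test that turns the written rule into a red light: if generated code drops the key check or re-executes a repeat request, the tests fail.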

Five disciplines for writing AGENTS.md:

  1. Specific enough to act on: do not say "mind security" — say "all external input must be validated before storage; treat user content as untrusted (including prompt injection)."
  2. Include the "why": give a rationale like an ADR — AI (and humans) who know the reason will not quietly delete a rule that "seems redundant" (echoing Ch. 08's opening story of "deleting dual-write, outage for three weeks").
  3. Give counterexamples (❌): stating "what is forbidden" is more effective than "what should be done" — this is precisely where you block the AI happy-path default.
  4. Short and sharp: the AI context window is limited and billed by the token (Ch. 17's context engineering). A rule book is a standing cost — do not write a novel; write only the most frequent, most easily violated rules.
  5. Evolve with the code: a stale rule is worse than no rule (it actively misleads AI). When rules change, update them — same discipline as appending / updating ADRs in Ch. 08.
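
Some of these disciplines can be checked mechanically. The sketch below flags rules that use platitudes, rules with no rationale, and a rule book that exceeds a word budget. Everything in it is an assumption for illustration: the phrase list, the 300-word budget, and the convention that a rationale is marked by "Rationale:" or "see " are heuristics, not part of any AGENTS.md standard.

```python
# Hypothetical heuristics: tune the phrase list and budget for your project.
VAGUE_PHRASES = (
    "high-quality",
    "best practices",
    "maintainable",
    "extensible",
    "mind performance",
)


def split_rules(text: str) -> list[str]:
    """Collect '- ' bullets, joining indented continuation lines."""
    rules: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            rules.append(stripped[2:])
        elif stripped and rules and line[:1].isspace():
            rules[-1] += " " + stripped
    return rules


def lint_rule_book(text: str, max_words: int = 300) -> list[str]:
    """Return human-readable findings; an empty list means no complaints."""
    findings: list[str] = []
    for rule in split_rules(text):
        lowered = rule.lower()
        hits = [phrase for phrase in VAGUE_PHRASES if phrase in lowered]
        if hits:
            findings.append(f"vague ({', '.join(hits)}): {rule}")
        elif "rationale:" not in lowered and "see " not in lowered:
            findings.append(f"no rationale: {rule}")
    word_count = len(text.split())
    if word_count > max_words:
        findings.append(f"rule book has {word_count} words (budget {max_words})")
    return findings
```

A check like this cannot judge whether a rule is correct or current. It only catches the cheapest failures, so disciplines 3 and 5 still need a human reviewer.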

AGENTS.md is "the architectural boundary written for AI." Every "chose A, gave up B" decision you made in Ch. 19 and every ADR you wrote in Ch. 20 — anything you want AI to keep obeying — should be distilled into a line in AGENTS.md.


4. If a machine can enforce it, do not just write docs

The strictest layer sits at the bottom of the pyramid: if a constraint can be expressed as a test or lint rule, do not leave it only in a document. Docs rely on the conscience of AI and humans; CI is a red light you cannot get around. This is exactly the core value of Ch. 14's fitness functions in AI collaboration:

   Route each constraint by "can a machine verify it?":

   Machine can verify precisely  ──▶  fitness function / lint / contract test / CI gate (enforced)
   Language can constrain AI     ──▶  AGENTS.md (standing rule)
   Only humans can judge         ──▶  ADR + manual review checklist (next chapter, 24)

   Constraint                                        What to turn it into (sink toward "enforced" as far as possible)
   ────────────────────────────────────────────────  ─────────────────────────────────────────────────────────────────────────
   Domain layer must not depend on the framework     Dependency check (CI fail)
   Refunds must be idempotent                        AGENTS.md rule + contract test that fails when idempotency key is missing
   Model calls must go through an abstraction layer  Dependency check (direct provider connection → fail, see Ch. 21)
   p99 < 200 ms                                      Performance test (exceed threshold → fail)
   "Why RAG instead of fine-tuning"                  ADR (human-read only; cannot be machine-verified)

Architectural wisdom: AI will faithfully execute every constraint you write down — and will faithfully ignore every constraint you do not write down. Turn "rely on reminders" constraints into "rely on red lights" constraints — every reminder you skip is one more way the system quietly rots. CI gates do not care whether the code was committed by a human or AI; they block either one equally.
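
As one concrete red light, the rule "domain layer must not import the web layer" might be checked with a script like the following. This is a sketch, not the Ch. 14 implementation: the `src/domain` path and the forbidden package names are assumptions for illustration, and it inspects absolute imports only.

```python
import ast
from pathlib import Path

# Hypothetical: top-level packages the domain layer must never import.
FORBIDDEN_PACKAGES = ("web", "flask", "fastapi", "django")


def forbidden_imports(source: str, forbidden=FORBIDDEN_PACKAGES) -> list[str]:
    """Return the forbidden module names imported by one Python source file."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # relative imports (level > 0) are skipped
        else:
            continue
        found.extend(name for name in names if name.split(".")[0] in forbidden)
    return found


def check_domain_layer(domain_dir: str = "src/domain") -> list[str]:
    """Scan every .py file under domain_dir and list the violations."""
    violations: list[str] = []
    for path in sorted(Path(domain_dir).rglob("*.py")):
        for name in forbidden_imports(path.read_text(encoding="utf-8")):
            violations.append(f"{path}: imports {name}")
    return violations


def test_domain_layer_does_not_import_web() -> None:
    # Run in CI: any violation fails the build, whoever wrote the code.
    assert check_domain_layer() == []
```

Run as part of the test suite, this check fails the build on the first forbidden import, regardless of whether a human or an AI added it.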


5. Connecting to architecture-copilot: letting the spec "grow itself"

Writing ADRs, AGENTS.md, and fitness functions can sound like extra overhead. But it is really just putting on paper the judgments you already needed to think through (Ch. 07 step ⑧, Ch. 08's ADRs) — and that work itself can be done with AI's help.

The architecture-copilot skill does exactly that: it turns this tutorial's knowledge into an interactive skill inside Claude Code / Cursor / Codex that guides you step by step to produce architectural judgments and specs — asking the right questions, drafting the ADR, generating the first version of AGENTS.md. You supply the judgment; it structures that judgment into a spec AI can obey.

Strong opinion: spec is architecture. In the AI era, "writing constraints clearly and in a place AI can read them" is not documentation busywork — it is core architecture work itself. Because: the constraints you write = the boundary of AI output. Vague constraints mean AI decides in the vague spots on your behalf (and always picks the laziest, most optimistic path). The architect's judgment, passed through the 'spec' interface, becomes behavioral constraints on AI.


🎯 Quick check

🤔 You want AI to obey the rule 'money-touching operations must be idempotent' every time it writes code. What is the most effective approach?
  • A. Remind it verbally in the chat at the start of each session
  • B. Write it into the project root AGENTS.md / CLAUDE.md (AI reads it every time) and back it up with a CI check that fails when the idempotency key is missing
  • C. Trust the model is smart enough to add idempotency on its own

🤔 Which of the following AGENTS.md rules is actually useful?
  • A. Write high-quality, maintainable code that follows best practices
  • B. All external calls must have a timeout + exponential back-off retry (rationale: dependencies will flake, see Ch. 12); ❌ bare calls without a timeout are forbidden
  • C. Try to keep performance and security in mind

Chapter summary

  • Judgment not written down is judgment not there: verbal constraints in chat will be lost; AI defaults to the happy path and will not write what you did not explicitly say. This is the AI-era upgrade of Ch. 08's "the first thing lost is always the why."
  • The spec pyramid: ADR (human-readable why, Ch. 08) → AGENTS.md / CLAUDE.md (AI resident-read rules) → fitness functions / CI (Ch. 14, machine-enforced). If it can sink a layer, do not leave it at the layer above.
  • Write good AGENTS.md: specific and actionable, with "why," with counterexamples (❌), short and sharp, evolves with the code. It is the architectural boundary written for AI.
  • If a machine can enforce it, do not just write docs: docs rely on conscience; CI relies on the red light — and CI does not care whether the code was written by a human or AI.
  • Spec is architecture: writing constraints clearly and where AI can read them is core architecture work — and that is exactly what architecture-copilot is there to help you do.

Bridging forward: specs give constraints to AI upfront. But AI will still produce output that violates the constraints you wrote, or ones you never thought to write, so you also need to review its output line by line after the fact. Next: 24 · Review checklist: what AI output omits by default. It is a production-grade review checklist organized around the advanced track: what the AI "happy path" leaves out by default, and what to ask to fill the gaps.

