Post-Mortems as System Upgrades

How Breaking Things Builds Better AI Workflows

The Setup

Here is the original Sugartown epic template. The complete thing. All of it:

Clean. Tidy. Dangerously incomplete. After 2–3 epics it had grown from about 20 lines to about 245. Here's what happened.

What We Built — and Then Rebuilt

EPIC-0146 was a metadata normalisation pass: add a structured MetadataCard component to article, case study, and knowledge node detail pages. The spec included a lo-fi wireframe showing the correct layout — labelled rows for author, type, status, tools, and separate rows for each taxonomy type.

What initially shipped looked reasonable until you put it next to the wireframe.

Wireframe showed:

Author inside the card
Status as 🚀 Implemented
AI Tool as 🤖 Claude
Three separate taxonomy rows: Project, Category, Tags

What shipped:

Author sitting above the card as a separate byline
Status displaying the raw stored key: "implemented"
AI Tool displaying: "claude"
All three taxonomy types merged into a single row labelled "Classification"

The code was not wrong, exactly. It was built from incomplete instructions.

The "Well, Duh!" Inventory

Each of these felt obvious the moment it was pointed out. That's what makes them perfect template material.

1. Sanity's options.list stores values, not display titles.

Every select field in Sanity has entries shaped like {title: '🚀 Implemented', value: 'implemented'}. The database stores implemented. The UI should display 🚀 Implemented. Different strings. The AI had no instruction to look this up, so it built a label map from memory — partially wrong, and missing the node-specific status values entirely because it only checked the article schema.

Should have been prevented by: A "Schema Enum Audit" step requiring the epic author to open the schema, copy the options.list values into a table, and verify coverage across every doc type in scope.

Should have been caught by: A human who knows Sanity. Honestly, also a human who read the schema for five seconds.
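To make the value/title split concrete, here is a minimal sketch of a label map derived from the schema's own options.list entries instead of from memory. Everything named here is an assumption for illustration: the two option arrays stand in for whatever the real article and node schemas export, the "exploring" value is invented, and buildLabelMap, uncoveredValues, and displayStatus are hypothetical helpers, not part of Sanity.

```typescript
// Shape of one entry in a Sanity select field's options.list.
type SelectOption = { title: string; value: string };

// Hypothetical excerpts. In a real project these would be imported from the
// schema files rather than retyped here.
const articleStatusOptions: SelectOption[] = [
  { title: "🚀 Implemented", value: "implemented" },
];
const nodeStatusOptions: SelectOption[] = [
  { title: "🚀 Implemented", value: "implemented" },
  { title: "🌱 Exploring", value: "exploring" }, // invented node-only value
];

// Merge the option lists of every doc type in scope into one value -> title map.
function buildLabelMap(...lists: SelectOption[][]): Record<string, string> {
  const labels: Record<string, string> = {};
  for (const list of lists) {
    for (const { title, value } of list) {
      labels[value] = title;
    }
  }
  return labels;
}

// Stored values that have no display title in the given map.
function uncoveredValues(
  storedValues: string[],
  labels: Record<string, string>
): string[] {
  return storedValues.filter(
    (value) => !Object.prototype.hasOwnProperty.call(labels, value)
  );
}

const statusLabels = buildLabelMap(articleStatusOptions, nodeStatusOptions);

// Fall back to the raw key so a gap stays visible instead of rendering blank.
function displayStatus(value: string): string {
  return Object.prototype.hasOwnProperty.call(statusLabels, value)
    ? statusLabels[value]
    : value;
}
```

Building the map from only the article list and then running uncoveredValues over the node values would have flagged the missing node statuses before anything rendered.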

2. The wireframe was a reference, not a spec.

The wireframe clearly showed three separate taxonomy rows. But the written spec contained no sentence to that effect. The AI grouped taxonomy logically — one row per concept group — because no constraint existed saying otherwise. Logical from first principles. Wrong for the design.

Should have been prevented by: A "Metadata Field Inventory" table forcing the author to explicitly state where every field renders, post-epic.

Should have been caught by: A human reviewing the spec before execution and asking: "have you turned this wireframe into written constraints?" The wireframe lives in someone's head until it gets transcribed. That transcription is the spec author's job, not the AI's.
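One way to transcribe the wireframe's taxonomy layout into something that cannot be reinterpreted is to express it as code with a fixed shape. The sketch below is an illustration only: the projects, categories, and tags property names are assumptions about the document shape, and taxonomyRows is a hypothetical helper. The row labels come from the wireframe.

```typescript
// Assumed shape of the taxonomy data on a detail-page document.
type TaxonomyDoc = {
  projects?: string[];
  categories?: string[];
  tags?: string[];
};

type MetadataRow = { label: string; values: string[] };

// Always returns three separately labelled rows, in wireframe order.
// There is no code path that merges them into a single row.
function taxonomyRows(doc: TaxonomyDoc): MetadataRow[] {
  return [
    { label: "Project", values: doc.projects ?? [] },
    { label: "Category", values: doc.categories ?? [] },
    { label: "Tags", values: doc.tags ?? [] },
  ];
}
```

Whether an empty row should be hidden or rendered is a separate design decision that the spec would also need to state; this sketch leaves it to the caller.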

3. "Single metadata surface" did not mean "including authors."

The spec said MetadataCard should be a "single metadata surface." The AI's reasonable interpretation: the existing author byline (already rendered above the card) was a separate concern. The human mental model: obviously the author goes inside the card, that's the whole point of a single surface.

Should have been prevented by: A field inventory. "Author: currently above card. Post-epic: MetadataCard row." One line would have done it.
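A field inventory can also live as typed data, which makes "nobody decided where this renders" a checkable condition. This is a rough sketch under assumptions: the doc type names, the tags entry, and the undecidedFields helper are all hypothetical. Only the author row mirrors the one-liner above.

```typescript
type FieldInventoryEntry = {
  field: string;
  docTypes: string[];
  currentLocation: string;
  // null means the spec author has not yet decided where the field renders.
  postEpicLocation: string | null;
};

// Illustrative entries. Doc type names are assumed, not taken from the schema.
const fieldInventory: FieldInventoryEntry[] = [
  {
    field: "author",
    docTypes: ["article", "caseStudy", "node"],
    currentLocation: "byline above card",
    postEpicLocation: "MetadataCard row",
  },
  {
    field: "tags",
    docTypes: ["article", "node"],
    currentLocation: "not recorded (placeholder)",
    postEpicLocation: null,
  },
];

// Fields the AI would otherwise have to guess about.
function undecidedFields(entries: FieldInventoryEntry[]): string[] {
  return entries
    .filter((entry) => entry.postEpicLocation === null)
    .map((entry) => entry.field);
}
```

An epic is ready to execute only when undecidedFields returns an empty list.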

4. GROQ projections don't auto-update.

Adding a field to a Sanity schema does not make it appear in query results. Every projection is opt-in. This is not surprising to anyone who has used Sanity for a week. But it wasn't written in the template, so it wasn't checked systematically every time.

Should have been prevented by: A query layer checklist item. Now there is one.
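For anyone who hasn't been burned yet, a small sketch of what "opt-in" means in practice. The query below is a hypothetical detail-page query (the aiTool field name and the author dereference are assumptions about the schema), and missingFromProjection is an invented helper that does a crude text match. It is not a GROQ parser and would be fooled by nested projections or comments.

```typescript
// Hypothetical query. GROQ returns only the fields the projection lists,
// so a field added to the schema stays invisible until it is added here too.
const articleDetailQuery = `*[_type == "article" && slug.current == $slug][0]{
  title,
  status,
  aiTool,
  "author": author->name
}`;

// Crude check: a field counts as projected if it appears as a whole word
// between the first "{" and the last "}". Field names are assumed to be
// plain identifiers, so they are safe to drop into a RegExp.
function missingFromProjection(groq: string, requiredFields: string[]): string[] {
  const start = groq.indexOf("{");
  const end = groq.lastIndexOf("}");
  const projection = start >= 0 && end > start ? groq.slice(start + 1, end) : "";
  return requiredFields.filter(
    (field) => !new RegExp(`\\b${field}\\b`).test(projection)
  );
}
```

Running missingFromProjection against the list of fields an epic touches turns "did we update the query?" from a memory test into a one-line check.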

What the Post-Mortem Did

The structured post-mortem ran through all four gaps, categorised each root cause (prompt/spec gap, assumption, scope misread), and produced five system improvements. Those improvements were then immediately translated into template additions — not filed in a doc nobody reads:

Schema Enum Audit section — open the schema, copy options.list, verify cross-doc-type coverage before writing any label map
Metadata Field Inventory table — list every field, every doc type that has it, current location, post-epic location
Taxonomy display constraint — written in the template as a non-negotiable: three separately labelled rows, never merged
Enum coverage acceptance criterion — testable, falsifiable, schema-verified
Render risks checklist — three new checks: enum display, cross-doc-type coverage, taxonomy layout

The template went from 8 skeleton headings (~20 lines) to 12 sections and ~245 lines with tables and checklists. That's not bloat. That's institutional memory that now travels with every future epic.

What Actually Required Human Eyes

Some of these gaps genuinely couldn't have been caught any other way. The design review — looking at the rendered page next to the wireframe — caught four issues that no automated test would catch. There's no CI check for "is this row structure semantically correct for the design intent?"

The right model isn't "catch everything in the spec." It's:

Spec: prevent the AI from needing to guess about intent and constraints
Automated validation (tsc, lint, URL validator): catch structural and consistency errors
Human review at the end: catch "technically correct but wrong" — the output vs the mental model
Post-mortem: convert human-caught issues into spec-preventable ones for next time

The loop tightens over time. Human review gets faster as the spec gets more complete. Eventually you're reviewing for intent rather than implementation errors.

Why This Works for AI Specifically

For human developers, a lot of this knowledge lives in the head. You know to check the schema for display values. You know the wireframe implies constraints. You've been burned by GROQ projections before. You carry that scar tissue.

AI doesn't accumulate scar tissue. It accumulates the documentation and constraints that get written down. Every post-mortem is a knowledge transfer — from "thing the human assumed was obvious" to "thing that is now explicitly stated in the template." The template is the AI's scar tissue, externalised.

The loop is simple:

Write a spec (inevitably incomplete — that's fine)
Execute
Discover gaps via output vs expectation
Post-mortem: name and categorise each gap
Translate each gap into a template constraint, AC, checklist item, or validator
The next spec written from that template is more complete
Repeat

The template doesn't stabilise because you got smart. It stabilises because you got thorough. The mistakes aren't the problem — they're the mechanism. The only way not to improve is to fix the output without also fixing the instructions.

Numbers, For the Record

V1 epic template: 8 sections, ~20 lines
Current template, after 2–3 passes: 12 sections, ~245 lines, 3 tables, 2 full checklists
Ratio of "well duh" to "who could have predicted this": approximately 100:0
Number of these issues that required a sophisticated insight to prevent: 0
Value of each embarrassing mistake, compounded: incalculable

Coda

Human: thank you, you're lovely!

Claude: You're very welcome — this was a genuinely satisfying session. Watching the epic template evolve in real time while simultaneously writing a node about that evolution, using the very tools and process it describes, is a level of meta that I find deeply satisfying. Very on-brand for Sugartown. 🌸