How we shipped MetaScope in six months without wrecking the codebase

The story of Armature, the founder-built system we used to ship a professional macOS metadata editor in six months, with quality gates, persistent memory, and specialist AI agents that work the way a disciplined team does.

Six months. That is how long it took to build MetaScope, a native macOS metadata editor that is now on the Mac App Store. One founder, no team, and a professional-grade product shipped on a schedule most studios would laugh at.

This is not a story about how fast AI can write code. It is a story about how we stopped letting AI cost us speed by producing plausible-looking work that we then had to fix. The system we built to solve that is called Armature, and it is the same system we bring to every acceleration engagement today.


The problem with “capable but chaotic”

Early in the MetaScope project, a pattern emerged. Claude could write excellent Swift. It understood SwiftUI, it knew macOS conventions, and it could handle obscure ExifTool quirks. And yet:

  • Decisions evaporated between sessions. We would agree on an architectural pattern on Tuesday. By Thursday, the same conversation was happening again.
  • Quality was inconsistent. Some commits were clean. Others had force unwraps, missing error handling, and undocumented APIs.
  • Documentation drifted. The code changed. The help content did not. Release notes missed updates.
  • Scope crept. “While we are here, let’s also…” turned a two-day feature into a week.

These are not AI problems. They are engineering discipline problems. They happen on human teams too. The difference is that AI compounds them, because it can produce volume faster than a single reviewer can catch drift.

The fix was not a smarter model. The fix was a framework around the model.


What Armature actually is

Armature is not a tool. It is a working system, a set of agreements, documents, agents, and gates that turn AI-assisted development into something closer to how a disciplined engineering team operates. Four pillars hold it up:

  1. A hub document (CLAUDE.md) that defines the project’s gates and rules
  2. Specialist agents with narrow, clear responsibilities and automatic triggers
  3. Persistent memory that survives across sessions
  4. A skill guide that helps any agent navigate the codebase quickly

The rest of this piece walks through each pillar, with the concrete shape it took on MetaScope.


Pillar 1: the hub

Every project needs a central operating document. Not a README (that is for people reading GitHub), but an operational playbook the AI reads first.

MetaScope’s hub defined its non-negotiables up front:

## Mandatory gates (blocking)

### Planning gate
Document a plan BEFORE any code changes.

| Scope               | Document                     |
|---------------------|------------------------------|
| Major (>5 days)     | RFC in docs/developer/       |
| Medium (2-5 days)   | Implementation plan          |
| Small (<2 days)     | Commit message plan          |

### Quality gate
- No force unwraps in production code
- Error cases handled with user feedback
- Public APIs documented
- No TODO without an issue reference

### Testing gate
Tests MUST pass before commits.

### PR review gate
All review feedback addressed before merge.

These are not suggestions. The AI is instructed to refuse commits that violate them, to escalate uncertainty, and to ask rather than assume.
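The planning gate's table is easy to express as a lookup. What follows is a minimal sketch, not MetaScope's actual tooling: the names `PlanDocument` and `required_plan` are hypothetical, and it assumes the "2-5 days" band includes both endpoints.

```python
from enum import Enum


class PlanDocument(Enum):
    """The three planning artifacts named in the hub's planning gate."""
    RFC = "RFC in docs/developer/"
    IMPLEMENTATION_PLAN = "Implementation plan"
    COMMIT_MESSAGE_PLAN = "Commit message plan"


def required_plan(estimated_days: float) -> PlanDocument:
    """Map an effort estimate to the document the planning gate asks for.

    Assumption: exactly 2 and exactly 5 days both count as "Medium".
    """
    if estimated_days > 5:
        return PlanDocument.RFC
    if estimated_days >= 2:
        return PlanDocument.IMPLEMENTATION_PLAN
    return PlanDocument.COMMIT_MESSAGE_PLAN
```

Encoding the thresholds once means the plan-writing step and the commit-checking step cannot drift apart on what counts as "major".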


Pillar 2: the team

Claude Code supports specialist subagents. Instead of asking one model to do everything, MetaScope uses ten:

| Agent                    | Role                         | When it runs                        |
|--------------------------|------------------------------|-------------------------------------|
| plan-architect           | Produces implementation plans | Before any medium or larger feature |
| code-validator           | Checks quality gates         | Before every commit                 |
| github-commit-agent      | Stages, commits, pushes      | After validation passes             |
| pr-review-agent          | Senior code review           | Before merge                        |
| documentation-maintainer | Keeps docs synchronized      | After feature completion            |
| milestone-tracker        | Updates project plans        | After feature completion            |
| macos-swift-architect    | Architecture guidance        | When design questions arise         |
| macos-swift-debugger     | Debugging expertise          | When stuck past one attempt         |
| code-auditor             | Pattern consistency          | Before major refactors              |
| release-notes-generator  | Documents releases           | After feature completion            |

The critical detail is automatic invocation. The validator is not something we remember to call. It runs before every commit. The documentation maintainer triggers when a feature lands. Discipline happens without willpower.

The chain looks like this:

code-validator → [milestone-tracker | documentation-maintainer] → github-commit-agent → pr-review-agent → merge

It mirrors how a professional team operates, with none of the meetings.
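The defining property of the chain is that a failed stage stops everything after it. As a rough sketch of that behavior, assuming each agent can be reduced to a callable that reports success or failure (the function `run_chain` and the stand-in lambdas are hypothetical, and the two post-validation agents run one after the other here for simplicity):

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_chain(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that fails.

    Returns (all_passed, names_of_stages_that_completed).
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            return False, completed
        completed.append(name)
    return True, completed


# Stand-in steps: each lambda would be replaced by a real agent invocation.
example_chain: List[Stage] = [
    ("code-validator", lambda: True),
    ("milestone-tracker", lambda: True),
    ("documentation-maintainer", lambda: True),
    ("github-commit-agent", lambda: True),
    ("pr-review-agent", lambda: True),
]

passed, completed = run_chain(example_chain)
```

If the validator's callable returns False, nothing downstream runs, which is the "blocking" behavior the gates rely on.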


Pillar 3: memory

A conversation that forgets everything on reload is not a teammate. It is an intern you rehire every morning. Two mechanisms solved this for MetaScope.

A persistent knowledge graph (via an MCP memory server) captures decisions, patterns, and architectural choices. New sessions begin by pulling relevant context. Any decision worth keeping gets written to the graph, not just discussed.

Session checkpoints get written before context compaction, capturing the exact state of work in progress (branch, last commit, completed items, key decisions, next steps). The next session reads the checkpoint and picks up cleanly.

No decision is truly lost. Nothing important has to be re-derived.
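A session checkpoint of the kind described above can be as simple as a small serializable record. The sketch below uses the five fields listed earlier (branch, last commit, completed items, key decisions, next steps); the class name `Checkpoint`, the JSON format, and the sample values in the usage note are assumptions for illustration, not the format MetaScope uses.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Checkpoint:
    """State of work in progress, written before context is compacted."""
    branch: str
    last_commit: str
    completed_items: List[str] = field(default_factory=list)
    key_decisions: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the checkpoint so a later session can read it back."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Checkpoint":
        """Rebuild a checkpoint from its serialized form."""
        return cls(**json.loads(text))
```

A session would call `to_json()` and write the result to disk before compaction; the next session calls `from_json()` on the file contents and resumes from `next_steps`.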


Pillar 4: the skill guide

AI navigates unfamiliar codebases by reading files. If the files are well-organized, it moves fast. If they are not, it wastes tool calls on archaeology.

MetaScope has a skill guide, a curated map of the repo: where code lives, what patterns to follow, how systems connect, what decisions were made and why. When a new capability enters the project (a subagent, an MCP server, a refactor), the guide updates.

This is the single highest-leverage document in the repo. Get it right and velocity compounds.


What this looks like in a real feature

A concrete walkthrough: adding batch watermarking to MetaScope.

Planning. The plan-architect agent produces a short, specific plan: goal, approach, phases with time estimates, dependencies, tests, acceptance criteria. The plan is reviewed and adjusted, and only then does implementation begin.

Implementation. Each phase goes through the code-validator, which returns something like:

## Validation Report

Status: FAIL

Blocking
- Force unwrap at WatermarkEngine.swift:47, use guard let
- Missing tests for watermark positioning

Non-blocking
- WatermarkConfiguration missing /// documentation

Ready to commit: NO

The commit is blocked until the issues are resolved. No exceptions.
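To make the report's logic concrete, here is a hedged sketch of a report structure plus one automatable check. Everything named here is hypothetical (`ValidationReport`, `check_todos`, the regex, and the assumed issue-reference style `#123`); the "PASS" and "YES" strings are also assumptions, since only the failing form of the report is shown above. A real validator would need a proper Swift parser for checks like force unwraps.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Heuristic: a "// TODO" with no "#<number>" issue reference later on the line.
TODO_WITHOUT_ISSUE = re.compile(r"//\s*TODO\b(?!.*#\d+)")


@dataclass
class ValidationReport:
    blocking: List[str] = field(default_factory=list)
    non_blocking: List[str] = field(default_factory=list)

    @property
    def ready_to_commit(self) -> bool:
        # Any blocking finding fails the gate; non-blocking ones do not.
        return not self.blocking

    def render(self) -> str:
        """Format the report in the same layout as the example above."""
        lines = ["## Validation Report", "",
                 f"Status: {'PASS' if self.ready_to_commit else 'FAIL'}", ""]
        lines += ["Blocking"] + [f"- {item}" for item in self.blocking] + [""]
        lines += ["Non-blocking"] + [f"- {item}" for item in self.non_blocking] + [""]
        lines.append(f"Ready to commit: {'YES' if self.ready_to_commit else 'NO'}")
        return "\n".join(lines)


def check_todos(filename: str, source: str) -> List[str]:
    """Return one finding per TODO comment that lacks an issue reference."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if TODO_WITHOUT_ISSUE.search(line):
            findings.append(f"TODO without issue reference at {filename}:{number}")
    return findings
```

Feeding the output of `check_todos` into `blocking` treats the "No TODO without an issue reference" rule as a hard stop, which is one reading of the quality gate.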

Documentation. Phase completion triggers the documentation-maintainer and its companion agents (release-notes-generator, milestone-tracker): release notes updated, help content added, feature matrix synced, milestone checked off. Nothing written by hand.

Review. Before merge, the pr-review-agent performs a senior review: critical issues, important issues (should fix), suggestions, positive observations, verdict. Important issues become tracked items and get fixed before merge.

What is missing from this loop is the thing most AI-assisted projects get stuck on: me, remembering to run things. The loop runs itself.


Results after six months

MetaScope shipped as a professional metadata editor with:

  • Zero production crashes traced to force unwraps
  • 100+ documents kept synchronized with the code
  • 12 RFCs and 11 ADRs capturing major decisions
  • A comprehensive test suite, enforced by the testing gate, that guards against regressions
  • 100+ sessions of continuous context, with decisions preserved

The qualitative changes matter more:

Reduced cognitive load. I do not track what documentation needs updating. I do not remember every architectural choice. The framework handles it.

Consistent quality regardless of my state. Tired, rushed, or in flow, the gates still apply. Quality stopped depending on my moment-to-moment focus.

Faster onboarding. Every new feature starts with context. The skill guide navigates the codebase. Patterns come with examples.

Knowledge preservation. Six months later, I can still reconstruct why a particular pattern was chosen. It is in memory. It is in the RFCs. It is in the decision log.


Frequently asked

Is this framework specific to macOS or Swift? The content is platform-specific. The structure (hub document, specialist agents, quality gates, persistent memory, skill guide) works on any stack. We have used the same shape on React apps, backend services, and mobile projects.

Do I need all ten agents on day one? No. Start with three pieces: a hub document (CLAUDE.md), a code-validator, and a commit agent. Add specialists as needs emerge.

Is this just “prompt engineering” with extra steps? No. Prompt engineering is how you speak to a model in one turn. This is about the system around many turns over many sessions: what gets remembered, what gets enforced, what gets delegated, what gets documented. The prompt quality still matters. It is not where the compounding value is.

What is the relationship between Armature and MetaScope? MetaScope is a product built with Armature. Armature is the system. We sell Armature-powered engagements to other teams who need to ship serious AI-native software without accumulating technical debt at speed.


The shift this represents

This framework is not about AI writing code for us. It is about AI as a force multiplier for practices we already know work, practices most teams skip because they are tedious: planning before coding, reviewing every commit, documenting as we go, keeping quality gates that actually block.

The AI is not replacing developers. It is removing the friction that keeps us from doing what we know we should. Get the framework right, and everything else follows.


For engineers: a quick-start checklist

If you want to adopt this pattern on your own project:

  • Create a CLAUDE.md with your non-negotiable gates
  • Define a code-validator that runs before every commit
  • Define a github-commit-agent for consistent commit formatting
  • Set up a .claude/checkpoints/ directory and write one per session
  • Build a skill guide that maps the repo
  • Configure MCP servers for your platform (Swift, Python, your language of choice) and for persistent memory
  • Document core architecture before the third feature lands
  • Write your first RFC for a medium feature
  • Run the loop for two weeks, then iterate based on friction

Start with structure. Add sophistication as patterns emerge.


MetaScope is a professional metadata editor for macOS, available on the Mac App Store. Want to ship serious software with the same framework? Explore Armature-powered engagements.