A few weeks ago, we wrote about the origin story of Armature, the framework we used to ship MetaScope in six months. It worked. The app is on the App Store.
But shipping is not the hard part. Sustaining quality under speed is.
After a stretch of intense development, familiar failure modes started showing up: files ballooning, long sessions getting muddy, the same corrections resurfacing a week later. If we did nothing, “AI-assisted speed” would quietly become “technical debt at AI speed.”
Armature 2.0 is what we built to stop that drift. It sits on top of the original framework and adds three habits: preventing debt, keeping sessions sharp during long work, and remembering corrections so they do not keep costing time.
Where the original framework ran out of runway
The v1 Armature framework solved the obvious problems: context loss, inconsistent quality, documentation drift. It introduced specialist agents, quality gates, and persistent memory. Good enough to ship MetaScope. Not good enough to keep shipping without slow entropy.
Four things showed up at scale:
Debt still accumulated. Even with reviews and gates, files kept growing. One SwiftUI view hit 6,635 lines. A service blew past 80 methods. The framework could detect problems. It did not reliably prevent them.
Context got saturated. Long sessions degraded. After enough tool calls, the model would loop: circular reasoning, vague suggestions, repeated exploration. The model was not “worse.” It was overloaded.
Learning did not persist. Corrections were local. “Use CustomButton not HTMLButton” got fixed in the moment, and then resurfaced a week later. The knowledge lived in my head, not in the system.
Agents proliferated. Ten specialists became noisy. Overlapping responsibilities turned every task into a routing decision.
v1 helped us build the product. v2 is about keeping it maintainable while the pace stays high.
Three habits, each solving one failure mode
Armature 2.0 rests on three pillars:
- Guardrails. Enforcement that blocks debt before it enters the codebase.
- Context engineering. Deliberate session compaction to keep output sharp.
- Self-improving memory. A workflow that turns corrections into durable rules.

Pillar 1: guardrails
Quality gates catch problems after they exist. Guardrails stop problems from getting merged in the first place.
The limits
Every metric has two thresholds: a soft cap (warning) and a hard cap (blocking).
| Metric | Soft cap | Hard cap | Enforcement |
|---|---|---|---|
| Lines per file | 400 | 800 | Reviewer warns, validator blocks |
| Lines per SwiftUI view | 300 | 600 | Reviewer warns, validator blocks |
| Types per file | 5 | 12 | Reviewer warns, validator blocks |
| Nested subviews | 3 | 6 | Reviewer warns, validator blocks |
| @State per view | 10 | 20 | Reviewer warns, validator blocks |
| Methods per type | 20 | 35 | Reviewer warns, validator blocks |
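As a rough illustration of how the two-threshold scheme could be evaluated, here is a minimal Swift sketch. The `Guardrail` and `GuardrailStatus` types are hypothetical names, not Armature's actual API, and the sketch assumes a value must exceed a cap (not merely equal it) to trigger.

```swift
// Hypothetical sketch: type and property names are illustrative, not Armature's API.
enum GuardrailStatus {
    case ok
    case warning   // over soft cap: reviewer warns
    case blocked   // over hard cap: validator blocks
}

struct Guardrail {
    let metric: String
    let softCap: Int
    let hardCap: Int

    // Assumption: a value must exceed a cap (not just equal it) to trigger.
    func evaluate(_ value: Int) -> GuardrailStatus {
        if value > hardCap { return .blocked }
        if value > softCap { return .warning }
        return .ok
    }
}

// Caps copied from the table above.
let guardrails: [Guardrail] = [
    Guardrail(metric: "Lines per file", softCap: 400, hardCap: 800),
    Guardrail(metric: "Lines per SwiftUI view", softCap: 300, hardCap: 600),
    Guardrail(metric: "Types per file", softCap: 5, hardCap: 12),
    Guardrail(metric: "Nested subviews", softCap: 3, hardCap: 6),
    Guardrail(metric: "@State per view", softCap: 10, hardCap: 20),
    Guardrail(metric: "Methods per type", softCap: 20, hardCap: 35),
]

let linesPerFile = guardrails[0]
print(linesPerFile.evaluate(350))   // ok
print(linesPerFile.evaluate(560))   // warning
print(linesPerFile.evaluate(6635))  // blocked
```

Keeping the caps in one data table means the reviewer and the validator can read the same numbers and differ only in which status they act on.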
The net reduction rule
If a PR touches a file that is already over soft cap, the PR must reduce the file’s size by 5 to 10 percent. That single rule shifts the default from “debt pauses” to “debt shrinks.” Every small fix becomes a small cleanup.
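The rule can be sketched as a small check. This is an illustration only: `FileChange` and `satisfiesNetReduction` are hypothetical names, and the sketch assumes the lower bound of the 5 to 10 percent range is the minimum a PR must achieve.

```swift
// Hypothetical sketch of the net-reduction rule; names are illustrative.
struct FileChange {
    let path: String
    let linesBefore: Int
    let linesAfter: Int
}

// Assumption: 5 percent (the low end of the stated range) is the required minimum.
func satisfiesNetReduction(_ change: FileChange, softCap: Int, minimumPercent: Int = 5) -> Bool {
    // Files at or under the soft cap are not subject to the rule.
    guard change.linesBefore > softCap else { return true }
    // Integer ceiling of linesBefore * minimumPercent / 100 avoids floating-point rounding.
    let required = (change.linesBefore * minimumPercent + 99) / 100
    return change.linesBefore - change.linesAfter >= required
}

let shrinking = FileChange(path: "SettingsView.swift", linesBefore: 560, linesAfter: 530)
let growing = FileChange(path: "SettingsView.swift", linesBefore: 560, linesAfter: 575)
print(satisfiesNetReduction(shrinking, softCap: 400))  // true
print(satisfiesNetReduction(growing, softCap: 400))    // false
```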
Exceptions that expire
Sometimes a limit must be exceeded. The framework handles this with structured, expiring exceptions:
```swift
// metascope:exception(metric:lines, value:1200, reason:"Complex multi-tab view pending refactor", issue:234, expires:"2026-02-28")
struct SettingsView: View {
    // ...
}
```
An exception requires an issue reference, an expiry date, and a concrete reason. The auditor tracks expired exceptions and reports them. No exception lives forever.
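To make the requirements concrete, here is one way an auditor could parse the annotation and flag expired entries. This is a sketch under assumptions, not the framework's parser: `GuardrailException` and `parseException` are hypothetical names, the field syntax is inferred from the single example above, and an exception is treated as expired from the day after its `expires` date.

```swift
import Foundation

// Hypothetical sketch: names and parsing rules are inferred from the one example annotation.
struct GuardrailException {
    let metric: String
    let value: Int
    let reason: String
    let issue: Int
    let expires: String  // ISO date, yyyy-MM-dd

    // ISO yyyy-MM-dd strings sort chronologically, so plain string comparison works.
    // Assumption: the exception is still valid on the expiry date itself.
    func isExpired(on today: String) -> Bool {
        return today > expires
    }
}

func parseException(_ line: String) -> GuardrailException? {
    guard let start = line.range(of: "metascope:exception("),
          let end = line.lastIndex(of: ")"),
          start.upperBound <= end else { return nil }

    var fields: [String: String] = [:]
    var current = ""
    var inQuotes = false

    // Store the accumulated "key:value" text, then reset the buffer.
    func flush() {
        let parts = current.split(separator: ":", maxSplits: 1).map(String.init)
        if parts.count == 2 {
            let key = parts[0].trimmingCharacters(in: .whitespaces)
            let value = parts[1].trimmingCharacters(in: CharacterSet(charactersIn: " \""))
            fields[key] = value
        }
        current = ""
    }

    // Split on commas that fall outside double quotes, so reasons may contain commas.
    for ch in line[start.upperBound..<end] {
        if ch == "\"" { inQuotes.toggle() }
        if ch == "," && !inQuotes { flush() } else { current.append(ch) }
    }
    flush()

    // All five fields are required; a missing reason, issue, or expiry rejects the exception.
    guard let metric = fields["metric"],
          let value = fields["value"].flatMap({ Int($0) }),
          let reason = fields["reason"], !reason.isEmpty,
          let issue = fields["issue"].flatMap({ Int($0) }),
          let expires = fields["expires"], !expires.isEmpty else { return nil }
    return GuardrailException(metric: metric, value: value, reason: reason, issue: issue, expires: expires)
}

let annotation = "// metascope:exception(metric:lines, value:1200, reason:\"Complex multi-tab view pending refactor\", issue:234, expires:\"2026-02-28\")"
if let exception = parseException(annotation) {
    print(exception.issue)                          // 234
    print(exception.isExpired(on: "2026-03-01"))    // true
}
```

Returning `nil` for any incomplete annotation mirrors the rule that an exception without an issue, expiry date, and reason does not count.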
When guardrails trigger, the framework offers remediation playbooks rather than warnings: view extraction, service decomposition, moving static data out of Swift into resources. The work is procedural, not creative.
Pillar 2: context engineering
Long sessions do not fail loudly. They fail quietly: the conversation gets heavier, the model becomes more repetitive, and progress slows.
The smart zone
Shorter, cleaner context produces sharper output, especially during debugging and architectural decisions. When sessions get long, dead ends and speculative threads pollute the working set.
Armature 2.0 adds a compaction protocol that triggers when:
- a debug session exceeds 10 tool calls without resolution
- agent attempts fail repeatedly
- exploratory research completes
- a complex implementation phase is about to begin
- the conversation starts to feel circular
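The trigger list above can be read as a simple predicate over session state. The sketch below is illustrative only: `SessionState` and its fields are hypothetical, "repeatedly" is assumed to mean two or more consecutive failures, and "feels circular" is modeled as a flag a human sets, since it is a judgment call rather than something measured.

```swift
// Hypothetical sketch: field names and thresholds are illustrative.
struct SessionState {
    var isDebugging = false
    var isResolved = false
    var toolCalls = 0
    var consecutiveAgentFailures = 0
    var researchJustCompleted = false
    var complexPhaseAboutToBegin = false
    var feelsCircular = false  // a human judgment, set manually
}

func compactionReasons(
    for session: SessionState,
    debugToolCallLimit: Int = 10,  // from the trigger list above
    agentFailureLimit: Int = 2     // assumption: "repeatedly" means two or more
) -> [String] {
    var reasons: [String] = []
    if session.isDebugging && !session.isResolved && session.toolCalls > debugToolCallLimit {
        reasons.append("Debug session >\(debugToolCallLimit) tool calls")
    }
    if session.consecutiveAgentFailures >= agentFailureLimit {
        reasons.append("Repeated agent failures")
    }
    if session.researchJustCompleted {
        reasons.append("Exploratory research complete")
    }
    if session.complexPhaseAboutToBegin {
        reasons.append("Complex implementation phase starting")
    }
    if session.feelsCircular {
        reasons.append("Conversation feels circular")
    }
    return reasons
}

var session = SessionState()
session.isDebugging = true
session.toolCalls = 16
print(compactionReasons(for: session))  // ["Debug session >10 tool calls"]
```

An empty result means no trigger fired and the session can continue without compaction.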
What compaction produces
Instead of dragging the whole transcript forward, compaction produces a truth-grounded artifact:
## Context compaction: Photos XPC connection recovery
Session focus: Fix Photos library becoming unresponsive after sleep
Compaction reason: Debug session >15 tool calls
### Ground truth
| File | Line | Finding |
| --------------------------------- | ---: | ------------------------------------------------------------- |
| `PhotosConnectionManager.swift` | 145 | Callback does NOT fire when daemon dies during sleep |
| `ThumbnailImageLoader.swift` | 312 | Error check missing `com.apple.accounts` pattern |
### What does not work
- Single-request probes: pass even when bulk requests fail
- Waiting for `photoLibraryDidBecomeUnavailable`: callback unreliable
### Recommended next step
Implement bulk probe (5+ concurrent requests) in `handleAppDidBecomeActive()`.
Every claim points to file:line. Failed approaches are recorded with the reason they failed. The next session starts with signal, not noise.
Pillar 3: self-improving memory
The most expensive mistake is not a bug. It is paying for the same correction repeatedly.
Armature 2.0 formalizes learning through a /reflect skill that runs at the end of meaningful sessions. It scans for explicit corrections (“No, use X instead”), approvals (“That is exactly right”), and repeated patterns in accepted code.
| Confidence | Source | Example |
|---|---|---|
| High | Explicit rules, negatives | “Never do X”, “Always use Y” |
| Medium | Patterns that worked | Code that was accepted and merged |
| Low | Observations | Inferred preferences to review |
High-confidence learnings become rules. Medium become patterns. Low become observations to review and prune.
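The tiering can be sketched as a mapping from confidence to the kind of entry that gets written. Everything here is hypothetical: the `Learning`, `Confidence`, and `MemoryKind` types are illustrative, and the heading-per-kind layout is an assumption, not the actual format of the memory file.

```swift
// Hypothetical sketch: types and output layout are illustrative assumptions.
enum Confidence { case high, medium, low }

enum MemoryKind: String {
    case rule = "Rules"
    case pattern = "Patterns"
    case observation = "Observations"
}

struct Learning {
    let text: String
    let confidence: Confidence
}

// High becomes a rule, medium a pattern, low an observation to review and prune.
func memoryKind(for confidence: Confidence) -> MemoryKind {
    switch confidence {
    case .high: return .rule
    case .medium: return .pattern
    case .low: return .observation
    }
}

// Assumption: entries are grouped under one markdown heading per kind.
func renderMemory(_ learnings: [Learning]) -> String {
    var lines: [String] = []
    for kind in [MemoryKind.rule, .pattern, .observation] {
        let entries = learnings.filter { memoryKind(for: $0.confidence) == kind }
        guard !entries.isEmpty else { continue }
        lines.append("## \(kind.rawValue)")
        lines.append(contentsOf: entries.map { "- \($0.text)" })
    }
    return lines.joined(separator: "\n")
}

let learned = [
    Learning(text: "Use CustomButton not HTMLButton", confidence: .high),
    Learning(text: "Prefers small extracted subviews", confidence: .low),
]
print(renderMemory(learned))
```

Keeping low-confidence entries in their own section makes the review-and-prune step a matter of reading one short list.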
Durable, version-controlled
Learned preferences live in .claude/memory/learned-preferences.md, which is git-tracked. If a learning turns out to be wrong, you revert it. The system improves without turning into an irreversible mess.
The loop is short and tight:
Correction in session → /reflect detects → proposes update → I confirm → memory persists → future sessions apply the rule → fewer corrections.
The consolidated agent system
Ten agents became seven with cleaner boundaries:
| Agent | Role | Model | Trigger |
|---|---|---|---|
| architect | Design docs, implementation plans | Opus | Before medium+ features |
| validator | Automated quality checks | Sonnet | Before any commit |
| docs-sync | Documentation updates | Sonnet | After feature or fix completion |
| committer | Git operations | Sonnet | After validator passes |
| reviewer | Human-style code review | Opus | Before merge |
| debugger | Build, runtime, or test failures | Opus | After a failed debug attempt |
| auditor | Pattern consistency audits | Sonnet | Before major refactors |
Several agents folded into docs-sync with three modes: SYNC (update docs after a feature lands), AUDIT (clean stale docs, validate folder hygiene), COPYWRITER (generate blog posts and marketing copy).
The goal was not fewer agents for its own sake. It was less ambiguity.
What changed in practice
After four days with the evolved framework, the early signals:
| Metric | Framework 1.0 | Framework 2.0 |
|---|---|---|
| Files over hard cap | 15+ | 0 (with tracked exceptions) |
| Average file size | ~560 lines | <400 lines |
| Session context saturation | Frequent | Rare |
| Repeated corrections | Common | Declining |
| Documentation drift | Occasional | None observed |
Numbers come from a four-day internal snapshot: validator guardrail scans and /reflect output tallies on the MetaScope repo, comparing the week before and after v2 landed.
Qualitatively, four things changed:
- Sustainable velocity. The codebase stops getting worse as we ship.
- Reduced cognitive load. The system carries the corrections and constraints.
- Predictable quality. Standards do not depend on my mood.
- Institutional memory. Decisions become durable artifacts instead of retroactive archaeology.
Frequently asked
Is this a different framework from v1, or the same one evolved? Same framework, evolved. Armature 2.0 adds three pillars (guardrails, context engineering, self-improving memory) on top of the original hub document, agent system, and persistent memory. Teams already on v1 adopt 2.0 incrementally.
Do I have to use all of it? No. If you adopt only one idea, make it guardrails with exceptions that expire. It is the smallest change that reliably bends a codebase back toward health, and it sets you up for compaction and memory once the project gets real.
Is Armature something you sell, or just a framework you use internally? Both. Armature powers every Zalo Design Studio acceleration engagement. When we work with a client, we bring Armature with us, adapt it to the target platform, and leave the team with a working system.
What is the relationship to MetaScope? MetaScope is the product we built with Armature v1 and continue to maintain with v2. It is the proof case: shipping a professional macOS app in six months and keeping it healthy as we ship updates.
For engineers: incremental adoption
If you already have a CLAUDE.md with quality gates, a basic agent structure, and persistent memory, layer these on:
- Guardrails first. Define soft and hard caps and enforce them. Add the net-reduction rule for files already over soft cap.
- Compaction protocol. Start producing truth-grounded artifacts whenever a session gets long or muddy.
- /reflect. Run it after meaningful sessions. Promote high-confidence corrections to durable rules.
- Consolidate agents. Remove overlap. Clarify triggers. Aim for one role per agent and one agent per moment.
If you are starting fresh, do the v1 basics first: a hub document, three core agents (validator, committer, reviewer), and MCP memory. Then add guardrails after the first thousand lines, compaction after your first long debug spiral, and /reflect after the first correction you never want to repeat.
The shift this represents
Armature 2.0 is a response to entropy.
Without guardrails, complexity creeps in. Without compaction, context degrades. Without reflection, lessons evaporate. Armature 2.0 is a process that maintains itself: guardrails that prevent debt, compaction that protects session quality, and reflection that turns corrections into reusable rules.
That is the thing we bring to every Armature-powered engagement: not just the speed, but the system that keeps the speed from costing you later.
MetaScope is available on the Mac App Store. Considering AI-native development at speed, with a framework that prevents debt instead of just detecting it? Explore Armature-powered engagements.