How we stopped AI-assisted speed from turning into technical debt

A few weeks ago, we wrote about the origin of Armature, the framework we used to ship MetaScope in six months. It worked. The app is on the App Store.

But shipping is not the hard part. Sustaining quality under speed is.

After a stretch of intense development, familiar failure modes started showing up: files ballooning, long sessions getting muddy, the same corrections resurfacing a week later. If we did nothing, “AI-assisted speed” would quietly become “technical debt at AI speed.”

Armature 2.0 is what we built to stop that drift. It sits on top of the original framework and adds three habits: preventing debt, staying sharp during long work, and remembering corrections so they do not keep costing time.

Where the original framework ran out of runway

The v1 Armature framework solved the obvious problems: context loss, inconsistent quality, documentation drift. It introduced specialist agents, quality gates, and persistent memory. Good enough to ship MetaScope. Not good enough to keep shipping without slow entropy.

Four things showed up at scale:

Debt still accumulated. Even with reviews and gates, files kept growing. One SwiftUI view hit 6,635 lines. A service blew past 80 methods. The framework could detect problems. It did not reliably prevent them.

Context got saturated. Long sessions degraded. After enough tool calls, the model would loop: circular reasoning, vague suggestions, repeated exploration. The model was not “worse.” It was overloaded.

Learning did not persist. Corrections were local. “Use CustomButton not HTMLButton” got fixed in the moment, and then resurfaced a week later. The knowledge lived in my head, not in the system.

Agents proliferated. Ten specialists became noisy. Overlapping responsibilities turned every task into a routing decision.

v1 helped us build the product. v2 is about keeping it maintainable while the pace stays high.

Three habits, each solving one failure mode

Armature 2.0 rests on three pillars:

Guardrails. Enforcement that blocks debt before it enters the codebase.
Context engineering. Deliberate session compaction to keep output sharp.
Self-improving memory. A workflow that turns corrections into durable rules.

Figure: Framework 2.0 architecture, showing the three pillars of guardrails, context engineering, and self-improving memory working together to maintain code quality.

Pillar 1: guardrails

Quality gates catch problems after they exist. Guardrails stop problems from getting merged in the first place.

The limits

Every metric has two thresholds: a soft cap (warning) and a hard cap (blocking).

Metric	Soft cap	Hard cap	Enforcement
Lines per file	400	800	Reviewer warns, validator blocks
Lines per SwiftUI view	300	600	Reviewer warns, validator blocks
Types per file	5	12	Reviewer warns, validator blocks
Nested subviews	3	6	Reviewer warns, validator blocks
`@State` per view	10	20	Reviewer warns, validator blocks
Methods per type	20	35	Reviewer warns, validator blocks

The net reduction rule

If a PR touches a file that is already over soft cap, the PR must reduce the file’s size by 5 to 10 percent. That single rule shifts the default from “debt pauses” to “debt shrinks.” Every small fix becomes a small cleanup.
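
The rule reduces to a small arithmetic check. The sketch below is one possible reading: it assumes size is measured in lines and treats 5 percent as the minimum acceptable reduction. The function name and parameters are hypothetical.

```swift
// Illustrative sketch only. Assumptions: size is measured in lines,
// and 5 percent is the minimum acceptable reduction.
func meetsNetReductionRule(linesBefore: Int,
                           linesAfter: Int,
                           softCap: Int,
                           minimumPercent: Int = 5) -> Bool {
    // Files at or under the soft cap are not subject to the rule.
    guard linesBefore > softCap else { return true }
    // Ceiling division in integers: required = ceil(before * percent / 100).
    let requiredReduction = (linesBefore * minimumPercent + 99) / 100
    return linesBefore - linesAfter >= requiredReduction
}

// A 1,000-line file (soft cap 400) must lose at least 50 lines.
print(meetsNetReductionRule(linesBefore: 1000, linesAfter: 950, softCap: 400)) // prints "true"
print(meetsNetReductionRule(linesBefore: 1000, linesAfter: 990, softCap: 400)) // prints "false"
```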

Exceptions that expire

Sometimes a limit must be exceeded. The framework handles this with structured, expiring exceptions:

// metascope:exception(metric:lines, value:1200, reason:"Complex multi-tab view pending refactor", issue:234, expires:"2026-02-28")
struct SettingsView: View {
    // ...
}

An exception requires an issue reference, an expiry date, and a concrete reason. The auditor tracks expired exceptions and reports them. No exception lives forever.
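
As a rough illustration of how an auditor could read that annotation and flag expiry, here is a sketch. `GuardrailException`, `field`, `parseException`, and `isExpired` are hypothetical helpers, and the sketch assumes an exception stays valid through its expiry date. The field lookup is deliberately naive: it would misfire if a reason string happened to contain another key such as `issue:`.

```swift
import Foundation

// Illustrative sketch only: these names are assumptions, not Armature's API.
struct GuardrailException {
    let metric: String
    let value: Int
    let reason: String
    let issue: Int
    let expires: String // ISO date, yyyy-MM-dd
}

// Extracts the value after `name:`. Quoted values may contain commas.
func field(_ name: String, in text: String) -> String? {
    guard let key = text.range(of: name + ":") else { return nil }
    var rest = text[key.upperBound...]
    if rest.hasPrefix("\"") {
        rest = rest.dropFirst()
        guard let end = rest.firstIndex(of: "\"") else { return nil }
        return String(rest[..<end])
    }
    let end = rest.firstIndex(where: { $0 == "," || $0 == ")" }) ?? rest.endIndex
    return String(rest[..<end]).trimmingCharacters(in: .whitespaces)
}

// Returns nil unless every required field is present and well-formed.
func parseException(_ line: String) -> GuardrailException? {
    guard line.contains("metascope:exception("),
          let metric = field("metric", in: line),
          let value = field("value", in: line).flatMap({ Int($0) }),
          let reason = field("reason", in: line), !reason.isEmpty,
          let issue = field("issue", in: line).flatMap({ Int($0) }),
          let expires = field("expires", in: line), expires.count == 10
    else { return nil }
    return GuardrailException(metric: metric, value: value, reason: reason,
                              issue: issue, expires: expires)
}

// ISO yyyy-MM-dd strings sort chronologically, so string comparison is enough.
// Assumption: the exception is still valid on its expiry date.
func isExpired(_ exception: GuardrailException, today: String) -> Bool {
    return today > exception.expires
}
```

Requiring every field in the `guard` mirrors the rule in the text: an annotation with no issue, no expiry, or an empty reason simply does not parse as an exception.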

When guardrails trigger, the framework offers remediation playbooks rather than warnings: view extraction, service decomposition, moving static data out of Swift into resources. The work is procedural, not creative.

Pillar 2: context engineering

Long sessions do not fail loudly. They fail quietly: the conversation gets heavier, the model becomes more repetitive, and progress slows.

The smart zone

Shorter, cleaner context produces sharper output, especially during debugging and architectural decisions. When sessions get long, dead ends and speculative threads pollute the working set.

Armature 2.0 adds a compaction protocol that triggers when:

a debug session exceeds 10 tool calls without resolution
agent attempts fail repeatedly
exploratory research completes
a complex implementation phase is about to begin
the conversation starts to feel circular
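
The trigger list above can be sketched as a simple check over session state. Everything here is hypothetical: the `SessionState` fields and the `compactionReasons` function are illustrative names, "repeatedly" is assumed to mean two or more failures in a row, and "feels circular" remains a human judgment passed in as a flag.

```swift
// Illustrative sketch only: names and thresholds beyond the list above are assumptions.
struct SessionState {
    var isDebugging = false
    var toolCallsWithoutResolution = 0
    var consecutiveAgentFailures = 0
    var researchJustCompleted = false
    var complexPhaseAboutToBegin = false
    var feelsCircular = false // a human judgment call, not something measured
}

// Returns every trigger that currently applies; an empty array means keep going.
func compactionReasons(for session: SessionState) -> [String] {
    var reasons: [String] = []
    if session.isDebugging && session.toolCallsWithoutResolution > 10 {
        reasons.append("Debug session >10 tool calls without resolution")
    }
    // Assumption: "repeatedly" means two or more failures in a row.
    if session.consecutiveAgentFailures >= 2 {
        reasons.append("Agent attempts failed repeatedly")
    }
    if session.researchJustCompleted {
        reasons.append("Exploratory research completed")
    }
    if session.complexPhaseAboutToBegin {
        reasons.append("Complex implementation phase about to begin")
    }
    if session.feelsCircular {
        reasons.append("Conversation feels circular")
    }
    return reasons
}

let stuck = SessionState(isDebugging: true, toolCallsWithoutResolution: 16)
print(compactionReasons(for: stuck))
```

Returning the reasons rather than a bare boolean means the result can be recorded in the compaction artifact itself.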

What compaction produces

Instead of dragging the whole transcript forward, compaction produces a truth-grounded artifact:

## Context compaction: Photos XPC connection recovery

Session focus: Fix Photos library becoming unresponsive after sleep
Compaction reason: Debug session >15 tool calls

### Ground truth

| File                              | Line | Finding                                                       |
| --------------------------------- | ---: | ------------------------------------------------------------- |
| `PhotosConnectionManager.swift`   |  145 | Callback does NOT fire when daemon dies during sleep          |
| `ThumbnailImageLoader.swift`      |  312 | Error check missing `com.apple.accounts` pattern              |

### What does not work

- Single-request probes: pass even when bulk requests fail
- Waiting for `photoLibraryDidBecomeUnavailable`: callback unreliable

### Recommended next step

Implement bulk probe (5+ concurrent requests) in `handleAppDidBecomeActive()`.

Every claim points to file:line. Failed approaches are recorded with the reason they failed. The next session starts with signal, not noise.
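
One way to enforce the "every claim points to file:line" property is to refuse to render an artifact that contains an ungrounded claim. The sketch below is an assumption about how that might look. `Finding`, `isGrounded`, and `renderGroundTruth` are hypothetical names, and "grounded" is taken to mean a non-empty file name plus a positive line number.

```swift
// Illustrative sketch only: names are assumptions, not Armature's API.
struct Finding {
    let file: String
    let line: Int
    let finding: String
}

// Assumption: a claim is grounded when it names a file and a positive line number.
func isGrounded(_ finding: Finding) -> Bool {
    return !finding.file.isEmpty && finding.line > 0
}

// Renders the "Ground truth" table, or nil if any claim lacks a file:line anchor.
func renderGroundTruth(_ findings: [Finding]) -> String? {
    guard !findings.isEmpty, findings.allSatisfy(isGrounded) else { return nil }
    var lines = [
        "### Ground truth",
        "",
        "| File | Line | Finding |",
        "| --- | ---: | --- |",
    ]
    for item in findings {
        lines.append("| `\(item.file)` | \(item.line) | \(item.finding) |")
    }
    return lines.joined(separator: "\n")
}

let table = renderGroundTruth([
    Finding(file: "PhotosConnectionManager.swift", line: 145,
            finding: "Callback does NOT fire when daemon dies during sleep"),
])
print(table ?? "Ungrounded claim: artifact rejected")
```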

Pillar 3: self-improving memory

The most expensive mistake is not a bug. It is paying for the same correction repeatedly.

Armature 2.0 formalizes learning through a /reflect skill that runs at the end of meaningful sessions. It scans for explicit corrections (“No, use X instead”), approvals (“That is exactly right”), and repeated patterns in accepted code.

Confidence	Source	Example
High	Explicit rules, negatives	“Never do X”, “Always use Y”
Medium	Patterns that worked	Code that was accepted and merged
Low	Observations	Inferred preferences to review

High-confidence learnings become rules. Medium become patterns. Low become observations to review and prune.
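
The routing in the table can be sketched as two small mappings. The enum and function names below are hypothetical. The sketch covers only the sources the table lists and makes no assumption about where approvals land.

```swift
// Illustrative sketch only: names are assumptions, not the /reflect implementation.
enum LearningSource {
    case explicitRule      // "Always use Y"
    case explicitNegative  // "Never do X"
    case acceptedPattern   // code that was accepted and merged
    case observation       // inferred preference
}

enum Confidence {
    case high, medium, low
}

enum MemoryKind {
    case rule         // applied in future sessions
    case pattern      // a known-good approach
    case observation  // reviewed and pruned later
}

// Source to confidence, following the table.
func confidence(of source: LearningSource) -> Confidence {
    switch source {
    case .explicitRule, .explicitNegative: return .high
    case .acceptedPattern:                 return .medium
    case .observation:                     return .low
    }
}

// Confidence to destination: high becomes a rule, medium a pattern, low an observation.
func destination(for confidence: Confidence) -> MemoryKind {
    switch confidence {
    case .high:   return .rule
    case .medium: return .pattern
    case .low:    return .observation
    }
}

print(destination(for: confidence(of: .explicitNegative))) // prints "rule"
```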

Durable, version-controlled

Learned preferences live in .claude/memory/learned-preferences.md, which is git-tracked. If a learning turns out to be wrong, you revert it. The system improves without turning into an irreversible mess.

The loop is short and tight:

Correction in session → /reflect detects → proposes update → I confirm → memory persists → future sessions apply the rule → fewer corrections.

The consolidated agent system

Ten agents became seven with cleaner boundaries:

Agent	Role	Model	Trigger
architect	Design docs, implementation plans	Opus	Before medium+ features
validator	Automated quality checks	Sonnet	Before any commit
docs-sync	Documentation updates	Sonnet	After feature or fix completion
committer	Git operations	Sonnet	After validator passes
reviewer	Human-style code review	Opus	Before merge
debugger	Build, runtime, or test failures	Opus	After a failed debug attempt
auditor	Pattern consistency audits	Sonnet	Before major refactors

Several agents folded into docs-sync with three modes: SYNC (update docs after a feature lands), AUDIT (clean stale docs, validate folder hygiene), COPYWRITER (generate blog posts and marketing copy).

The goal was not fewer agents for its own sake. It was less ambiguity.
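
One way to picture "less ambiguity" is a routing function where each moment maps to exactly one agent. This is a sketch based on the trigger column of the table above. The `Moment` cases and the `agent(for:)` function are hypothetical names, not how Armature dispatches agents.

```swift
// Illustrative sketch only: names are assumptions, not Armature's dispatch logic.
enum Agent: String {
    case architect
    case validator
    case docsSync = "docs-sync"
    case committer
    case reviewer
    case debugger
    case auditor
}

// One case per trigger in the agent table.
enum Moment {
    case beforeMediumOrLargerFeature
    case beforeCommit
    case afterFeatureOrFixCompletion
    case afterValidatorPasses
    case beforeMerge
    case afterFailedDebugAttempt
    case beforeMajorRefactor
}

// An exhaustive switch with no default: every moment has exactly one agent.
func agent(for moment: Moment) -> Agent {
    switch moment {
    case .beforeMediumOrLargerFeature: return .architect
    case .beforeCommit:                return .validator
    case .afterFeatureOrFixCompletion: return .docsSync
    case .afterValidatorPasses:        return .committer
    case .beforeMerge:                 return .reviewer
    case .afterFailedDebugAttempt:     return .debugger
    case .beforeMajorRefactor:         return .auditor
    }
}

print(agent(for: .afterFeatureOrFixCompletion).rawValue) // prints "docs-sync"
```

Because the switch has no `default`, adding a new moment without assigning it an agent is a compile error, which is the code-level version of removing the routing decision.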

What changed in practice

After four days with the evolved framework, these are the early signals:

Metric	Framework 1.0	Framework 2.0
Files over hard cap	15+	0 (with tracked exceptions)
Average file size	~560 lines	<400 lines
Session context saturation	Frequent	Rare
Repeated corrections	Common	Declining
Documentation drift	Occasional	None observed

Numbers come from a four-day internal snapshot: validator guardrail scans and /reflect output tallies on the MetaScope repo, comparing the week before and after v2 landed.

Qualitatively, four things changed:

Sustainable velocity. The codebase stops getting worse as we ship.
Reduced cognitive load. The system carries the corrections and constraints.
Predictable quality. Standards do not depend on my mood.
Institutional memory. Decisions become durable artifacts instead of retroactive archaeology.

Frequently asked

Is this a different framework from v1, or the same one evolved? Same framework, evolved. Armature 2.0 adds three pillars (guardrails, context engineering, self-improving memory) on top of the original hub document, agent system, and persistent memory. Teams already on v1 adopt 2.0 incrementally.

Do I have to use all of it? No. If you adopt only one idea, make it guardrails with exceptions that expire. It is the smallest change that reliably bends a codebase back toward health, and it sets you up for compaction and memory once the project gets real.

Is Armature something you sell, or just a framework you use internally? Both. Armature powers every Zalo Design Studio acceleration engagement. When we work with a client, we bring Armature with us, adapt it to the target platform, and leave the team with a working system.

What is the relationship to MetaScope? MetaScope is the product we built with Armature v1 and continue to maintain with v2. It is the proof case: shipping a professional macOS app in six months and keeping it healthy as we ship updates.

For engineers: incremental adoption

If you already have a CLAUDE.md with quality gates, a basic agent structure, and persistent memory, layer these on:

Guardrails first. Define soft and hard caps and enforce them. Add the net-reduction rule for files already over soft cap.
Compaction protocol. Start producing truth-grounded artifacts whenever a session gets long or muddy.
/reflect. Run it after meaningful sessions. Promote high-confidence corrections to durable rules.
Consolidate agents. Remove overlap. Clarify triggers. Aim for one role per agent and one agent per moment.

If you are starting fresh, do the v1 basics first: a hub document, three core agents (validator, committer, reviewer), and MCP memory. Then add guardrails after the first thousand lines, compaction after your first long debug spiral, and /reflect after the first correction you never want to repeat.

The shift this represents

Armature 2.0 is a response to entropy.

Without guardrails, complexity creeps in. Without compaction, context degrades. Without reflection, lessons evaporate. Armature 2.0 is a process that maintains itself: guardrails that prevent debt, compaction that protects session quality, and reflection that turns corrections into reusable rules.

That is the thing we bring to every Armature-powered engagement: not just the speed, but the system that keeps the speed from costing you later.

MetaScope is available on the Mac App Store. Considering AI-native development at speed, with a framework that prevents debt instead of just detecting it? Explore Armature-powered engagements.