A few weeks ago, I wrote about building a structured AI development framework that treated AI like a development team, not a chatbot. It worked. MetaScope shipped.

But shipping isn’t the hard part. Sustaining quality under speed is. After a stretch of intensive development, we started seeing a familiar set of failure modes: files ballooning, long sessions getting muddy, and the same corrections resurfacing after a few days.

Framework 2.0 is what we built to stop that drift. It’s a system that prevents debt, stays sharp under long tool chains, and remembers corrections so they don’t keep costing you time.


The problem with Framework 1.0

Framework 1.0 solved the obvious problems: context loss, inconsistent quality, documentation drift. It introduced specialized agents, quality gates, and persistent memory.

Then the scale problems showed up.

Debt still accumulated

Even with reviews and gates, files kept growing. One SwiftUI view hit 6,635 lines. A service ballooned past 80 methods. The framework could detect problems, but it didn’t reliably prevent them.

Context saturation

Long sessions degraded. After enough tool calls, outputs started to loop: circular reasoning, vague suggestions, and repeated exploration. The model wasn’t “worse”; it was overloaded.

Learning didn’t persist

Corrections were local. “No, use CustomButton not HTML button” could be fixed in the moment, then resurface a week later. The knowledge lived in my head, not in the system.

Agent proliferation

Ten agents became noisy. Overlapping responsibilities led to uncertainty about who should do what, and when. The process began to cost attention.

Framework 1.0 helped build the product. Framework 2.0 is about keeping it maintainable while the pace stays high.


The solution: three core innovations

Framework 2.0 rests on three pillars:

  1. Guardrails: enforcement that blocks debt before it enters the codebase
  2. Context engineering: deliberate compaction to keep sessions in a “smart zone”
  3. Self-improving memory: a workflow that turns corrections into durable rules

[Figure: Framework 2.0 architecture. The three pillars of guardrails, context engineering, and self-improving memory working together to maintain code quality.]


Pillar 1: Guardrails

Quality gates catch problems after they exist. Guardrails stop problems from getting merged in the first place.

The limits

Every metric has two thresholds:

| Metric | Soft cap | Hard cap | Enforcement |
| --- | ---: | ---: | --- |
| Lines per file | 400 | 800 | Reviewer warns / Validator blocks |
| Lines per SwiftUI view | 300 | 600 | Reviewer warns / Validator blocks |
| Types per file | 5 | 12 | Reviewer warns / Validator blocks |
| Nested subviews | 3 | 6 | Reviewer warns / Validator blocks |
| @State per view | 10 | 20 | Reviewer warns / Validator blocks |
| Methods per type | 20 | 35 | Reviewer warns / Validator blocks |

Soft caps trigger warnings. The reviewer flags them as medium severity during PR review.

Hard caps are blocking. The validator refuses to proceed (ideally in CI and/or pre-commit) unless there’s an explicitly tracked exception.

The net reduction rule

If you touch a file that’s already over its soft cap, your PR must reduce its size by 5–10%. Touch a 500-line file (soft cap: 400) and the PR has to land it between 450 and 475 lines.

That flips the default from “debt persists” to “debt shrinks.” Every small fix becomes a small cleanup.
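
A minimal sketch of what that enforcement could look like, assuming the validator sees a file’s line count before and after the change (the types here are illustrative, with caps taken from the table above):

import Foundation

// Illustrative guardrail check; caps mirror the table above.
struct Guardrail {
    let metric: String
    let softCap: Int
    let hardCap: Int
}

enum Verdict {
    case pass
    case warn(String)   // surfaced by the reviewer at medium severity
    case block(String)  // the validator refuses to proceed
}

func check(lineCount: Int, previousLineCount: Int?, against g: Guardrail) -> Verdict {
    if lineCount > g.hardCap {
        return .block("\(g.metric): \(lineCount) exceeds hard cap \(g.hardCap)")
    }
    if lineCount > g.softCap {
        // Net reduction rule: a file already over soft cap must shrink by at least 5%.
        if let before = previousLineCount, before > g.softCap {
            let requiredShrink = Int(Double(before) * 0.05)
            if before - lineCount < requiredShrink {
                return .block("\(g.metric): over-cap file must shrink by 5–10% (\(before) -> \(lineCount))")
            }
        }
        return .warn("\(g.metric): \(lineCount) over soft cap \(g.softCap)")
    }
    return .pass
}

let linesPerFile = Guardrail(metric: "lines-per-file", softCap: 400, hardCap: 800)
// check(lineCount: 470, previousLineCount: 500, against: linesPerFile) -> warn (shrank 6%)
// check(lineCount: 498, previousLineCount: 500, against: linesPerFile) -> block (shrank <5%)

Wired into pre-commit or CI, the warn path feeds PR review and the block path stops the commit.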

Exception tracking

Sometimes limits must be exceeded. Framework 2.0 handles this with structured, expiring exceptions:

// metascope:exception(metric:lines, value:1200, reason:"Complex multi-tab view pending refactor", issue:234, expires:"2026-02-28")
struct SettingsView: View {
    // ...
}

Exceptions require:

  • an issue reference (issue:234)
  • an ISO date (YYYY-MM-DD)
  • a concrete reason (no “temporary” without a plan)

The auditor tracks expired exceptions and reports them. No exception lives forever.
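
The annotation format is regular enough to parse mechanically. A sketch of how the auditor pass might do it, assuming exactly the comment shape shown above (the types are hypothetical):

import Foundation

// Hypothetical parser for `// metascope:exception(...)` annotations.
struct TrackedException {
    let metric: String
    let value: Int
    let reason: String
    let issue: Int
    let expires: Date

    var isExpired: Bool { expires < Date() }
}

let exceptionPattern = try! NSRegularExpression(
    pattern: #"metascope:exception\(metric:(\w+), value:(\d+), reason:"([^"]*)", issue:(\d+), expires:"(\d{4}-\d{2}-\d{2})"\)"#
)

func parseException(from line: String) -> TrackedException? {
    let nsRange = NSRange(line.startIndex..., in: line)
    guard let m = exceptionPattern.firstMatch(in: line, range: nsRange) else { return nil }
    func group(_ i: Int) -> String {
        String(line[Range(m.range(at: i), in: line)!])
    }
    let iso = DateFormatter()
    iso.dateFormat = "yyyy-MM-dd"
    iso.timeZone = TimeZone(identifier: "UTC")
    guard let value = Int(group(2)), let issue = Int(group(4)),
          let expires = iso.date(from: group(5)) else { return nil }
    return TrackedException(metric: group(1), value: value, reason: group(3),
                            issue: issue, expires: expires)
}

The audit then reduces to scanning source files for the marker and reporting every exception whose isExpired is true.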

Skills for remediation

When guardrails trigger, the framework provides playbooks:

  • swiftui-decomposition: systematic view extraction with scoring criteria
  • god-object-splitter: service decomposition patterns (facade, strategy, factory)
  • static-data-externalizer: moving content out of Swift into resources

These aren’t guidelines. They’re procedures: extraction criteria, proposed file trees, and migration steps.


Pillar 2: Context engineering

Long sessions don’t fail loudly. They fail quietly: the conversation gets heavier, the model becomes more repetitive, and progress slows.

Framework 2.0 treats that as a systems problem.

The “smart zone”

In practice, shorter, cleaner context produces sharper output, especially during debugging and architectural decisions. When sessions get long, dead ends and speculative threads pollute the working set.

Framework 2.0 adds a compaction-protocol skill that triggers when:

  • debug sessions exceed 10 tool calls without resolution
  • agent attempts fail repeatedly
  • exploratory research completes
  • you’re about to start a complex implementation phase
  • the conversation starts to feel circular or “muddy”
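
Most of these triggers are mechanical, so they can be counted rather than remembered. A hypothetical sketch (every threshold except the 10-call rule is an assumption, and the last trigger stays a human judgment call):

// Hypothetical tracker for the mechanical compaction triggers.
struct SessionState {
    var toolCallsSinceResolution = 0
    var consecutiveFailedAgentAttempts = 0  // "repeatedly" assumed to mean 2+
    var researchPhaseComplete = false
    var complexPhaseAhead = false
}

func shouldCompact(_ s: SessionState) -> Bool {
    // "Conversation feels circular" is the one trigger left to the human.
    s.toolCallsSinceResolution > 10
        || s.consecutiveFailedAgentAttempts >= 2
        || s.researchPhaseComplete
        || s.complexPhaseAhead
}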

What compaction produces

Instead of dragging the whole transcript forward, compaction produces a truth-grounded artifact:

## Context compaction: Photos XPC connection recovery

**Session focus**: Fix Photos library becoming unresponsive after sleep
**Compaction reason**: Debug session >15 tool calls

### Ground truth

| File | Line | Factual finding |
| --- | ---: | --- |
| `PhotosConnectionManager.swift` | 145 | Callback does **not** fire when daemon dies during sleep |
| `ThumbnailImageLoader.swift` | 312 | Error check missing `com.apple.accounts` pattern |

### What doesn’t work

- ❌ Single-request probes: pass even when bulk requests fail
- ❌ Waiting for callback: `photoLibraryDidBecomeUnavailable` unreliable

### Recommended next step

Implement bulk probe (5+ concurrent requests) in `handleAppDidBecomeActive()`.

Every claim points to file:line. Failed approaches are recorded with the reason they failed. The next session starts with signal, not noise.

Compaction quality checklist

  • Every claim references file:line (no speculation)
  • Failed approaches documented with specific reasons
  • Files listed are actually relevant
  • Next step is single and actionable
  • Constraints are grounded in code, not assumptions

Pillar 3: Self-improving memory

The most expensive mistake isn’t a bug. It’s paying for the same correction repeatedly.

Framework 2.0 formalizes learning through a /reflect skill.

The /reflect skill

At the end of significant sessions, invoking /reflect triggers an analysis:

## Signals detected

### Corrections (high confidence)

1. "No, use CustomButton not HTML button"
   - Category: Components
   - Proposed rule: "Always use CustomButton for buttons in MetaScope"

2. "Check for SQL injections"
   - Category: Security
   - Proposed rule: "Validate all user input for SQL injection"

### Approvals (medium confidence)

1. Code using @StateObject was accepted
   - Category: SwiftUI patterns
   - Proposed rule: "Prefer @StateObject for view-owned state"

The system scans for explicit corrections (“No, use X instead”), approvals (“That’s exactly right”), and repeated patterns in accepted code.
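
A rough sketch of that scanning pass, assuming access to the session’s user messages (the Signal type and the patterns are illustrative):

import Foundation

// Illustrative signal scanner: explicit corrections rank above approvals.
enum Signal {
    case correction(String)  // high confidence
    case approval(String)    // medium confidence
}

let correctionPatterns = [#"^No,? use .+"#, #"(?i)\bnever\b.+"#, #"(?i)\balways\b.+"#]
let approvalPatterns = [#"(?i)that.s exactly right"#, #"(?i)\bperfect\b"#]

func detectSignals(in userMessages: [String]) -> [Signal] {
    userMessages.compactMap { message -> Signal? in
        if correctionPatterns.contains(where: { message.range(of: $0, options: .regularExpression) != nil }) {
            return .correction(message)
        }
        if approvalPatterns.contains(where: { message.range(of: $0, options: .regularExpression) != nil }) {
            return .approval(message)
        }
        return nil
    }
}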

Confidence levels

| Level | Source | Example |
| --- | --- | --- |
| High | Explicit rules, negatives | “Never do X”, “Always use Y” |
| Medium | Patterns that worked | Code that was accepted/merged |
| Low | Observations | Inferred preferences to review |

High-confidence learnings become rules. Medium-confidence learnings become patterns. Low-confidence learnings become observations that get reviewed and pruned.
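
In code form, the promotion rule is a single mapping (names are illustrative):

// Illustrative mapping from signal confidence to memory destination.
enum Confidence { case high, medium, low }

enum MemoryEntry {
    case rule(String)         // applied automatically in future sessions
    case pattern(String)      // suggested, not enforced
    case observation(String)  // reviewed periodically, pruned if wrong
}

func promote(_ learning: String, confidence: Confidence) -> MemoryEntry {
    switch confidence {
    case .high:   return .rule(learning)
    case .medium: return .pattern(learning)
    case .low:    return .observation(learning)
    }
}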

Persistent, version-controlled memory

Learned preferences live in .claude/memory/learned-preferences.md:

# Learned preferences

## Rules (high confidence)

### Components

- Always use CustomButton for buttons in MetaScope (2026-01-08)
- Never create inline styles for buttons (2026-01-05)

### SwiftUI patterns

- Prefer @StateObject for view-owned state (2026-01-08)
- Use .task instead of .onAppear for async work (2026-01-07)

## Change log

| Date | Changes | Confidence |
| --- | --- | --- |
| 2026-01-08 | +3 rules | 2 high, 1 medium |

This file is git-tracked. If a learning is wrong, you can revert it. The system improves without turning into an irreversible mess.

The feedback loop

Correction in session → /reflect detects → proposes update → user confirms → memory persists → future sessions apply rule → fewer corrections

The consolidated agent system

Framework 2.0 reduces ten agents to seven, with cleaner boundaries:

| Agent | Role | Model | Trigger |
| --- | --- | --- | --- |
| architect | Design docs, implementation plans | Opus | Before medium+ features |
| validator | Automated quality checks | Sonnet | Before any commit |
| docs-sync | Documentation updates | Sonnet | After feature/fix completion |
| committer | Git operations | Sonnet | After validator passes |
| reviewer | Human-style code review | Opus | Before merge |
| debugger | Build/runtime/test failures | Opus | After 1+ failed debug attempts |
| auditor | Pattern consistency audits | Sonnet | Before major refactors |

What changed

Several agents merged into docs-sync with three modes:

  • SYNC: update docs after feature completion
  • AUDIT: clean stale docs, validate folder hygiene
  • COPYWRITER: generate blog posts and marketing copy

The goal wasn’t fewer agents for its own sake. It was less ambiguity.

The workflow

validator → docs-sync → committer → [user approval] → reviewer → merge

Each agent has one job. Each triggers at a predictable moment. The system stays quiet unless it has a reason to speak.
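
The ordering and its single human gate are simple enough to state in code. A sketch (stage names mirror the workflow above; the real framework drives this through agent triggers):

// Sketch of the commit-to-merge pipeline: strictly ordered, one human gate.
enum PipelineStage: Int {
    case validator, docsSync, committer, userApproval, reviewer, merge
}

func advance(from stage: PipelineStage, userApproved: Bool) -> PipelineStage? {
    // Nothing moves past the approval gate without an explicit yes.
    if stage == .userApproval && !userApproved { return nil }
    return PipelineStage(rawValue: stage.rawValue + 1)
}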


Skills: reusable procedures

Framework 2.0 formalizes skills as versioned SOPs you invoke when needed:

| Skill | Purpose |
| --- | --- |
| git-workflow | Commit format, branch naming, safety rules |
| code-review-checklist | What to check, severity classification |
| documentation-sync-map | Doc locations, sync rules, hygiene |
| swift-patterns | MetaScope conventions and anti-patterns |
| compaction-protocol | Context compression to stay sharp |
| reflect | Self-improving memory system |
| codebase-guardrails | Limit enforcement before changes |
| swiftui-decomposition | View extraction playbook |
| god-object-splitter | Service decomposition patterns |
| static-data-externalizer | Moving content to resources |
| memory-audit | Periodic memory cleanup |
| copywriter | Blog and marketing content templates |
| metascope-guide | Codebase navigation and patterns |

Skills don’t bloat every conversation. They’re called on demand, so context stays lean while procedures stay deep.
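
Mechanically, “on demand” just means nothing is read until a skill is invoked. A minimal sketch, assuming skills live as markdown files under .claude/skills/ (that path is an assumption):

import Foundation

// Hypothetical on-demand skill loading: the procedure enters context
// only when invoked, so every other conversation stays lean.
func loadSkill(_ name: String,
               root: URL = URL(fileURLWithPath: ".claude/skills")) throws -> String {
    let url = root.appendingPathComponent(name).appendingPathExtension("md")
    return try String(contentsOf: url, encoding: .utf8)
}

// let playbook = try loadSkill("swiftui-decomposition")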


The MCP server ecosystem

Framework 2.0 uses focused MCP servers for specialized capabilities.

Swift documentation server

Apple’s APIs change constantly, and training data has a cutoff. A Swift MCP server can provide current references:

mcp__swift__swift_symbol_lookup("NSWindow")
mcp__swift__apple_docs_search(query: "window management")
mcp__swift__swift_evolution_lookup("async")
mcp__swift__hig_search("navigation")

Memory server

The MCP memory server provides a knowledge graph across sessions:

mcp__memory__search_nodes("v1.2.2 current work")
mcp__memory__add_observations({
  entityName: "photos-xpc-pattern",
  contents: ["Bulk probes required for reliable detection"]
})

Results: what Framework 2.0 delivers

After 4 days with the evolved framework, here are the early signals.

Quantifiable outcomes (internal snapshot)

| Metric | Framework 1.0 | Framework 2.0 |
| --- | --- | --- |
| Files over hard cap | 15+ | 0 (with tracked exceptions) |
| Average file size | ~560 lines | <400 lines |
| Session context saturation | Frequent | Rare |
| Repeated corrections | Common | Declining |
| Documentation drift | Occasional | None observed |

How these were measured (briefly):

  • “Files over hard cap”: validator guardrail scan across the repo at the end of each day
  • “Average file size”: line counts across Swift files in the main module (repo snapshot; see the sketch after this list)
  • “Repeated corrections”: manual tally from /reflect outputs + PR review notes
  • “Documentation drift”: docs-sync audit pass after merges
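
As a concrete example, the file-size number comes down to a short script. A sketch, assuming the main module lives in a MetaScope/ directory:

import Foundation

// Rough sketch: average line count across Swift files in the main module.
func averageSwiftFileLength(in root: String = "MetaScope") throws -> Double {
    let fm = FileManager.default
    guard let paths = fm.enumerator(atPath: root)?.compactMap({ $0 as? String }) else { return 0 }
    let swiftFiles = paths.filter { $0.hasSuffix(".swift") }
    let counts = try swiftFiles.map { path -> Int in
        let contents = try String(contentsOfFile: "\(root)/\(path)", encoding: .utf8)
        return contents.split(separator: "\n", omittingEmptySubsequences: false).count
    }
    guard !counts.isEmpty else { return 0 }
    return Double(counts.reduce(0, +)) / Double(counts.count)
}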

Qualitative outcomes

  • Sustainable velocity: the codebase stops getting worse as you ship.
  • Reduced cognitive load: the system carries the corrections and constraints.
  • Predictable quality: standards aren’t dependent on mood.
  • Institutional memory: decisions become durable artifacts.

How to adopt Framework 2.0

Start where Framework 1.0 left off

If you already have:

  • CLAUDE.md with quality gates
  • basic agent structure
  • persistent memory

Add these progressively:

  1. Guardrails first: define soft/hard caps and enforce them
  2. Compaction protocol: produce truth-grounded artifacts for long sessions
  3. Reflect: run /reflect after meaningful work
  4. Consolidate agents: remove overlap, clarify triggers

If starting fresh

  1. Create CLAUDE.md with mandatory gates and guardrails
  2. Define three core agents: validator, committer, reviewer
  3. Add MCP memory (and docs, if you can)
  4. Add guardrails after your first 1,000 lines
  5. Add compaction after your first long debug spiral
  6. Add /reflect after the first correction you never want to repeat

The key insight

Framework 2.0 isn’t “more process.” It’s process that maintains itself.

Guardrails prevent debt without manual tracking. Compaction protects session quality without discipline games. Reflection turns corrections into reusable rules.


Conclusion

Framework 2.0 is our response to entropy.

Without guardrails, complexity creeps in. Without compaction, context degrades. Without reflection, lessons evaporate.

If you adopt only one idea, make it this: guardrails with exceptions that expire. It’s the smallest change that reliably bends a codebase back toward health, and it sets you up for compaction and memory once the project gets real.