Building macOS Apps with an AI Development Team: A Framework That Actually Works
How we used a structured AI development framework (specialized agents, quality gates, and persistent memory) to build MetaScope, a professional macOS app, in six months.
Six months ago, I started a journey to see exactly how far AI could push my development workflow. From the early days of chat-based debugging to deep-diving into Cursor and OpenAI’s Codex, I’ve tested the limits of what’s possible. Today, I’m back in the CLI with Claude Code, and the transformation is night and day.
The result of this “ride” is MetaScope, a professional-grade macOS app built for the demanding needs of photographers. By leveraging the incredible improvements made in AI over the last six months, I’m now shipping at lightning velocity.
But code is only half the battle. To reach this speed without sacrificing quality, I had to stop treating AI as a chatbot and start treating it as a structured development team. This is the story of that framework, and how you can use it for your own projects.
The Problem: AI Is Capable But Chaotic
Early in the MetaScope project, I noticed a pattern. Claude could write excellent Swift code. It understood SwiftUI patterns, macOS conventions, and even obscure ExifTool quirks. But there were problems:
Context Loss: Every conversation started fresh. Decisions made in previous sessions evaporated. We’d discuss architectural choices, agree on a pattern, and then, next session, the same discussion would happen again.
Inconsistent Quality: Some days, the code was brilliant. Other days, shortcuts appeared. Force unwraps. Missing error handling. Undocumented APIs. There was no consistent standard.
Documentation Drift: The code would change, but documentation wouldn’t. Help content described features that no longer existed. Release notes missed important updates. Architecture docs became archaeology.
Scope Creep: “While we’re here, let’s also…” became a recurring theme. Features would expand. Technical debt accumulated. Releases slipped.
Sound familiar? These aren’t AI problems; they’re development problems. And they have development solutions.
The Solution: A Framework, Not a Chat
The breakthrough came when I stopped treating AI as a conversation partner and started treating it as a development team member. Team members have:

- Defined responsibilities (not everything is everyone’s job)
- Quality standards (gates that work must pass through)
- Institutional knowledge (decisions persist across time)
- Structured communication (reports, documents, not just chat)
Here’s what the framework looks like:
The Hub: CLAUDE.md
Every project needs a central guidance document. Not a README (that’s for humans reading GitHub), but an operational playbook for the AI. Mine includes:
## MANDATORY GATES (BLOCKING)
### Planning Gate
Create documented plan BEFORE any code changes.
| Scope | Document |
|-------|----------|
| Major (>5 days) | RFC in docs/developer/decisions/ |
| Medium (2-5 days) | Implementation plan |
| Small (<2 days) | Commit message plan |
### Quality Gate
- No force unwraps in production code
- Error cases handled with user feedback
- Public APIs have documentation
- No TODO without issue reference
### Testing Gate
Tests MUST pass before commits.
### PR Review Gate
All review feedback addressed before merge.
These aren’t suggestions. They’re blocking requirements. The AI is instructed to refuse commits that violate gates, to escalate uncertainty, and to ask rather than assume.
The Team: Specialized Subagents
Here’s where it gets interesting. Claude Code supports “subagents”: specialized AI instances with focused expertise. Instead of asking one AI to do everything, I defined ten specialists:
| Agent | Role | When Invoked |
|---|---|---|
| plan-architect | Creates implementation plans | Before any medium+ feature |
| code-validator | Checks quality gates | Before every commit |
| github-commit-agent | Stages, commits, pushes | After validation passes |
| pr-review-agent | Senior code review | Before merge |
| documentation-maintainer | Keeps docs synchronized | After feature completion |
| milestone-tracker | Updates project plans | After feature completion |
| macos-swift-architect | Architecture guidance | When design questions arise |
| macos-swift-debugger | Debugging expertise | After more than one failed attempt |
| code-auditor | Pattern consistency | Before major refactors |
| release-notes-generator | Documents releases | After feature completion |
The key insight: automatic invocation. I don’t ask Claude to run the validator; it runs automatically before every commit. I don’t request documentation updates; the documentation-maintainer triggers when a feature completes.
This creates a workflow that mirrors professional development teams:
code-validator → [milestone-tracker | doc-maintainer] → commit-agent → pr-review-agent → merge
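Under the hood, a subagent is just a markdown file in .claude/agents/ with YAML frontmatter (name, description, optionally a tool list) followed by its system prompt. Here is a minimal sketch of what a validator definition could look like; the wording and tool list are illustrative, not MetaScope's actual configuration:

```markdown
---
name: code-validator
description: Runs before every commit to check the mandatory quality gates.
tools: Read, Grep, Glob, Bash
---

You are the project's pre-commit validator. Check staged changes against the
quality gates in CLAUDE.md: no force unwraps in production code, error cases
handled with user feedback, public APIs documented, no TODO without an issue
reference. Report PASS or FAIL with file:line references for every issue found.
```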
The Memory: Persistent Knowledge
Context loss is solved through two mechanisms:
MCP Memory Server: A knowledge graph that persists across sessions. Decisions, patterns, and architectural choices live here. When a session starts:
mcp__memory__search_nodes("v1.2.1 current work")
The AI retrieves relevant context from previous sessions. When decisions are made:
mcp__memory__add_observations({
entityName: "window-sizing-pattern",
contents: ["All windows minimum 480x400", "Use frameAutosaveName for persistence"]
})
They persist for future sessions.
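To make that observation concrete, here is roughly what the stored window-sizing pattern translates to in AppKit; the function name and autosave name are placeholders, not MetaScope's actual identifiers:

```swift
import AppKit

/// Apply the documented window-sizing pattern: a 480x400 minimum size
/// and frame persistence via an autosave name.
func applySizingPattern(to window: NSWindow, autosaveName: String) {
    window.minSize = NSSize(width: 480, height: 400)
    // setFrameAutosaveName returns false if the name is already in use
    // by another window; the result is ignored here for brevity.
    _ = window.setFrameAutosaveName(autosaveName)
}

// Hypothetical usage:
// applySizingPattern(to: exportWindow, autosaveName: "ExportSettingsWindow")
```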
Session Checkpoints: Before context compaction (when the conversation gets too long), state is captured:
# Session Checkpoint - 2024-12-24
## Git State
- Branch: dev/v1.2.1
- Last commit: f4bf180
## Completed This Session
- Export settings audit
- CLLocationManager fix
## Key Decisions
- Using synchronous loading for export folder to prevent nil on first render
## Next Steps
1. Window sizing pattern audit
2. Multi-file drag export
The next session starts by reading this checkpoint. No decision is truly lost.
The Expertise: Swift MCP Server
For macOS development, there’s a specific challenge: Apple’s APIs change constantly. New frameworks appear. Old patterns become deprecated. Training data has a cutoff date.
The solution is an MCP server specifically for Swift documentation. It has access to:
- Latest Apple API documentation (including macOS 26/Liquid Glass features)
- Swift Evolution proposals
- Human Interface Guidelines
- Swift Programming Language book
When the AI needs API information:
mcp__swift__swift_symbol_lookup("NSWindow")
It gets current documentation, not potentially outdated training knowledge.
The Framework in Practice
Let me walk through a real scenario: adding batch watermarking to MetaScope.
Phase 1: Planning (plan-architect)
I describe the feature. Instead of jumping to code, the plan-architect agent produces:
# Implementation Plan: Batch Watermarking
## Goal
Allow users to apply text or image watermarks to multiple files simultaneously.
## Approach
Extend existing BatchProcessingService with watermark operations.
## Phases
1. Watermark configuration UI (2 days)
2. Single-file watermark engine (2 days)
3. Batch integration (1 day)
4. Preview system (1 day)
## Dependencies
- CoreImage for watermark compositing
- Existing BatchProcessingService
- Export settings for output format
## Tests
- Unit: Watermark positioning calculations
- Integration: Batch processing with watermarks
- UI: Preview matches final output
## Acceptance Criteria
- Users can configure text/image watermarks
- Watermarks apply to 100+ files in reasonable time
- Preview accurately represents final result
This plan is reviewed. Adjusted if needed. Then, and only then, does implementation begin.
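To ground Phase 2 before diving in, here is a hedged sketch of what the single-file watermark step might look like with Core Image (the plan's compositing dependency); the type and method names are illustrative, not MetaScope's actual API:

```swift
import CoreImage
import CoreGraphics

/// Illustrative single-file watermark step: position a watermark image
/// and composite it over the source using Core Image's source-over operator.
struct WatermarkSketch {
    func apply(_ watermark: CIImage, over source: CIImage, at origin: CGPoint) -> CIImage {
        // Move the watermark to the requested position within the source...
        let placed = watermark.transformed(
            by: CGAffineTransform(translationX: origin.x, y: origin.y))
        // ...then draw it on top of the source image.
        return placed.composited(over: source)
    }
}
```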
Phase 2: Implementation (with quality gates)
As I implement each phase, the code-validator runs before every commit:
## Validation Report
**Status**: FAIL
### Blocking Issues
- Force unwrap at WatermarkEngine.swift:47, use guard let
- Missing tests for watermark positioning
### Non-Blocking Issues
- WatermarkConfiguration missing /// documentation
**Ready to commit**: NO
The commit is blocked until issues are resolved. No exceptions.
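The fix for the first issue is usually mechanical. A hypothetical before/after for the flagged line (the actual code at WatermarkEngine.swift:47 isn't shown here, and WatermarkError is an invented error type for the example):

```swift
import Foundation
import CoreImage

enum WatermarkError: Error {
    case imageUnavailable(URL)  // hypothetical error type for this example
}

func loadWatermark(from url: URL) throws -> CIImage {
    // Before (blocked by the gate): CIImage(contentsOf: url)!
    // After: fail gracefully instead of force unwrapping.
    guard let watermark = CIImage(contentsOf: url) else {
        throw WatermarkError.imageUnavailable(url)
    }
    return watermark
}
```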
Phase 3: Documentation (documentation-maintainer)
When Phase 1 completes, the documentation-maintainer automatically triggers:
- Updates release notes with “Added watermark configuration”
- Adds help content for the watermark settings panel
- Updates the feature matrix in developer docs
- Syncs the milestone plan checkbox
I don’t write documentation manually. I don’t even request it. The agent knows the feature is complete (from the milestone tracker) and updates all documentation layers.
Phase 4: Review (pr-review-agent)
Before merging, the PR review agent performs a senior review:
## PR Review Report
### Critical Issues
(none)
### Important Issues (Should Fix)
- WatermarkEngine.applyWatermark() doesn't handle zero-size images
### Suggestions
- Consider caching CIFilter instances for batch performance
### Positive Observations
- Clean separation between configuration and application
- Proper Task cancellation handling
**Verdict**: APPROVE (with suggested fix)
The important issue creates a TodoWrite item. It’s tracked and fixed before merge.
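For completeness, one plausible shape for that tracked zero-size fix, again illustrative rather than the shipped code:

```swift
import CoreImage

/// Guard against the degenerate inputs flagged in review: an empty or
/// infinite-extent image can't be meaningfully watermarked.
func canWatermark(_ image: CIImage) -> Bool {
    let extent = image.extent
    return !extent.isEmpty && !extent.isInfinite
}
```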
Results: What This Framework Delivers
After six months of development:
Quantifiable Outcomes
| Metric | Observation |
|---|---|
| Code Quality | Zero production crashes from force unwraps |
| Documentation | 100+ synchronized docs, always current |
| Planning | 12 RFCs, 11 ADRs for major decisions |
| Testing | Comprehensive test suite, gates prevent regression |
| Session Continuity | Decisions persist across 100+ sessions |
Qualitative Outcomes
Reduced Cognitive Load: I don’t track what documentation needs updating. I don’t remember every architectural decision. The framework handles it.
Consistent Quality: Whether I’m tired, rushed, or in flow state, the gates apply. Quality doesn’t depend on my current focus.
Knowledge Preservation: Six months later, I can understand why we chose a particular pattern. It’s documented. In the memory graph. In RFCs.
Faster Onboarding: New features start with context. The skill guide points the AI to the right parts of the codebase. Patterns are documented with examples.
How to Adopt This Framework
Start Small
You don’t need all ten agents on day one. Start with:
- CLAUDE.md: define your quality gates
- code-validator: automated quality checks before commits
- github-commit-agent: consistent commit formatting
Add agents as needs emerge.
Customize for Your Stack
This framework was built for macOS/Swift. The structure is universal:
- The hub document (CLAUDE.md)
- Specialized agents with clear triggers
- Quality gates that block progress
- Persistent knowledge management
- Multi-layer documentation
The content adapts to your technology:
- React? Create a component-patterns agent
- Backend? Create an API-design agent
- Mobile? Create platform-specific debuggers
Invest in the Skill Guide
The navigation skill is underrated. When the AI can quickly find:
- Where code lives
- What patterns to follow
- How systems connect
- What decisions were made
…development velocity increases dramatically.
Enforce the Gates
Gates only work if they’re enforced. Configure your CLAUDE.md to make violations blocking. Train yourself to not override them “just this once.”
The Bigger Picture
This framework represents a shift in how we think about AI-assisted development. It’s not about AI writing code for us. It’s about AI as a force multiplier for professional practices we already know work:
- Planning before coding (now with an architect agent that never forgets patterns)
- Code review (now with a reviewer that checks every commit, not just some)
- Documentation (now with a maintainer that never gets lazy)
- Quality gates (now with a validator that doesn’t make exceptions)
The AI isn’t replacing developers. It’s eliminating the friction that prevents us from doing what we know we should: plan thoroughly, review carefully, document completely, maintain quality consistently.
Conclusion
Building MetaScope with this framework taught me that AI-assisted development isn’t about how smart the AI is; it’s about how well you structure its work. A brilliant developer without process produces chaos. A capable AI with strong process produces quality.
The framework isn’t magic. It’s discipline, encoded. It’s quality gates that can’t be bypassed. It’s documentation that can’t be forgotten. It’s decisions that can’t be lost.
If you’re building macOS applications (or any complex software) with AI assistance, consider this: the bottleneck isn’t AI capability. It’s AI orchestration. Get the framework right, and everything else follows.
MetaScope is a professional metadata editor for macOS, available on the Mac App Store. Built with Swift and SwiftUI over six months of AI-assisted development.
Appendix: Quick Start Checklist
For developers who want to adopt this framework:
- Create CLAUDE.md with mandatory gates
- Define code-validator agent
- Define github-commit-agent
- Set up .claude/checkpoints/ directory
- Create project skill guide
- Configure MCP servers (Swift, Memory)
- Document core architecture
- Create first RFC for a medium feature
- Run for two weeks, iterate based on friction points
The framework will evolve with your project. Start with structure, add sophistication as patterns emerge.
Questions or feedback? Reach out to Zalo Design Studio.