The System That Built Itself
I didn’t set out to build a system. I set out to survive.
For the better part of a year, I’ve been operating as a one-person engineering team — across my day job and the products I’m building on my own. I wrote about what that looks like in The One-Person Engineering Team, and about the engineering practices that emerged in Eliminating Waste in the SDLC. But something else happened that I didn’t fully recognize until I stopped and looked at what I’d built.
I asked Claude Code to audit itself. To show me what it sees from the inside — every configuration file, every custom command, every persistent memory file, every agent definition. What started as a developer adding a few convenience scripts had grown, organically, into a layered operating system for how I work.
The Audit
Here’s what exists today, across my projects:
- 10 active project directories with persistent cross-session memory
- 6 custom slash commands automating my engineering lifecycle — from morning briefing to ticket creation to a 15-point quality gate
- A custom AI agent with its own persona, brand voice guidelines, prohibited vocabulary, and 12KB of accumulated knowledge
- 802 lines of project instructions encoding engineering standards, a design system, and quality expectations
- 140+ pre-approved tool permissions carefully scoped per project
- A unified command center spanning all projects, deployed to a custom domain behind Cloudflare authentication
- A daily AI intelligence briefing running automatically at 6AM
Add it up and it’s not a collection of scripts. It’s an operating system — one that emerged without a blueprint, without a design document, without anyone deciding it should exist.
Every piece exists because I needed it on a specific day, for a specific problem.
The Layers
The system operates at four levels, each one built on top of the last.
Layer 1: Global Configuration. Model selection, plugin management, permissions, and the daily workflow commands that apply everywhere. My /morning command queries a GitHub Projects board, groups work items by status across five repositories, shows recent PR activity, and suggests where to focus. I don’t open a browser to figure out what to work on. I type /morning and read.
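The grouping step behind a briefing like that is simple enough to sketch. This is a minimal illustration, not my actual command: the field names ("status", "repository", "title"), the sample items, and the column order are all invented for the example, standing in for whatever the board query actually returns.

```python
# Sketch of the grouping step behind a /morning-style briefing.
# Items are shaped like a project-board JSON export; all field names
# and sample data here are hypothetical.
from collections import defaultdict

STATUS_ORDER = ["In Progress", "In Review", "Todo", "Backlog"]  # illustrative columns

def group_by_status(items):
    """Bucket work items by status, keeping a stable column order."""
    buckets = defaultdict(list)
    for item in items:
        buckets[item.get("status", "No Status")].append(item)
    # Known columns first, in briefing order...
    ordered = {s: buckets.pop(s) for s in STATUS_ORDER if s in buckets}
    # ...then any statuses the board grew later.
    ordered.update(buckets)
    return ordered

items = [
    {"title": "Fix N+1 on dashboard", "status": "In Progress", "repository": "web"},
    {"title": "Audio capture spike", "status": "Todo", "repository": "macos"},
    {"title": "Design system tokens", "status": "In Progress", "repository": "web"},
]
briefing = group_by_status(items)
for status, group in briefing.items():
    print(f"{status} ({len(group)})")
    for item in group:
        print(f"  - [{item['repository']}] {item['title']}")
```

The command itself is just natural-language instructions; the point is that the mechanical part of a standup is a few dozen lines of logic once the data lives somewhere queryable.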
Layer 2: Project Instructions. Each project has a CLAUDE.md file — a constitution for how work gets done in that codebase. My primary project’s file is 802 lines. It covers the tech stack, architectural patterns, a complete design system with eight visualization types and a semantic color palette, the issue-as-specification workflow, and feature completion criteria. When I or any AI agent works in that codebase, the standards are embedded in the environment, not in my head.
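A constitution like that might open with sections along these lines. This is a hypothetical excerpt with invented specifics, not the actual 802-line file:

```markdown
# CLAUDE.md — project constitution (hypothetical excerpt)

## Tech Stack
- Rails, PostgreSQL, server-rendered views; discuss before adding a client-side framework

## Design System
- Use semantic color palette tokens, never raw hex values
- Every chart must be one of the approved visualization types

## Workflow
- Every feature starts as a GitHub issue with Gherkin acceptance criteria
- A feature is done only when the /verify quality gate passes
```

The value isn't any single rule; it's that the rules are read on every session, by every agent, without me repeating them.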
Layer 3: Custom Commands and Agents. This is where it gets interesting. Six slash commands replace what used to be manual team processes:
- /morning — morning standup, but with a machine that actually read every ticket and PR overnight
- /plan-tomorrow — a three-phase planning workflow that reviews today, asks me four questions, and generates tomorrow's plan from live project data
- /ticket — transforms a conversational description into a fully specified GitHub issue with Gherkin acceptance criteria, test plans, and quality checklists
- /verify — a 15-point quality gate that checks acceptance criteria coverage, scans for SQL injection, detects N+1 queries, runs linters, executes tests, and produces a unified pass/fail dashboard
- /review-pr — structured code review with security analysis
- /wrap-up — generates comprehensive documentation for completed work
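The shape of a gate like /verify is worth sketching: run a list of named checks, collect results, and render one unified verdict. The checks below are illustrative stand-ins, not the real 15-point implementation.

```python
# Minimal sketch of a /verify-style quality gate: run named checks,
# collect results, and print a unified pass/fail dashboard.
# Check names and outcomes here are invented for illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Check:
    name: str
    run: Callable[[], bool]  # True means the check passed

def run_gate(checks):
    results = [(c.name, c.run()) for c in checks]
    passed = all(ok for _, ok in results)
    for name, ok in results:
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
    print("GATE:", "PASS" if passed else "FAIL")
    return passed

checks = [
    Check("acceptance criteria covered", lambda: True),
    Check("no raw SQL interpolation", lambda: True),
    Check("linter clean", lambda: False),  # simulate one failing check
]
run_gate(checks)  # prints each check, then "GATE: FAIL"
```

In practice each check shells out to a linter, a test runner, or a scan; the dashboard is just the aggregation at the end. One failing check fails the gate.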
I also have a specialized marketing agent running on a different model, with its own persistent memory. It knows my brand voice, my company’s origin story, the competitive landscape, and which phrases are banned from our copy. When I need customer-facing content, I don’t re-explain the brand guidelines. The agent already knows.
Layer 4: Persistent Memory. This is the layer that makes everything else compound. Every project has memory files that persist across sessions — production gotchas I’ve encountered, architectural decisions and their rationale, API quirks, performance constraints. When I return to a project after weeks away, the context is already loaded. I don’t re-learn my own codebase.
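Concretely, a memory file is just structured notes in the categories above. A sketch with invented contents:

```markdown
<!-- A project memory file, sketched from the categories above (entries invented) -->
## Production gotchas
- Background jobs can double-fire on deploy; make handlers idempotent

## Architectural decisions
- Server-rendered charts instead of a JS charting library: smaller payloads, one fewer dependency

## API quirks
- The vendor API paginates by cursor, not offset
```

Nothing exotic, but because it's loaded at session start, every hard-won lesson is paid for exactly once.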
How It Grew
None of this came from a planning document. The morning briefing command exists because one morning I was tired of opening four browser tabs to figure out what needed my attention. The marketing agent exists because I kept re-explaining the same brand guidelines to a general-purpose model. The memory files exist because I kept re-discovering the same production gotchas.
The command center is the clearest example. I needed to manage work across multiple projects — a Rails web app, an iOS app, a macOS desktop app, a Hugo site, and several experimental projects. I evaluated Jira. Too expensive, too much overhead for a solo operator. I evaluated Linear. Same problem — it assumes a team model. GitHub Projects was free but underpowered out of the box.
So I built my own. GitHub Projects became the structured data store. Claude Code became the natural language interface for querying and updating it. Custom slash commands became the workflow layer. And then I realized I wanted a visual dashboard — something simple that would surface the four things that needed my attention right now. I had Claude spin up a Hugo static site, deployed it to a custom subdomain, and put Cloudflare authentication in front of it.
The whole thing — evaluation, decision, implementation, deployment, security — took one evening. With free tools. That’s not a story about AI being impressive. It’s a story about what happens when your build cost drops low enough that “just make exactly what you need” becomes the rational choice over adopting someone else’s product.
The system isn’t static, either. For a recent feature sprint on my macOS app, I needed four subsystems built in parallel — audio capture, thought recording, on-device AI integration, and journal synthesis. So I spun up a multi-agent team: a team lead coordinating four development tracks, each with its own agent, all committing to the same branch. When the features shipped to production, I retired the team and deleted the files. Temporary infrastructure, purpose-built for a specific objective, torn down when the objective was met. That pattern — stand it up, ship it, clean it up — is how the whole system works.
And lately, I’ve noticed something that stopped me cold: Claude has started doing this on its own. Without explicit prompting, it will spin up a team of agents and delegate work across them. Things I had to orchestrate manually a few weeks ago are now decisions the tool makes autonomously. The system isn’t just building itself anymore — it’s starting to run itself.
The Timing
This week, Andrej Karpathy released autoresearch — a minimal setup where AI agents autonomously iterate on ML training code, committing improvements over git branches. It’s being discussed as a breakthrough in accessible agentic experimentation. The same week, Claude’s downloads surged past ChatGPT on both app stores. Agentic workflows are moving from research concept to mainstream tool.
I’ve been living in that transition for months — not in a research context, but in a production one. The gap between what frontier AI researchers describe as the future and what a motivated practitioner is already doing has collapsed. The thought leaders are articulating what the practitioners are living. And the practitioners — the ones who’ve been in the trenches for months, not weeks — have something valuable to share: not just what’s possible, but what actually works under production pressure.
What It Changed
The system didn’t just change how I work. It changed what I spend my time thinking about.
I used to think about implementation. How to structure a controller, how to optimize a query, how to wire up a test suite. Those thoughts still happen, but they happen inside the system now — encoded in project instructions, caught by quality gates, remembered across sessions. The system absorbed the mechanical expertise and freed up the space it used to occupy in my head.
What filled that space surprised me. I think about product decisions more carefully than I ever have. I think about what’s worth building and what isn’t. I think about specification — writing the description of what I want precisely enough that the system can execute it faithfully. The bottleneck shifted, and I shifted with it. The scarce resource isn’t engineering anymore. It’s judgment.
That’s the thing nobody told me about building with these tools: the system doesn’t just give you leverage on the work. It changes which work is yours to do.
The Honest Part
It’s not seamless. Memory files occasionally need pruning. Commands need updating as projects evolve. The marketing agent sometimes drifts from the brand voice and needs correction. Worktree sessions leave orphaned files that require cleanup. The system requires maintenance, just like any system.
But the maintenance cost is trivial compared to the alternative, which is doing everything manually, or not doing it at all. The system exists because I couldn’t afford a team and refused to lower my standards.
Necessity, it turns out, is still the best architect.
This is the third in a series on AI-augmented engineering. Previously: The One-Person Engineering Team and Eliminating Waste in the SDLC.