Context Architecture: Building the Library Your Agents Actually Need

Your AI agent isn’t stupid. It’s lost.

The most common failure I see in agentic systems isn’t a reasoning failure or a coding error. It’s a context failure. The agent didn’t have the right information at the right time, so it guessed. And the guess looked plausible enough that nobody caught it until production.

Context architecture is the skill of building structured data environments so that AI agents can reliably search, find, and retrieve exactly the information they need — without getting confused by dirty data, missing context, or irrelevant noise. Anthropic’s engineering team put it plainly: “Claude is already smart enough. Intelligence is not the bottleneck. Context is.”

This is the Dewey Decimal System of the AI era. And it might be the most undervalued skill in the market right now.


The Problem

When you’re the only person working on a project, context lives in your head. You know where the database schema is defined. You know why that migration was tricky. You know which deployment steps are manual and which are automated.

Agents don’t have your head.

An agent starts every session with nothing but what you give it. If you don’t structure the information it needs — what’s persistent, what’s per-session, what’s searchable, what’s off-limits — the agent will either ask you (burning time) or guess (burning trust).

By March 2026, I was managing 13 projects with over 400 documents spread across seven directories. Plans, wrap-ups, QA reports, strategy documents, implementation specs. The information existed, but it was scattered. An agent working on the iOS app couldn’t find the design system documented in the Rails repo. An agent writing a blog post couldn’t reference the product positioning in the marketing docs. The knowledge was there. The architecture to surface it wasn’t.


Three Layers of Context

Through months of iteration, I’ve landed on a three-layer model for agent context. Every piece of information falls into one of these layers, and each layer has different persistence, scope, and retrieval characteristics.

Layer 1: Persistent Context (Always There)

This is information the agent needs on every single session, regardless of the task. It loads automatically.

In my system, this means:

  • CLAUDE.md files — one per repository, ranging from 50 lines to 803 lines, encoding architecture rules, conventions, tool permissions, and behavioral boundaries. The agent reads this the moment it opens a project directory.
  • Memory files — typed persistent notes (user profile, decisions log, project context, external references) with YAML frontmatter for relevance filtering. These carry institutional knowledge across sessions.
  • A memory index — a single file that lists all available memories with one-line descriptions so the agent can decide which to load without reading them all.
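In sketch form, the memory-file pattern looks something like this. The field names (`type`, `description`, `updated`) and the parsing are illustrative assumptions, not my actual schema:

```python
# A hypothetical memory file: YAML-style frontmatter for relevance
# filtering, followed by the note body.
MEMORY_FILE = """\
---
type: decision
description: Why we chose SQLite over Postgres for local search
updated: 2026-01-12
---
We picked SQLite because it keeps search fully on-device.
"""

def parse_memory(text):
    """Split a memory file into a frontmatter dict and a body string."""
    _, frontmatter, body = text.split("---\n", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

def build_index(memories):
    """One line per memory: enough for an agent to decide what to load
    without reading the full files."""
    return "\n".join(
        f"- [{meta['type']}] {name}: {meta['description']}"
        for name, (meta, _body) in memories.items()
    )

meta, body = parse_memory(MEMORY_FILE)
index = build_index({"sqlite-choice": (meta, body)})
print(index)
```

The index is the point: the agent reads one short file, then loads only the memories whose descriptions match the task at hand.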

The design principle: persistent context should be curated, not comprehensive. Loading everything is as bad as loading nothing. If the context window fills with irrelevant information, the agent’s attention dilutes and output quality degrades — Chroma’s context rot research showed this happens well before you hit the token limit.

Layer 2: Session Context (Task-Specific)

This is information the agent needs for a particular task but not permanently. It gets pulled in on demand.

My approach: structured skills with cognitive mode preambles. When I invoke /review-cycle, the skill loads PR review guidelines, severity classification rules, and the GraphQL queries for fetching unresolved threads. When I invoke /dispatch, it loads the project map, agent isolation rules, and the dispatch report template. Each skill is a curated context package for a specific type of work.

The key insight: session context should be assembled, not searched. If the agent has to search for the information it needs to do a specific task, you’ve already lost. The skill specification should gather and inject that context before the agent starts working.
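A minimal sketch of that assembly step. The skill names, file paths, and missing-file handling here are hypothetical, not my actual skill definitions:

```python
from pathlib import Path

# Each skill declares, up front, every document it needs.
SKILLS = {
    "review-cycle": [
        "guidelines/pr-review.md",
        "guidelines/severity-classification.md",
    ],
    "dispatch": [
        "maps/project-map.md",
        "templates/dispatch-report.md",
    ],
}

def assemble_context(skill, root=Path(".")):
    """Gather every declared document, in order, before the agent runs.
    The agent never has to go searching for its own task context."""
    parts = []
    for rel in SKILLS[skill]:
        path = root / rel
        if path.exists():
            parts.append(f"## {rel}\n{path.read_text()}")
        else:
            # Surfacing gaps loudly beats letting the agent guess.
            parts.append(f"## {rel}\n(missing; flag for maintenance)")
    return "\n\n".join(parts)

ctx = assemble_context("dispatch")
print(ctx.splitlines()[0])
```

Note the design choice: a missing document becomes an explicit marker in the assembled context, so the gap gets noticed instead of silently narrowing what the agent knows.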

Layer 3: Retrieval Context (On Demand)

This is the deep knowledge base — hundreds of documents that might be relevant depending on the question. The agent searches this layer when it needs information that isn’t in the persistent or session layers.

I use qmd, an on-device search engine built by Tobi Lütke, indexed across seven project directories. It runs BM25 keyword search, vector semantic search, and LLM re-ranking — all locally, no data leaving my machine. 428 documents, 5,220 chunks, three local AI models.

When I ask “how does photo sync work across web and iOS?” the agent doesn’t need to grep through files. It issues a semantic query and gets the most relevant passages in milliseconds. The architecture matters more than the tool — what makes this work is that the documents are consistently structured (standard directories, naming conventions, frontmatter), so the search engine can index them cleanly.
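The merging of keyword and semantic results can be illustrated with reciprocal rank fusion, a common technique for combining ranked lists. This is a toy sketch of the idea, not qmd's actual implementation, and the document names are invented:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: merge several ranked lists of doc ids.
    Documents that rank well in multiple lists float to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for "how does photo sync work across web and iOS?"
keyword_hits  = ["photo-sync-spec", "ios-gallery", "web-uploads"]
semantic_hits = ["sync-architecture", "photo-sync-spec", "ios-gallery"]

fused = rrf([keyword_hits, semantic_hits])
# A document near the top of both lists fuses ahead of documents that
# appear in only one; an LLM re-ranker then reorders this short list.
print(fused[0])
```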

Microsoft Research’s GraphRAG takes this further by combining vector search with structured knowledge graphs that preserve entity relationships. The insight is the same: flat retrieval isn’t enough for complex questions. The structure of your data IS the context architecture.


What Good Context Architecture Produces

When all three layers work together, the agent operates with a form of institutional memory. It knows:

  • What the project is (persistent layer — CLAUDE.md)
  • What it’s supposed to do right now (session layer — skill specification)
  • Where to find deeper information when needed (retrieval layer — semantic search)

The result is that agents can work on tasks across my portfolio without me re-explaining the same context every session. An agent dispatched to work on the iOS app reads the iOS CLAUDE.md, understands the MVVM architecture and Apple Swift Testing conventions, knows to target the main branch, and can search across all project documentation if it encounters a cross-cutting question.

Before I built this, I spent the first 10-15 minutes of every session re-establishing context. Now I spend zero. That’s not a minor efficiency gain when you’re running multiple agent sessions per day across 13 projects.


The Hub-and-Spoke Model

The organizational pattern that emerged is hub-and-spoke. Command Center — my portfolio dashboard project — is the hub. Each project repo is a spoke. The hub has:

  • The ecosystem CLAUDE.md that maps all projects and their relationships
  • The persistent memory files that carry cross-project knowledge
  • The global skills that operate across the portfolio
  • The semantic search engine that indexes all documentation

Each spoke has:

  • Its own project-specific CLAUDE.md with local conventions
  • Its own project-specific skills (e.g., /ticket for the Rails app, /feature-complete for the iOS app)
  • Its own standard document structure (docs/wrap-ups/, docs/plans-to-do/, docs/plans-done/)

When I’m working in a specific project, the agent has deep local context. When I’m in Command Center, the agent has broad portfolio context. The two never conflict because the architecture separates them cleanly.


Where People Go Wrong

The three most common context architecture failures I’ve seen:

Too much context. Loading everything into the system prompt or context window. Sounds safe; actually degrades quality. Every model shows measurable accuracy drops as input context grows — the “lost in the middle” effect is real, with 30%+ accuracy drops in some benchmarks. Curate aggressively.

Dirty data. Stale documents, outdated specifications, deprecated instructions that contradict current behavior. An agent that finds conflicting information will pick one — and you won’t know which until something breaks. My memory files have a type system and descriptions specifically so stale entries can be identified and pruned.
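A minimal sketch of that staleness check, assuming an `updated` date in each memory's frontmatter and an arbitrary 90-day threshold (both are illustrative, not my actual rules):

```python
from datetime import date

def stale_memories(memories, today, max_age_days=90):
    """Return ids of memories whose `updated` date exceeds the threshold.
    Flagged entries are candidates for pruning or review, not auto-deletion."""
    flagged = []
    for mem_id, meta in memories.items():
        updated = date.fromisoformat(meta["updated"])
        if (today - updated).days > max_age_days:
            flagged.append(mem_id)
    return flagged

# Hypothetical memory entries with typed frontmatter.
memories = {
    "deploy-steps": {"type": "project", "updated": "2025-11-01"},
    "user-profile": {"type": "profile", "updated": "2026-03-10"},
}
print(stale_memories(memories, today=date(2026, 3, 15)))
```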

No separation of concerns. Persistent instructions, task-specific instructions, and retrieval results all mixed into one big blob. The agent can’t distinguish “always do this” from “do this for this task” from “here’s some background that might be relevant.” The three-layer model exists to make these boundaries explicit.


The Honest Part

Context architecture is maintenance work. It’s not glamorous. CLAUDE.md files drift from reality as the codebase evolves. Memory files accumulate stale entries. The search index needs refreshing when new documents are added.

I’ve built maintenance into the daily workflow — the /morning skill refreshes the search index automatically, and memory files have frontmatter that flags their age. But it’s still work. The system doesn’t maintain itself.

The payoff is that the maintenance cost grows far more slowly than the value. Adding a 14th project to a well-architected context system is trivial. Adding a 14th project to a system with no context architecture means re-explaining everything from scratch, every time, in every session.


Who Already Has This Skill

If you’re a librarian, you’ve spent your career organizing information so that people can find exactly what they need without knowing exactly where it is. That’s context architecture.

If you’re a technical writer, you structure documentation with clear hierarchies, consistent naming, and cross-references. That transfers directly.

If you’re a data engineer, you build pipelines that transform raw data into queryable, structured formats. Same skill, different consumer — agents instead of dashboards.

If you’re a knowledge manager at any organization, you’ve been doing context architecture for humans. The translation to doing it for AI agents is shorter than you think.

Anthropic’s engineering blog describes context engineering as the defining skill of the AI era. I’d refine that: it’s not just about engineering the context for a single interaction. It’s about architecting the context system so that any agent, on any task, in any project, can find what it needs without you being there to point the way.

That’s the unlock. Not smarter models. Better libraries.


This is part of a series on AI-era skills. Previously: The One-Person Engineering Team, Eliminating Waste in the SDLC, The System That Built Itself, and The Enterprise of One.