I built a multi-agent Claude Code workflow with persistent memory for a 2,300 KB Python codebase

Background: I maintain a Python CLI tool with a FastAPI backend. Around 2,300 KB of code, 1,450+ tests across Python 3.9/3.11/3.12/3.13. This isn't a vibe-coding setup. It's a structured multi-agent workflow I've been refining for months.

Here's the full breakdown.

The architecture One orchestrator agent receives a structured prompt and decides which specialized agent to delegate to. It only fires once. After that, the delegated agent takes over completely.

The first thing every delegated agent does is a shallow analysis pass. It never acts immediately. This single rule eliminated most of the "Claude went off and did the wrong thing" problems.

The quality layer This is where most setups stop at one reviewer. I have nine specialized agents:

- Architect : reviews structural and architectural decisions before anything is built

- Strategist: reviews the proposed approach against project goals

- Code reviewer : reviews the actual implementation

- API structure reviewer : checks endpoint design, contracts, consistency

- General quality reviewer : reviews completed tasks against the original requirement

- Code consistency reviewer: checks naming, patterns, style against the existing codebase

- Frontend reviewer

- Backend reviewer

- Security reviewer

Not every agent runs on every task. The orchestrator decides which reviewers are relevant based on what was changed.

The memory system This is the part I rarely see discussed.

Claude has no memory between sessions by default.

I built a persistent memory layer that runs outside Claude Code itself. At the end of every task, and at the end of every session, an agent generates a structured summary. What was done, what was decided, what was explicitly rejected and why.

These summaries feed into a graph of memory nodes with typed relationships. Not a flat log. An actual graph where nodes connect across sessions, across tasks, and across projects. The system understands that a decision made in project A is relevant to a pattern emerging in project B.

Two things make this different from a simple notes system:

Memory decay. Older nodes lose weight over time unless they're reinforced by new sessions.

Decisions that haven't been relevant in a while fade. The ones that keep coming up stay prominent.

Semantic search. When a new session starts, the relevant memory nodes are retrieved by semantic similarity to the current task, not by keyword match. Claude starts each session with context that's actually relevant, not just recent.

I built this before "second brain" became a trend. The architecture is closer to how biological memory works than how note-taking apps work.

What still doesn't work well Cross-file refactoring across more than 4-5 files degrades quality even with this setup. I break these into sequential tasks.

Implicit architectural decisions that predate the memory system are still a risk. If it wasn't documented in a session summary, the agents don't know about it.

The one thing that changed everything Test co…

为什么值得关注

能改变理解方式，而不只是重复常识；符合当前抓取需求；它提供了新的理解或解释，而不只是表面观点

来源：reddit，领域：tech，保留分：0.64