Fixed agent roles vs dynamic spawning: when do explicit specialists actually help, and when are

I've been running a multi-agent setup for personal/dev work and I keep going back and forth on one design choice: fixed roles vs dynamic agent spawning.

The setup I'm currently using has four explicit roles:

Lead/orchestrator - decides who does what, synthesizes the final answer

Explorer - gathers context from files, repos, docs, external sources

Consultant - reviews plans, weighs tradeoffs, catches mistakes before edits

Executor - makes concrete changes: file edits, shell commands, artifacts

The argument for fixed roles is mostly about scope. "One generalist agent with every tool" tends to mix concerns - the same prompt that's gathering context is also tempted to start editing files, and review steps get skipped because the agent is already mid-action. Splitting roles forces a handoff at each stage, which makes mistakes more visible.

The argument against is that fixed roles can become ceremony. If the task is small, delegation is overhead. If handoff protocols are weak, agents repeat each other. If memory is stale, the whole team can confidently drift in the wrong direction. Tiny bureaucracy, now with tokens.

A few patterns I've found useful when fixed roles do work:

Explorer never writes files. The boundary is enforced by tool access, not just prompt instruction.

Consultant runs before Executor on anything destructive. Skipping the review when the model is "confident" is exactly when you want it.

Executor gets a narrow tool set. It doesn't get web access; that's Explorer's job.

The lead synthesizes. Letting every agent talk to the user produces a noisy transcript.

What I'm still unsure about:

Where's the threshold? For a one-line code change, the full team is overkill. For a multi-file refactor, it's clearly worth it. The middle is fuzzy.

Dynamic spawning sounds clean in theory but I haven't seen it produce stable behavior - agents spawn agents, depth gets weird, hard to debug.

Memory between roles is the part I keep getting wrong. Either too much shared context (Executor "remembers" things Explorer never said) or too little (Consultant reviews without seeing what Explorer found).

Tool-call reliability is the real bottleneck for the Executor role specifically. Smaller models can pass single-call tests and still drift on 3–5 step sequences.

Question for people who've shipped multi-agent systems:

Do explicit role boundaries hold up as your system gets more capable, or do they collapse into "one strong agent + a handful of tools" once the underlying model is good enough?

Also curious where you draw the line between "useful specialist" and "unnecessary extra LLM call that just adds latency."

(I'll drop a link to the project I'm building this in as a comment - sub rule says no body links.)

[留言]

为什么值得关注

能改变理解方式，而不只是重复常识；有直接可用的方法、工具或操作价值；它提供了新的理解或解释，而不只是表面观点

来源：reddit，领域：tech，保留分：0.67