
Cut Multi-Agent Costs 50% With Decentralized Architecture

Sam McKay 19 June 2026

Stanford’s DeLM research landed last month with a claim that stopped a lot of people mid-scroll: decentralized multi-agent workflows cut task costs by 50% compared to traditional orchestrated designs. No central coordinator. No hub routing every decision. Just agents talking directly to each other, and the API bill drops by half.

If you run a consulting firm and you’ve been experimenting with custom AI workflows, this matters more than you might think. Most firms building agent systems default to a hub-and-spoke model because it feels intuitive. One orchestrator agent receives the request, farms out subtasks to specialist agents, collects the results, and assembles the final output. It’s tidy. It’s also expensive.

The DeLM paper shows that when agents communicate peer-to-peer, you eliminate the orchestrator tax. Every task that used to route through a central brain now flows directly between the agents that need to collaborate. Fewer API calls. Faster execution. Half the cost. For a consulting firm running proposal generation, research synthesis, and knowledge management workflows at scale, that 50% reduction compounds fast.

The Orchestrator Tax You’re Paying Right Now

Most firms don’t notice the cost structure until they’ve scaled past the pilot phase. You build a proposal agent that pulls past case studies, pricing, and client context into a draft. It works. You add a research agent that runs structured industry analysis at the start of every engagement. That works too. Then you connect them through an orchestrator so the proposal agent can request research on demand.

The orchestrator can make as many as 12 calls for a single proposal. It interprets the request, routes to the research agent, waits for a response, reformats the data, passes it to the proposal agent, monitors progress, handles errors, and assembles the final document. Each of those steps is a token-burning API call. You’re paying for coordination overhead that doesn’t add value to the output.

A decentralized design lets the proposal agent call the research agent directly. One request. One response. The proposal agent already knows what it needs, so it asks for it in the format it can use. No middleman. No reformatting tax. No orchestrator interpreting and reinterpreting context at every handoff.

For a firm running 40 major proposals a year, that difference adds up. If each proposal burns $18 in API costs under an orchestrated model, you’re spending $720 annually on that workflow alone. Cut it to $9 per proposal with decentralized architecture, and you’ve saved $360. Now multiply that across every workflow in your firm. Research briefs. Knowledge retrieval. Client reporting. The savings stack.
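
The proposal arithmetic above is easy to check and rerun with your own numbers. This is a minimal sketch: the function name is ours, and the inputs are the figures quoted in this article, not measurements.

```python
def annual_workflow_cost(runs_per_year: int, cost_per_run: float) -> float:
    """Annual API spend for one workflow: runs per year times cost per run."""
    return runs_per_year * cost_per_run


# Figures from the paragraph above: 40 proposals a year, $18 vs $9 each.
orchestrated = annual_workflow_cost(40, 18.0)
decentralized = annual_workflow_cost(40, 9.0)
savings = orchestrated - decentralized

print(f"Orchestrated:  ${orchestrated:,.0f}")   # Orchestrated:  $720
print(f"Decentralized: ${decentralized:,.0f}")  # Decentralized: $360
print(f"Saved:         ${savings:,.0f}")        # Saved:         $360
```

Swap in the run counts and per-run costs for your other workflows to see how the savings stack.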

We typically see consulting firms in the $2M to $10M range running between 60 and 150 agent-assisted tasks per month once they move past pilots. At orchestrated costs, that’s $1,200 to $2,700 per month in API spend, or roughly $18 to $20 per task. Decentralized architecture halves it to $600 to $1,350. Over a year, that’s $7,200 to $16,200 back in the budget, and that’s before you account for the speed gains that let you take on more work with the same team.

Why Consulting Firms Default to Orchestration

The hub-and-spoke model isn’t a mistake. It’s a reasonable first instinct when you’re building something new. Orchestrators give you visibility. You can log every decision, monitor every handoff, and debug failures from a single control plane. When you’re still figuring out what agents you need and how they should interact, that centralized view feels safer.

The problem is that most firms never revisit the architecture once it’s working. The orchestrator becomes infrastructure. You add more agents, more workflows, more complexity, and the orchestrator grows into a bottleneck. It’s handling routing logic for 12 different agent types, each with its own input requirements and output formats. The code gets messy. The token costs climb. The latency stretches.

Decentralized architecture flips that model. Each agent knows its own capabilities and the capabilities of the agents it works with. The proposal agent doesn’t need an orchestrator to tell it where to find research. It knows the research agent exists, it knows how to call it, and it does. The knowledge agent doesn’t wait for a coordinator to decide whether a question is relevant. It evaluates the query itself and returns an answer or passes the request to a specialist.

This isn’t chaos. It’s delegation. The same way your consulting team doesn’t route every question through a single partner, your agent network doesn’t need a central brain making every decision. Agents that know their domain can make local decisions faster and cheaper than a generalist orchestrator ever could.

What Decentralized Architecture Looks Like in Practice

Let’s walk through a real workflow. A partner at your firm gets an inbound request for a market entry strategy in the logistics sector. Under an orchestrated model, the request hits the orchestrator, which interprets the ask, routes it to the proposal agent, waits for the agent to identify research gaps, sends those gaps to the research agent, collects the findings, passes them back to the proposal agent, monitors assembly, and outputs the draft.

In a decentralized model, the proposal agent receives the request directly. It recognizes that it needs logistics sector analysis and competitive benchmarking. It calls the research agent with a structured query: “Logistics sector, North America, last 18 months, focus on mid-market entrants.” The research agent runs the analysis and returns a summary with sources. The proposal agent pulls relevant case studies from the knowledge agent in parallel. No waiting. No sequential handoffs. Both requests happen at once because there’s no orchestrator queuing tasks.

The proposal agent assembles the draft using the research summary and case studies. Total API calls: three. One to the research agent, one to the knowledge agent, one to generate the final document. Under orchestration, that same workflow would burn seven to nine calls depending on how the orchestrator handles error checking and progress monitoring.
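
Here is one way the three-call workflow could look in code. Treat it as a sketch under assumptions: the agent functions, the `CallMeter` helper, and the query fields are hypothetical stand-ins, and `asyncio.sleep(0)` takes the place of a real model call.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class CallMeter:
    """Records each model/API call so the total can be inspected afterwards."""
    calls: list[str] = field(default_factory=list)

    def record(self, name: str) -> None:
        self.calls.append(name)


async def research_agent(query: dict, meter: CallMeter) -> str:
    meter.record("research")
    await asyncio.sleep(0)  # placeholder for the real model call
    return f"Summary: {query['sector']}, {query['region']}, {query['focus']}"


async def knowledge_agent(topic: str, meter: CallMeter) -> list[str]:
    meter.record("knowledge")
    await asyncio.sleep(0)  # placeholder for the real retrieval call
    return [f"Past case study on {topic}"]


async def proposal_agent(request: str, meter: CallMeter) -> str:
    # The structured query from the example above.
    query = {
        "sector": "Logistics",
        "region": "North America",
        "window_months": 18,
        "focus": "mid-market entrants",
    }
    # Both lookups are issued together; neither waits on the other.
    summary, cases = await asyncio.gather(
        research_agent(query, meter),
        knowledge_agent("logistics market entry", meter),
    )
    meter.record("draft")  # the single generation call for the final document
    return "\n".join([request, summary, *cases])


meter = CallMeter()
draft = asyncio.run(proposal_agent("Market entry strategy: logistics", meter))
print(len(meter.calls))  # 3
```

The proposal agent owns the whole flow, so nothing sits between it and the two agents it depends on.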

The time savings matter as much as the cost. Orchestrated workflows tend to run sequentially because the coordinator typically waits for each step to complete before routing the next task. Decentralized workflows can run independent steps in parallel. The proposal agent doesn’t care whether the research agent finishes before the knowledge agent. It collects both results as they arrive and moves forward. A proposal that took 90 seconds under orchestration now completes in 35 seconds. That’s not a marginal improvement when you’re running 40 proposals a year and each one involves multiple revision cycles.
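
The reason parallel wins is simple: a sequential run costs the sum of its steps, while a parallel run costs only its slowest branch plus assembly. The sketch below uses made-up step durations purely to show the shape of the calculation. They are not measurements and do not reproduce the 90 and 35 second figures, which also include orchestrator overhead.

```python
def sequential_latency(step_seconds: list[float]) -> float:
    """Every step waits for the previous one, so durations add up."""
    return sum(step_seconds)


def parallel_latency(branch_seconds: list[float], assembly_seconds: float) -> float:
    """Independent branches overlap, so only the slowest one counts."""
    return max(branch_seconds) + assembly_seconds


# Hypothetical durations: research 12s, knowledge lookup 8s, drafting 10s.
print(sequential_latency([12, 8, 10]))  # 30
print(parallel_latency([12, 8], 10))    # 22

# Speedup implied by the figures quoted in this article (90s down to 35s).
speedup = 90 / 35
print(round(speedup, 2))                # 2.57
```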

The Knowledge Management Unlock

The cost savings on proposal generation are easy to quantify, but the bigger win is what decentralized architecture does for knowledge management. Most consulting firms treat knowledge as a retrieval problem. You build an agent that can search past decks and documents, and you call it when someone needs an answer. That works for explicit queries, but it misses the real opportunity.

A decentralized knowledge agent doesn’t wait to be asked. It listens to every workflow in your firm and surfaces relevant context automatically. When the proposal agent starts drafting a logistics market entry strategy, the knowledge agent sees the query to the research agent and checks whether the firm has done similar work before. If it finds a past engagement with overlapping scope, it pushes that case study to the proposal agent without being asked.
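
How does the knowledge agent decide that past work has "overlapping scope"? One simple option, shown here as an assumption and not as the method from the DeLM paper, is to tag each engagement and compare tag sets. The `Engagement` type, the tags, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engagement:
    title: str
    scope: frozenset[str]


def overlap(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity: shared tags divided by all distinct tags."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def relevant_past_work(
    query_scope: frozenset[str],
    archive: list[Engagement],
    threshold: float = 0.5,
) -> list[Engagement]:
    """Return past engagements whose scope overlaps the query enough to push."""
    return [e for e in archive if overlap(query_scope, e.scope) >= threshold]


# Hypothetical archive entries for illustration only.
archive = [
    Engagement("Logistics entry, Europe", frozenset({"logistics", "market-entry", "europe"})),
    Engagement("Healthcare pricing review", frozenset({"healthcare", "pricing"})),
]
query = frozenset({"logistics", "market-entry", "north-america"})
matches = relevant_past_work(query, archive)
print([m.title for m in matches])  # ['Logistics entry, Europe']
```

A production system would more likely use embeddings or search, but the decision is the same: score the overlap, then push anything above a threshold.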

That’s hard to pull off under orchestration. The orchestrator controls the flow, so the knowledge agent only gets involved when the orchestrator decides it’s relevant. By the time the orchestrator realizes the proposal agent might benefit from historical context, the research is already done and the draft is half-written. The knowledge agent becomes a lookup tool instead of an active participant.

Decentralized architecture turns the knowledge agent into a collaborator. It sees the same requests the other agents see, it evaluates relevance in real time, and it contributes without waiting for permission. Over time, that changes how the firm captures and reuses IP. Instead of knowledge management being a manual tagging and filing exercise, it becomes an automatic byproduct of doing the work. Every proposal, every research brief, every client deliverable feeds the knowledge base, and every future workflow benefits without anyone lifting a finger.

We’ve seen firms reduce repeated research work by 30% to 40% once the knowledge agent starts contributing proactively. That’s 30% to 40% of the time senior people spend digging through past projects, reformatting old analysis, and recreating insights the firm already paid for once. For a firm billing $250 per hour for senior consultant time, every hour saved is $250 back in margin or capacity to take on new work.

If you want to see how this architecture applies to your specific workflows, book a 60-min Omni Audit. We’ll map your current manual processes, identify where orchestration overhead is costing you time and money, and show you what a decentralized design would look like in your firm. You’ll walk out with a workflow diagram, a cost breakdown, and a prioritized build list. No deck. No sales pitch. Just the three outputs you need to make a decision.

How to Shift From Orchestrated to Decentralized

If you’ve already built an orchestrated system, the good news is you don’t have to rebuild from scratch. The agents you’ve built still work. The logic is sound. You’re just changing how they talk to each other.

Start by mapping the current flow. Pick one workflow, ideally the one you run most often, and document every API call. Where does the orchestrator interpret input? Where does it route requests? Where does it reformat data? Where does it monitor progress? Each of those steps is a candidate for elimination.
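
If your system already logs its calls, the mapping step can start as a simple tally. The log entries and purpose labels below are hypothetical. The point is only to separate coordination calls from calls that produce output.

```python
from collections import Counter

# Hypothetical call log for one proposal run: (caller, purpose) pairs.
call_log = [
    ("orchestrator", "interpret"),
    ("orchestrator", "route"),
    ("research_agent", "analyze"),
    ("orchestrator", "reformat"),
    ("orchestrator", "route"),
    ("proposal_agent", "draft"),
    ("orchestrator", "monitor"),
]

# The four kinds of orchestrator work named in the paragraph above.
COORDINATION = {"interpret", "route", "reformat", "monitor"}


def coordination_calls(log: list[tuple[str, str]]) -> Counter:
    """Count the calls whose purpose is coordination, not output."""
    return Counter(purpose for _, purpose in log if purpose in COORDINATION)


tally = coordination_calls(call_log)
print(sum(tally.values()), "of", len(call_log), "calls are coordination")
# 5 of 7 calls are coordination
```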

Next, identify the direct relationships. Which agents actually need to communicate? The proposal agent needs the research agent and the knowledge agent. The research agent might need the knowledge agent to check for past analysis. The knowledge agent doesn’t need the proposal agent at all. It listens, but it doesn’t depend on it.
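
The relationships in that paragraph fit in a small dependency map. A sketch, with agent names shortened and the research-to-knowledge link included even though the text calls it optional:

```python
# Which agents each agent calls directly, as described above.
NEEDS: dict[str, set[str]] = {
    "proposal": {"research", "knowledge"},
    "research": {"knowledge"},  # optional: checks for past analysis
    "knowledge": set(),         # listens, but depends on no one
}


def direct_links(needs: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Flatten the map into (caller, callee) pairs."""
    return sorted(
        (caller, callee) for caller, callees in needs.items() for callee in callees
    )


def callers_of(agent: str, needs: dict[str, set[str]]) -> list[str]:
    """Which agents call the given agent directly?"""
    return sorted(caller for caller, callees in needs.items() if agent in callees)


print(direct_links(NEEDS))
# [('proposal', 'knowledge'), ('proposal', 'research'), ('research', 'knowledge')]
print(callers_of("proposal", NEEDS))  # []
```

Three direct links replace every route that used to pass through the orchestrator.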

Rewrite the proposal agent to call the research agent directly. Give it the research agent’s API endpoint and the expected input format. Remove the orchestrator from that path. Test it. If the proposal agent can get research without the orchestrator, you’ve just eliminated three to four API calls per proposal.
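
In code, "call the research agent directly" mostly means handing the proposal agent a callable for the research agent and letting it build its own query. The class, the query fields, and the fake research function below are hypothetical. In a real system the callable would wrap an HTTP request to your research agent’s endpoint.

```python
from dataclasses import dataclass
from typing import Callable

# A research call takes a structured query and returns a structured result.
ResearchCall = Callable[[dict], dict]


@dataclass
class ProposalAgent:
    research: ResearchCall  # direct handle to the research agent, no orchestrator

    def draft(self, request: str) -> str:
        # The proposal agent asks for the format it can use, so no reformatting step.
        query = {"topic": request, "format": "summary_with_sources"}
        result = self.research(query)
        return f"Proposal: {request}\nResearch: {result['summary']}"


def fake_research_agent(query: dict) -> dict:
    """Stand-in for the real research agent, used here for testing the path."""
    return {"summary": f"Findings on {query['topic']}", "sources": []}


agent = ProposalAgent(research=fake_research_agent)
text = agent.draft("logistics market entry")
print(text)
```

Passing the research call in as a parameter also makes the "Test it" step easy, because you can swap in a fake like this one before pointing at the live agent.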

Do the same for the knowledge agent. Instead of waiting for the orchestrator to decide when knowledge is relevant, configure the knowledge agent to listen to the research agent’s queries. When the research agent asks for logistics sector analysis, the knowledge agent sees that request and checks for past work in parallel. If it finds something, it pushes the result to the proposal agent directly. No orchestrator involved.
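
"Listening" is usually implemented as publish and subscribe. The sketch below uses a tiny in-process event bus to show the idea. The `EventBus` class, the topic name, and the sample archive are hypothetical, and a real deployment would more likely use a message queue or webhook.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process publish/subscribe."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        for handler in self._subscribers.get(topic, []):
            handler(payload)


# Hypothetical archive of past work, keyed by sector.
PAST_WORK = {"logistics": "Logistics market entry case study"}
proposal_inbox: list[str] = []  # stands in for a push to the proposal agent


def knowledge_listener(query: dict) -> None:
    """Runs on every research query; pushes past work only when it finds a match."""
    match = PAST_WORK.get(query["sector"])
    if match is not None:
        proposal_inbox.append(match)


bus = EventBus()
bus.subscribe("research.query", knowledge_listener)

# The research agent publishes its queries as it receives them.
bus.publish("research.query", {"sector": "logistics"})
bus.publish("research.query", {"sector": "healthcare"})
print(proposal_inbox)  # ['Logistics market entry case study']
```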

The orchestrator doesn’t disappear entirely. You still need something to handle error logging, monitor system health, and manage agent discovery when you add new capabilities. But it’s no longer in the critical path. It’s infrastructure, not a bottleneck.
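
One way to picture the slimmed-down orchestrator is as a registry: agents look each other up and report failures to it, but the work itself never passes through it. The `AgentRegistry` class and `call_peer` helper are hypothetical names for this sketch.

```python
import logging
from typing import Callable

logger = logging.getLogger("agent_registry")
AgentHandler = Callable[[str], str]


class AgentRegistry:
    """Handles discovery and error logging. It never relays task payloads."""

    def __init__(self) -> None:
        self._agents: dict[str, AgentHandler] = {}
        self.errors: list[tuple[str, str]] = []

    def register(self, name: str, handler: AgentHandler) -> None:
        self._agents[name] = handler

    def discover(self, name: str) -> AgentHandler:
        return self._agents[name]

    def report_error(self, name: str, exc: Exception) -> None:
        self.errors.append((name, repr(exc)))
        logger.warning("agent %s failed: %r", name, exc)


def call_peer(registry: AgentRegistry, name: str, payload: str) -> str:
    """Look the peer up, then call it directly. Failures are reported and re-raised."""
    handler = registry.discover(name)
    try:
        return handler(payload)
    except Exception as exc:
        registry.report_error(name, exc)
        raise


registry = AgentRegistry()
registry.register("research", lambda query: f"findings: {query}")
print(call_peer(registry, "research", "logistics"))  # findings: logistics
```

Adding a new agent becomes a single `register` call, and existing agents can find it without any routing logic changing.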

This is the kind of architecture shift we walk through in the AI audit for consulting firms. We look at your existing workflows, identify orchestration overhead, and design a decentralized alternative that cuts costs without breaking what’s already working. Most firms see a clear path to 40% to 50% cost reduction within the first 30 minutes of the audit.

What This Means for Your API Budget

Let’s put numbers to it. A typical consulting firm running agent-assisted workflows at scale might be spending $1,800 per month on API costs under an orchestrated model. That’s 150 tasks per month at an average of $12 per task. Decentralized architecture brings the per-task cost down to $6. Same 150 tasks, $900 per month. You’ve just freed up $10,800 per year.
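
To run the same budget math against your own task volume, a few lines are enough. The function name is ours, and the defaults in the example are the figures from the paragraph above.

```python
def monthly_spend(tasks_per_month: int, cost_per_task: float) -> float:
    """Monthly API spend: task volume times average cost per task."""
    return tasks_per_month * cost_per_task


orchestrated_monthly = monthly_spend(150, 12.0)
decentralized_monthly = monthly_spend(150, 6.0)
annual_savings = (orchestrated_monthly - decentralized_monthly) * 12

print(f"${orchestrated_monthly:,.0f} vs ${decentralized_monthly:,.0f} per month")
# $1,800 vs $900 per month
print(f"${annual_savings:,.0f} freed up per year")
# $10,800 freed up per year
```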

That’s not speculative. We’re seeing firms hit those numbers within 60 to 90 days of starting the shift to decentralized designs. For each workflow you switch over, the savings show up immediately because you’re making fewer API calls for the same output. There’s no ramp period. There’s no training overhead. You flip the architecture, and the cost drops.

The time savings take a bit longer to materialize because you have to adjust how people interact with the system. Under orchestration, people got used to sequential workflows. They’d submit a request, wait for research, review the findings, then ask for a proposal draft. Decentralized workflows run in parallel, so the draft shows up faster than they expect. It takes a few cycles for people to trust that the system isn’t skipping steps. It’s just doing them concurrently.

Once that trust builds, the time savings compound. Proposals that used to take 90 seconds now take 35. Research briefs that took three minutes now take 80 seconds. Knowledge retrieval that required a manual search now happens automatically in the background. Over a year, that adds up to 40 to 60 hours of senior time reclaimed. At $250 per hour, that’s $10,000 to $15,000 in capacity you can redeploy to billable work or business development.

If you want a practical guide to deploying this kind of architecture in your firm, grab the Deploy Your First Business Agent worksheet. It walks through agent selection, API design, and the specific steps to move from orchestrated to decentralized workflows without disrupting your current operations. It’s the same framework we use with firms in the Omni build process.

The Bigger Picture: Architecture as Leverage

The DeLM research is interesting because it quantifies something most firms feel intuitively once they’ve been running agents for a few months. Orchestration overhead is real. It slows things down. It costs money. And it doesn’t add value proportional to the cost.

Decentralized architecture isn’t just a cost optimization. It’s a design philosophy that mirrors how your consulting team actually works. Partners don’t route every question through a managing director. Senior consultants don’t wait for permission to pull in a specialist. People who know their domain make decisions locally and collaborate directly with the people who can help them deliver.

Your agent network should work the same way. The proposal agent knows proposals. The research agent knows research. The knowledge agent knows your firm’s history. Let them talk to each other without a middleman interpreting every exchange. The system gets faster, cheaper, and more resilient because you’ve eliminated the single point of failure that orchestrators inevitably become.

This is the kind of architectural thinking we bring to every Omni build. We don’t just automate your current process. We redesign the workflow to take advantage of what agents do well: parallel execution, direct communication, and local decision-making. The result is a system that costs half as much to run and delivers output twice as fast.

If you’re ready to see what that looks like for your firm, book an Omni Audit. Sixty minutes. Three outputs. No deck. We’ll map your workflows, calculate your current orchestration tax, and show you the decentralized alternative. You’ll know exactly what it costs, what it saves, and what it takes to build. Then you decide whether it makes sense for your firm.

For more on how AI is reshaping consulting operations, explore the insights library or dive into the Omni platform overview to see the full range of capabilities we build for firms at your stage.
