Last week, Anthropic shipped two significant upgrades to its Managed Agents platform that change how businesses can deploy AI for complex work. Both Outcomes and Multiagent Orchestration moved to public beta on May 6, announced at the company’s Code with Claude developer event in San Francisco.
If you have been running Claude agents and finding that output quality is inconsistent, or that single-agent setups hit walls on complex tasks, these updates are worth understanding.
Outcomes: Agents That Check Their Own Work
The core problem with AI agents in production is reliability. An agent can produce output that looks reasonable but fails silently against your actual requirements. Outcomes is Anthropic’s answer to that.
You define what a successful result looks like by writing a rubric, a structured markdown document with explicit, gradeable criteria. When the agent finishes its work, a separate Claude instance reads the output and evaluates it against your rubric in an isolated context window. That grader has no visibility into how the agent reached its answer, only whether the result meets the bar.
If the output fails any criterion, the grader returns specific feedback on what needs to change, and the agent takes another pass. This loop repeats until the rubric is satisfied or the iteration limit is hit. The default is three iterations, and you can raise it as high as twenty.
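To make the control flow concrete, here is a minimal Python sketch of that generate-grade-revise loop. The platform runs this for you; the function names, the result type, and the behavior at the iteration limit shown here are placeholders for illustration, not Anthropic’s actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class GradeResult:
    passed: bool   # did the output satisfy every rubric criterion?
    feedback: str  # criterion-level notes on what needs to change

def run_with_outcomes(
    task: str,
    rubric: str,
    run_agent: Callable[[str, Optional[str], Optional[str]], str],
    grade: Callable[[str, str], GradeResult],
    max_iterations: int = 3,  # platform default; can be raised to 20
) -> Optional[str]:
    """Generate-grade-revise loop of the kind Outcomes runs on the platform side."""
    output: Optional[str] = None
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        # The worker agent produces, or revises, the deliverable.
        output = run_agent(task, output, feedback)
        # A separate Claude instance grades the result in an isolated context;
        # it sees only the output and the rubric, never the agent's reasoning.
        result = grade(output, rubric)
        if result.passed:
            return output
        feedback = result.feedback
    # Iteration limit hit; what the platform returns or flags here is an assumption.
    return output
```

The key property is the separation: the grading step sees only the finished output and the rubric, never the worker agent’s reasoning or tool calls.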
In Anthropic’s own benchmarks, using Outcomes improved task success rates by roughly 10 percentage points over a standard prompting loop, with the largest gains on the hardest tasks. For specific deliverable types, the gains were +8.4 percentage points on .docx file outputs and +10.1 percentage points on .pptx presentations.
That might sound modest, but at scale it adds up: across 500 tasks, a 75% success rate means roughly 125 failures to catch and rework, versus about 75 at 85%. In chained workflows the gap compounds, since a five-step pipeline that hits 75% at each step succeeds end to end only about 24% of the time, compared with roughly 44% at 85% per step.
The rubric design matters. Vague criteria produce noisy evaluations. Criteria like “The CSV contains a price column with numeric values” work better than “the data looks good.” Anthropic’s guidance is to take a known-good example output, ask Claude to analyze what makes it good, and turn that analysis into rubric criteria.
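Following that guidance, a rubric for a simple data deliverable might look like the sketch below. The exact markdown structure the platform expects is not specified here, so treat the headings and wording as an example of gradeable criteria, not a required format.

```python
# Hypothetical rubric for a CSV deliverable. Each criterion is independently
# checkable, so the grader can pass or fail it without judgment calls.
PRICING_EXPORT_RUBRIC = """\
# Rubric: quarterly pricing export

- The output is a valid CSV file with a header row.
- A `price` column exists and every value in it is numeric.
- A `sku` column exists and contains no duplicate values.
- No row has an empty value in any required column.
"""
```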
Multiagent Orchestration: Specialist Teams on Demand
The second feature addresses a different problem: scope. Some tasks are simply too broad for a single agent to handle well.
With Multiagent Orchestration (also in public beta), you configure a lead coordinator agent with a roster of specialist agents. The coordinator breaks the task into pieces and delegates each one to the right specialist. Specialists run in parallel, each with their own context window, model configuration, system prompt, and tool access.
The coordinator can list up to 20 unique specialist agents in its roster, and up to 25 threads can run concurrently per session. Each thread is persistent, so the coordinator can follow up with a specialist it called earlier and that agent remembers its prior work.
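As an illustration, a roster for a software-incident investigation might be configured along the lines of the sketch below. The field names and structure are assumptions chosen for readability, not the Managed Agents platform’s actual schema.

```python
# Hypothetical roster for a software-incident investigation. Field names and
# structure are illustrative, not the platform's actual configuration schema.
incident_team = {
    "coordinator": {
        "model": "<coordinator-model-id>",
        "system_prompt": "Break the incident into subtasks and delegate each to a specialist.",
    },
    "specialists": [  # up to 20 unique specialists per roster
        {"name": "deploy-history",  "tools": ["source_control", "ci"]},
        {"name": "error-logs",      "tools": ["log_search"]},
        {"name": "metrics",         "tools": ["metrics_query"]},
        {"name": "support-tickets", "tools": ["ticket_search"]},
    ],
    # Up to 25 specialist threads can run concurrently in one session,
    # and each thread persists, so earlier specialists can be re-queried.
}
```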
The patterns this enables include:
Parallelization: Fan out independent subtasks simultaneously. A coordinator investigating a software incident can send subagents to search deploy history, error logs, metrics, and support tickets at the same time, then synthesize the findings (see the code sketch after this list).
Specialization: Route to agents built for specific domains. A legal review coordinator can send contract clauses to a compliance agent, financial terms to a risk agent, and data handling provisions to a privacy agent, each with the right tools and expertise for their piece.
Escalation: When a subagent hits something genuinely complex, it can hand off to a more capable model rather than struggling through.
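The parallelization pattern is the easiest to picture in code. The platform handles this fan-out for you, but the shape is roughly the sketch below; the specialist calls here are stand-in coroutines, not real API calls.

```python
import asyncio

async def ask_specialist(name: str, question: str) -> str:
    # Placeholder for delegating a subtask to a specialist agent thread.
    await asyncio.sleep(0)  # simulate I/O while the specialist works
    return f"[{name}] findings for: {question}"

async def investigate_incident(incident_id: str) -> str:
    # Fan out independent subtasks so they run concurrently, each in its own thread.
    findings = await asyncio.gather(
        ask_specialist("deploy-history", f"deploys near incident {incident_id}"),
        ask_specialist("error-logs", f"error spikes around incident {incident_id}"),
        ask_specialist("metrics", f"latency and error-rate anomalies for {incident_id}"),
        ask_specialist("support-tickets", f"customer reports tied to {incident_id}"),
    )
    # The coordinator then synthesizes the parallel findings into one answer.
    return "\n".join(findings)

if __name__ == "__main__":
    print(asyncio.run(investigate_incident("INC-1234")))
```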
The agents share a container and filesystem, so they can collaborate on files. Tools and context are not shared across agents, which keeps each specialist focused and reduces noise.
What This Means for Business
These two features together close the biggest gaps that hold businesses back from trusting AI agents with consequential work.
The quality problem is real. Businesses deploying agents often compensate for unreliable output with heavy human review, which defeats much of the efficiency gain. Outcomes shifts that burden back to the agent system. You spend time writing a rubric once, and the agent handles iteration.
The scope problem is equally real. Complex business processes do not fit inside a single agent’s context. A procurement workflow touches vendor data, contract terms, budget approval, and supplier communication. A customer issue investigation spans CRM, billing, product logs, and support history. Single-agent approaches either miss pieces or get overwhelmed. Multiagent orchestration gives you a practical architecture for these workflows.
For businesses already using Anthropic’s Managed Agents platform, both features are available now under the managed-agents-2026-04-01 beta header. Webhooks were also added in the same release, which means you can now kick off an outcome-driven session and get notified asynchronously when the agent finishes rather than polling for status.
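If you are replacing a polling loop, the receiving side can be as small as the Flask sketch below. The endpoint path and payload fields here are assumptions for illustration; the actual webhook schema comes from the Managed Agents documentation.

```python
# Minimal webhook receiver sketch. The payload fields shown are assumptions,
# not the documented schema for Managed Agents webhooks.
from flask import Flask, request

app = Flask(__name__)

@app.post("/webhooks/agent-session")
def agent_session_completed():
    event = request.get_json(force=True)
    # Hypothetical fields: session id, terminal status, and grading outcome.
    session_id = event.get("session_id")
    status = event.get("status")         # e.g. "completed" or "failed"
    passed = event.get("rubric_passed")  # whether the Outcomes grader passed it
    print(f"Session {session_id} finished: status={status}, rubric_passed={passed}")
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)
```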
The broader theme here is that Anthropic is building the infrastructure for agents that do real work, not just demos. Self-verification and coordinated specialist teams are table stakes for any business process where the output actually matters.
What This Means for Enterprise DNA Clients
If you are building AI workflows through Omni Ops, the ability to define explicit success criteria and have agents verify their own output before delivering results is directly relevant to the reliability of any automated process we build. We track releases like this closely as part of how we design agent architecture for our clients.
For teams that want to understand how these agent patterns apply to their operations, our Omni Advisory service covers exactly this kind of architecture decision.
Source
Anthropic Platform