
SAP Confirms Human Review Is the Bridge to Production AI

Sam McKay 16 June 2026

SAP published findings last week that should matter to every accounting firm running pilots on AI automation. The headline: enterprise AI agents are moving from proof-of-concept to production in finance workflows, but only when human review is baked into the design. Not bolted on afterward. Not optional. Mandatory checkpoints at decision points.

If you’re running a firm with between $1 million and $25 million in revenue, you’ve probably seen a demo or two by now. Maybe you’ve tested an AI tool for bank reconciliation or journal entry drafting. The demos look clean. The pilot works. Then you try to scale it across ten clients, and someone asks the question that stops everything: “Who’s liable if this thing posts the wrong number?”

That’s the gap SAP is naming. The technology works. The liability model doesn’t, unless you design approval gates into the workflow from day one.

Why pilots stall before they reach production

Most accounting firms test AI on a single client or a narrow task. Bank rec for one entity. Expense categorization for another. The tool runs, the numbers look right, and the partner signs off. Then the conversation shifts to rolling it out across the book.

That’s when three problems show up.

First, variance handling. The AI works beautifully when the data is clean and the pattern is familiar. It stumbles when a client switches payroll providers mid-month, or a bank feed drops a duplicate transaction, or an AP aging report includes a credit memo that didn’t clear. In a pilot, the associate catches it. In production, across fifty clients, you need a system that flags the edge case and routes it to someone who can decide.

Second, audit trail. Your client’s tax preparer, their lender, or an IRS examiner will eventually ask how a particular journal entry was generated. “The AI did it” isn’t an answer. You need a record of what the agent proposed, who reviewed it, what they changed, and why. That record has to be as clean as the one you’d produce if a senior accountant drafted the entry by hand.

Third, margin pressure. If every AI output requires the same level of review as a manual draft, you haven’t saved time. You’ve added a step. The only way the economics work is if the agent handles the routine cases end-to-end and escalates the exceptions. That means the review checkpoint has to be smart enough to distinguish between “this reconciliation is clean, approve it” and “this variance is outside normal range, a human needs to look.”

SAP’s research confirms what we’ve seen with firms that have been through our AI audit for accounting and bookkeeping: the agents that make it to production are the ones designed with explicit approval gates. Not as a compliance checkbox. As the core of the workflow.

What a production-ready agent looks like in a close workflow

Let’s walk through a real example. Month-end close for a services client with fifteen employees, three bank accounts, and typical AP and AR volume. Manual process today takes an associate four to six hours. The partner reviews for another hour. Total cycle time is two days if nothing’s wrong, five days if there’s a variance that requires client follow-up.

A Month-End Close Agent built for production handles it differently.

The agent starts the moment the last business day of the month ends. It pulls bank feeds, AP aging, AR aging, payroll journal, and the prior month’s trial balance. It reconciles each bank account line by line. For transactions that match an open invoice or a cleared check, it posts automatically. For transactions that don’t match, it flags them and drafts a variance note: “Unmatched deposit, $3,200, memo field says ‘Refund from vendor.’ Likely relates to Invoice #4457 from March. Recommend journal entry to reverse accrual and close AP item.”

The agent doesn’t post that entry. It queues it for review.
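A minimal sketch of that triage step, assuming a deliberately simplified rule: a bank transaction counts as matched only when its amount equals the amount of an open item. The names BankTxn, OpenItem, and triage are hypothetical, and a real agent would also compare dates, payees, and references before posting anything.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class BankTxn:
    txn_id: str
    amount: Decimal
    memo: str


@dataclass
class OpenItem:
    ref: str          # hypothetical reference, e.g. an invoice or check number
    amount: Decimal


def triage(txns, open_items):
    """Split bank transactions into auto-post matches and a review queue.

    Assumption: matching is by exact amount only, which is far cruder
    than what a production agent would need.
    """
    remaining = list(open_items)
    auto_post, review_queue = [], []
    for txn in txns:
        match = next((i for i in remaining if i.amount == txn.amount), None)
        if match is not None:
            remaining.remove(match)  # each open item can clear only once
            auto_post.append((txn.txn_id, match.ref))
        else:
            # Nothing is posted here; the item waits for a human decision.
            note = f"Unmatched transaction, ${txn.amount:,.2f}, memo: {txn.memo!r}"
            review_queue.append((txn.txn_id, note))
    return auto_post, review_queue
```

The point of the design is the second branch: an unmatched item produces a note and a queue entry, never a posting.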

Same pattern for payroll. The agent compares the payroll journal to the prior month and to the budget. If the variance is within ten percent and the headcount matches, it posts. If payroll is up twenty percent and headcount is flat, it flags it: “Payroll variance outside normal range. Possible overtime, bonus accrual, or data error. Requires partner review before posting.”
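The payroll rule can be sketched the same way. This version assumes the ten percent tolerance applies against both the prior month and the budget, and that any headcount change is flagged even when the dollar variance is small. Both are our reading of the rule rather than a fixed specification, and payroll_decision is a hypothetical name.

```python
def payroll_decision(current, prior_month, budget,
                     headcount, prior_headcount, tolerance=0.10):
    """Return "post" or "flag" for the month's payroll journal.

    Assumptions: the tolerance is checked against both baselines, and a
    missing or zero baseline is treated as a reason to flag.
    """
    for baseline in (prior_month, budget):
        if baseline <= 0:
            return "flag"  # no usable baseline to compare against
        if abs(current - baseline) / baseline > tolerance:
            return "flag"  # variance outside the normal range
    if headcount != prior_headcount:
        return "flag"      # headcount moved, so a human should look
    return "post"


# Payroll up five percent with flat headcount posts automatically.
print(payroll_decision(105_000, 100_000, 102_000, 15, 15))
# Payroll up twenty percent with flat headcount goes to review.
print(payroll_decision(120_000, 100_000, 100_000, 15, 15))
```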

By the time the associate opens the close pack the next morning, seventy percent of the reconciliation is done. The agent has posted the routine entries, flagged the exceptions, and drafted the journal entries that need review. The associate spends thirty minutes reviewing the flags, approves eight of them, escalates two to the partner, and closes the month by noon.

The partner’s review takes fifteen minutes. They look at the two escalated items, approve one, reject one and leave a note for the client. Total cycle time: four hours instead of two days. The audit trail is complete. Every auto-posted entry has a reconciliation note. Every flagged entry has a review timestamp and a decision.

That’s what SAP means by human-in-the-loop. The agent does the work. The human makes the calls the agent can’t make. The workflow is designed so the handoff is clean and the liability is clear.

The approval checkpoint isn’t a bottleneck if you design it right

The objection we hear most often is that adding a review step will slow things down. If the agent drafts a journal entry and then waits for a human to click approve, haven’t you just recreated the old queue?

Only if you design the checkpoint badly.

A good approval gate has three characteristics. First, it’s asynchronous. The agent doesn’t sit idle waiting for a human. It moves on to the next task. The human reviews a batch of flagged items once or twice a day, not in real time.

Second, it’s filtered. The agent only escalates items that actually need a decision. A reconciliation that matches to the penny doesn’t hit the queue. A reconciliation with a two-dollar rounding difference doesn’t hit the queue. A reconciliation with a five-thousand-dollar unmatched wire transfer does.

Third, it’s fast. The review interface shows the agent’s recommendation, the supporting data, and a one-click approve or reject. The human isn’t re-doing the work. They’re validating the agent’s logic and making the call on the edge case.

We worked with a firm in the Midwest that tested this model on twenty clients last quarter. They set two escalation triggers: a variance above five percent on any account, and any transaction over five hundred dollars that didn’t match an open item. The agent handled eighty-two percent of reconciliation items end-to-end. The remaining eighteen percent hit the review queue. Average review time per flagged item was ninety seconds.
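Here is a rough sketch of those thresholds as a rule, plus the review workload they imply. The function names are hypothetical, and the 1,000-item volume in the usage lines is an illustrative assumption, not a figure from that firm.

```python
def needs_review(account_variance_pct, txn_amount, matched_open_item,
                 variance_limit_pct=5.0, unmatched_limit=500.0):
    """Escalate when either threshold described above is crossed."""
    if account_variance_pct > variance_limit_pct:
        return True
    if not matched_open_item and txn_amount > unmatched_limit:
        return True
    return False


def review_load(total_items, escalation_rate=0.18, seconds_per_item=90):
    """Estimate flagged items and reviewer minutes for a given volume."""
    flagged = round(total_items * escalation_rate)
    return flagged, flagged * seconds_per_item / 60


# Hypothetical volume: 1,000 reconciliation items in a month.
flagged, minutes = review_load(1000)
print(flagged, minutes)  # 180 flagged items, 270.0 minutes of review
```

At that assumed volume, the review queue costs about four and a half hours a month across the whole book, which is the number to weigh against the hours the agent takes off the associates.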

The partner told us the review queue actually made her more confident in the output than the old process. Under the manual workflow, she’d spot-check three or four clients a month. Under the agent workflow, she reviews every exception across every client, because the agent surfaces them. The routine work is invisible. The risky work is front and center.

If you want to see how this model would work in your close process, we built a worksheet that maps the typical steps and flags the decision points where a human checkpoint makes sense. You can download the Month-End AI Close Map for Accounting Firms and walk through it with your team. It takes about twenty minutes and gives you a clear picture of where the agent can run unsupervised and where you need a gate.

Why this matters for firms trying to scale advisory

The margin math on compliance work is getting worse. Clients expect faster turnaround, lower fees, and more transparency. Staff expect better hours and less repetitive work. The only way to hold margin is to automate the routine and redeploy people to higher-value work.

Advisory is that higher-value work. A compliance engagement might bill at $150 to $200 an hour. An advisory engagement bills at $350 to $500. The problem is the calendar. If your senior people are buried in month-end close and tax prep, they don’t have time to prepare for advisory conversations, let alone deliver them.

An Advisory Insights Agent changes that. It reads each client’s monthly financials, compares them to budget and to prior year, and surfaces three things worth discussing. Maybe revenue is up but margin is flat, which suggests a pricing or cost problem. Maybe cash is down even though profit is up, which suggests a working capital issue. Maybe payroll as a percentage of revenue is creeping up, which suggests a staffing efficiency question.

The agent drafts talking points for the partner. Not a full analysis. Just enough to frame the conversation. The partner spends fifteen minutes reviewing the notes, adds their own context, and walks into the client meeting prepared. The meeting shifts from “here are your numbers” to “here’s what I’m seeing and here’s what I’d recommend.”

That shift is worth real money. A firm with fifty monthly clients and a thirty-percent advisory attach rate will bill an extra $200,000 to $300,000 a year in advisory fees. The constraint isn’t demand. Most clients would pay for advisory if you offered it. The constraint is capacity. Your people don’t have time to prepare.
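It helps to unpack what those figures imply per client. The back-of-envelope calculation below uses only the numbers quoted above (fifty clients, thirty percent attach rate, the annual fee range, and the advisory hourly rates). The per-client and hours figures are derived, not reported, so treat them as a sanity check rather than a forecast.

```python
clients = 50
attach_rate = 0.30
fees_low, fees_high = 200_000, 300_000   # annual advisory fees quoted above
rate_low, rate_high = 350, 500           # advisory hourly rates quoted above

advisory_clients = round(clients * attach_rate)

# Implied annual advisory fees per advisory client.
per_client_low = fees_low / advisory_clients
per_client_high = fees_high / advisory_clients

# Implied advisory hours per client per year: fewest hours come from the
# low fee at the high rate, most hours from the high fee at the low rate.
hours_min = per_client_low / rate_high
hours_max = per_client_high / rate_low

print(advisory_clients)
print(round(per_client_low), round(per_client_high))
print(round(hours_min, 1), round(hours_max, 1))
```

That works out to fifteen advisory clients, each billing roughly $13,300 to $20,000 a year, or somewhere between about 27 and 57 advisory hours per client annually. That is the capacity your senior people would need to find.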

The agent creates that time by taking the compliance work off their plate. But only if the agent is designed to run in production, with review gates that let your people trust the output and focus on the exceptions.

What an Omni Audit tells you about your close process

We run a sixty-minute diagnostic called an Omni Audit. It’s not a sales pitch. It’s a working session. You bring your month-end close checklist, your client list, and your current cycle time. We walk through the workflow step by step and identify three things: where an agent can run unsupervised, where you need a review checkpoint, and what the time savings look like if you deploy it across your book.

You leave with three outputs. First, a process map that shows the current state and the agent-assisted state side by side. Second, a capacity model that estimates how many hours you’d reclaim per month. Third, a priority list that ranks your clients by complexity and flags the ones where an agent would have the biggest impact.

No deck. No follow-up meeting. You get the outputs in the session and you decide what to do next.

The firms that run the audit typically fall into one of two groups. Half of them have already tested an AI tool and want to know how to scale it. The other half haven’t tested anything yet, but they’re losing people to burnout and need a different model.

Both groups leave with the same clarity: you can’t scale AI in finance workflows without designing the human review into the system from the start. The agent handles the volume. The human handles the judgment. The workflow makes the handoff clean.

If that sounds like a conversation worth having, book a 60-min Omni Audit and we’ll map it out. Bring your close checklist and we’ll show you where the gates go.

The liability question isn’t going away

The reason most firms stall between pilot and production isn’t technical. The agent works. The reason is liability. If the agent posts a wrong entry and the client’s financial statements are misstated, who’s responsible?

The answer depends on how you designed the workflow. If the agent runs unsupervised and posts entries without review, you own the error. If the agent drafts entries and a human reviews and approves them, the human owns the decision. If the agent flags an exception and the human ignores it, the human owns that too.

The key is the audit trail. Every decision point needs a timestamp, a user ID, and a record of what was approved or rejected. That record has to be as detailed as the one you’d create if a senior accountant drafted the entry manually.
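As an illustration, a review record along those lines might look like the sketch below. The field names and the JSON log format are assumptions on our part. The point is only that each decision captures what the agent proposed, who decided, what they decided, why, and when.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a logged decision should not be edited later
class ReviewRecord:
    entry_id: str     # hypothetical identifier for the drafted entry
    proposed: str     # what the agent proposed
    decision: str     # "approved" or "rejected"
    reviewer_id: str  # user ID of the person who made the call
    reason: str       # reviewer's note, if any
    reviewed_at: str  # ISO 8601 timestamp in UTC


def record_decision(entry_id, proposed, decision, reviewer_id, reason=""):
    """Build an immutable record for one review decision."""
    if decision not in ("approved", "rejected"):
        raise ValueError(f"unknown decision: {decision}")
    timestamp = datetime.now(timezone.utc).isoformat()
    return ReviewRecord(entry_id, proposed, decision, reviewer_id,
                        reason, timestamp)


def to_log_line(record):
    """Serialize a record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


rec = record_decision("JE-0001", "Reverse accrual and close AP item",
                      "approved", "associate-01", "Matches vendor refund")
print(to_log_line(rec))
```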

SAP’s findings confirm what we’ve seen in practice: the firms that move agents to production are the ones that treat the review checkpoint as a feature, not a bug. They design the workflow so the agent surfaces the exceptions, the human makes the call, and the system logs the decision. The agent scales the volume. The human scales the judgment. The liability model is clear.

If you want to see how that model would work in your firm, the Omni Audit for accounting and bookkeeping is the fastest way to map it. Sixty minutes, three outputs, no deck. You bring the close process, and we’ll show you where the checkpoints go and what the capacity model looks like on the other side.

What to do this week

If you’re running a pilot on AI automation, ask yourself one question: if I deployed this across fifty clients tomorrow, would I trust the output without reviewing it?

If the answer is no, you need to design the review checkpoint before you scale. If the answer is yes, you’re either working with very clean data or you’re not thinking hard enough about edge cases.

The path to production isn’t more automation. It’s smarter escalation. The agent that wins is the one that knows when to hand off.

SAP confirmed it. The firms deploying agents in production confirmed it. The workflow that scales is the one where the agent does the work and the human makes the calls the agent can’t make.

If you want to map that workflow for your close process, book your Omni Audit and we’ll walk through it. Bring your checklist, we’ll show you where the gates go, and you’ll leave with a capacity model that shows what the other side looks like.

The firms that figure this out in the next six months will have a two-year lead on the ones that don’t. The technology is ready. The workflow design is the gap. Close it now, or watch your competitors do it first.
