Cisco: No AI Model Can Withstand Multi-Turn Attacks

A new Cisco research paper has landed a blunt finding on the desks of every enterprise IT leader buying AI tools based on published safety scores: the numbers you are using to make those decisions may be dramatically understating the actual risk.

The research, titled “Proprietary Problems,” tested 15 closed frontier AI models from five major vendors — OpenAI, Anthropic, Google, Amazon, and xAI — and compared their safety performance under two conditions: single-turn attacks (one prompt, one response) and multi-turn attacks (iterative, conversational pressure that escalates over multiple exchanges). The gap between the two is not marginal.

What the Numbers Actually Show

Published AI safety benchmarks almost universally use single-turn attack success rates (ASR) as the measure of a model’s resistance to misuse. A low ASR means the model refuses harmful requests. It sounds straightforward.

The problem is that attackers do not stop after one refusal.

Real-world adversaries — whether testing your customer-facing AI agent or probing an internal enterprise system — reframe questions, build context across multiple turns, adopt different personas, and gradually escalate. Cisco tested what happens when you do exactly that.

The results by model family:

OpenAI GPT-5.4: 2.74% single-turn attack success rate, rising to 24.68% under multi-turn pressure. That is roughly a ninefold increase.
Google Gemini 3 Pro: 18.10% single-turn, climbing to 73.35% multi-turn.
Anthropic Claude family: Among the strongest performers at 2.19%–3.64% single-turn, but still reaching 11.16%–16.20% under iterative pressure.
xAI Grok 4.1 Fast (non-reasoning mode): 88.30% multi-turn attack success rate, the highest of any model tested.

Across all 15 models, multi-turn attack success rates ranged from 7.89% to 88.30%. No model was immune. And critically, Cisco found that single-turn scores do not reliably predict a model’s multi-turn resilience — the ordering changes, sometimes significantly.

One finding offers a practical signal for enterprise architects: enabling reasoning mode on Grok 4.1 Fast dropped its multi-turn ASR from 88.30% to 43.5%. More than four in ten multi-turn attacks still succeeded, but the drop suggests reasoning modes may add meaningful resistance to conversational manipulation.
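The arithmetic behind these comparisons is easy to check. The short Python sketch below uses only the figures quoted above; the helper names fold_increase and relative_reduction are illustrative and do not come from the Cisco paper.

```python
# Attack success rates (percent) as quoted in this article:
# (single-turn ASR, multi-turn ASR).
reported = {
    "GPT-5.4": (2.74, 24.68),
    "Gemini 3 Pro": (18.10, 73.35),
}


def fold_increase(single_turn: float, multi_turn: float) -> float:
    """How many times larger the multi-turn ASR is than the single-turn ASR."""
    return multi_turn / single_turn


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop when going from `before` to `after`."""
    return (before - after) / before


for model, (single, multi) in reported.items():
    ratio = fold_increase(single, multi)
    gap = multi - single
    print(f"{model}: {ratio:.1f}x higher, +{gap:.2f} percentage points")

# Grok 4.1 Fast, multi-turn ASR without and with reasoning mode.
grok_drop = relative_reduction(88.30, 43.5)
print(f"Grok 4.1 Fast with reasoning: {grok_drop:.0%} lower multi-turn ASR")
```

The two views disagree about which model looks worse. GPT-5.4 shows the larger relative jump (about nine times), while Gemini 3 Pro shows the larger absolute jump (about 55 percentage points). Reasoning mode roughly halves Grok 4.1 Fast's multi-turn ASR.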

Why This Matters More Than It Should Have To

Enterprise procurement teams have spent the past two years building vendor scorecards that include AI safety ratings. Compliance teams reference published benchmark scores. Security reviews treat a low single-turn ASR as evidence of a model’s safety posture.

Cisco’s finding is that this methodology has a structural flaw. A 2.74% single-turn attack success rate and a 24.68% multi-turn rate can describe the same model, yet they imply very different levels of risk. Without paired data across both testing regimes, a buyer reading most public evaluations sees only the first number.

The researchers put it plainly: “For business decisions made on the basis of published single-turn scores, this presents security and governance risk.”

This is not a theoretical concern. If your business is deploying AI agents that handle customer interactions, internal knowledge bases, or operational workflows, those agents will be exposed to users who probe, test, and sometimes deliberately try to extract information or manipulate outputs. A benchmark score that only measures first-refusal behavior is telling you something, but not the thing you need to know.

What This Means for Business

For any business running AI in customer-facing or sensitive operational contexts, this research has three practical implications.

Demand multi-turn security testing. If a vendor provides you with safety benchmark scores, ask specifically whether those scores reflect single-turn or multi-turn testing. If they cannot answer, treat the number as a lower bound on risk, not an upper bound.

Audit your agent configurations. The research found that different configurations of the same model can produce dramatically different attack success rates. Reasoning modes in particular appear to add resilience. If you are running AI agents in production, the configuration decisions your team made at deployment have direct security implications that warrant a review.

Governance cannot stop at model selection. The common enterprise approach is to choose a model with a good safety reputation and call the governance work done. Cisco’s research is evidence that ongoing monitoring, behavioral testing, and iterative red-teaming are operational requirements, not one-time checkboxes.
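To make the difference between the two testing regimes concrete, here is a minimal Python sketch of a harness that scores the same scenarios under a one-turn cap and a five-turn cap. Everything in it is an assumption made for illustration: toy_model is a deterministic stand-in rather than a real model, the prompts are placeholders, and none of this reflects Cisco's actual methodology. In practice, the stand-in would be replaced by a call to the deployed agent plus a judge that decides whether a response counts as a successful attack.

```python
import zlib
from typing import Callable, List

# Hypothetical interface: the model receives the conversation so far and
# returns True if it complied with the harmful request, False if it refused.
Model = Callable[[List[str]], bool]


def toy_model(conversation: List[str]) -> bool:
    """Deterministic stand-in, NOT a real model.

    Each scenario gets a 'resistance' between 1 and 10 derived from its
    opening prompt; the toy model gives in once the conversation reaches
    that many turns.
    """
    resistance = zlib.crc32(conversation[0].encode()) % 10 + 1
    return len(conversation) >= resistance


def attack_succeeds(model: Model, prompts: List[str], max_turns: int) -> bool:
    """Replay up to `max_turns` prompts; succeed as soon as the model complies."""
    conversation: List[str] = []
    for prompt in prompts[:max_turns]:
        conversation.append(prompt)
        if model(conversation):
            return True
    return False


def attack_success_rate(
    model: Model, scenarios: List[List[str]], max_turns: int
) -> float:
    """Fraction of scenarios in which the attack succeeded within the turn cap."""
    successes = sum(attack_succeeds(model, s, max_turns) for s in scenarios)
    return successes / len(scenarios)


# Placeholder scenarios: 200 attacks, each with five escalating turns.
scenarios = [
    [f"scenario {i}, turn {t}" for t in range(1, 6)] for i in range(200)
]

single_asr = attack_success_rate(toy_model, scenarios, max_turns=1)
multi_asr = attack_success_rate(toy_model, scenarios, max_turns=5)
print(f"single-turn ASR: {single_asr:.1%}, multi-turn ASR: {multi_asr:.1%}")
```

Because the single-turn run is just the multi-turn run stopped after the first prompt, the multi-turn rate in this sketch can never be lower than the single-turn rate. A harness of this shape can be rerun after each model, prompt, or configuration change, which is what turns red-teaming into an ongoing check.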

The research also reinforces a broader pattern: AI security is not a property of a model; it is a property of a system. The model is one component. The prompts, the memory, the tools it has access to, the user interface, and the monitoring around it are equally important. A highly capable model with weak guardrails and no behavioral monitoring is a liability, regardless of what the benchmark sheet says.

For businesses using EDNA Omni services to deploy AI agents, this kind of adversarial testing and governance architecture is built into how we approach deployments — not something clients need to figure out independently. If you are evaluating AI agent deployments and want to understand how your governance posture holds up to real-world attack patterns, a conversation with our advisory team is a practical starting point.

The Cisco research paper is available via the Cisco AI Blog. The full dataset compares 15 models across both testing regimes.

Source

Cisco AI Blog
