Microsoft's AI Agents Found 16 Windows Flaws on Their Own

Microsoft just published results from an internal AI security system that should make every business leader pay attention. Not because of the security angle itself, but because of what it shows AI agents can do when they work together at scale.

The system is called MDASH, short for Multi-Model Agentic Scanning Harness. Microsoft’s security team built it to autonomously find exploitable vulnerabilities in Windows code. This week, they reported that MDASH discovered 16 new vulnerabilities across Windows networking and authentication components, including four critical remote code execution flaws. All 16 were patched in Microsoft’s May Patch Tuesday release.

What MDASH Actually Does

The system orchestrates more than 100 specialized AI agents running on a mix of frontier and distilled models. Instead of relying on a single large language model to do everything, MDASH breaks the problem into stages: some agents discover potential issues, others debate whether each finding is genuinely exploitable, and a final layer proves the bug end to end before it is reported.
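To make the staged design concrete, here is a minimal Python sketch of a discover, debate, prove pipeline. It is not Microsoft's implementation. The agent functions and the names Finding and run_pipeline are hypothetical stand-ins for model calls, and the "debate" stage is reduced to a simple strict-majority vote among reviewer agents.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    location: str      # e.g. a function name in the scanned code
    description: str   # what the agent thinks is wrong


# Hypothetical agent roles. In a real harness each would wrap a model call.
Discoverer = Callable[[str], List[Finding]]  # proposes candidate bugs
Reviewer = Callable[[Finding], bool]         # votes: is this exploitable?
Prover = Callable[[Finding], bool]           # True only if an end-to-end proof succeeds


def run_pipeline(code: str,
                 discoverers: List[Discoverer],
                 reviewers: List[Reviewer],
                 prover: Prover) -> List[Finding]:
    if not reviewers:
        raise ValueError("at least one reviewer is required")

    # Stage 1: every discoverer scans the code; duplicate findings are merged.
    candidates: List[Finding] = []
    for discover in discoverers:
        for finding in discover(code):
            if finding not in candidates:
                candidates.append(finding)

    # Stage 2: reviewers vote on each candidate; keep strict-majority approvals.
    debated = [
        f for f in candidates
        if sum(review(f) for review in reviewers) * 2 > len(reviewers)
    ]

    # Stage 3: only findings the prover can demonstrate end to end get reported.
    return [f for f in debated if prover(f)]


# Toy usage with hypothetical agents.
def length_checker(code: str) -> List[Finding]:
    return [Finding("parse_packet", "unchecked length")] if "memcpy" in code else []


def broad_scanner(code: str) -> List[Finding]:
    return [Finding("parse_packet", "unchecked length"),
            Finding("log_msg", "format string")]


reviewers: List[Reviewer] = [
    lambda f: f.location == "parse_packet",
    lambda f: True,
    lambda f: "length" in f.description,
]

reported = run_pipeline("memcpy(buf, src, len);",
                        [length_checker, broad_scanner],
                        reviewers,
                        prover=lambda f: f.location == "parse_packet")
print(reported)  # [Finding(location='parse_packet', description='unchecked length')]
```

The design point worth noticing is that each stage can only narrow the list. A candidate has to survive both the vote and the proof before anyone sees it, which is how a pipeline like this keeps false positives low.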

The performance numbers are hard to ignore. On a private test bed with planted vulnerabilities, MDASH caught all 21, with zero false positives. Against five years of confirmed security cases in the Windows clfs.sys driver, it achieved 96% recall; in tcpip.sys, it hit 100%. On the public CyberGym benchmark, which covers 1,507 real-world vulnerabilities, MDASH scored 88.45%, the highest score on the public leaderboard.
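For readers less used to these metrics: recall is the share of known bugs a system finds, and precision is the share of its reports that turn out to be real bugs, so "zero false positives" means perfect precision. A minimal Python sketch of both, applied to the test-bed figures quoted above (the function names are illustrative):

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of reported findings that were real bugs."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of known bugs that the system found."""
    known = true_positives + false_negatives
    return true_positives / known if known else 0.0


# Planted test bed: 21 of 21 bugs found, zero false positives.
print(precision(21, 0), recall(21, 0))  # 1.0 1.0
```

By the same arithmetic, 96% recall on the clfs.sys history means roughly one known case in 25 went undetected.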

These are not demo numbers. Microsoft is using MDASH in production.

Why This Matters Beyond Security

Most business leaders reading this are not running Microsoft-scale security operations. So why does this story matter for you?

Because MDASH is a blueprint for what enterprise AI agents look like when they are actually working. A few things stand out.

First, the multi-agent architecture is the key insight. No single AI model is good enough to do the whole job reliably. The power comes from specialized agents with different roles working in sequence, each checking the others’ work. This is the same pattern now proving itself in agentic deployments across finance, customer service, and operations.

Second, the low false positive rate is what makes it trustworthy. Systems that generate noise get ignored. MDASH was built with precision as a requirement, not an afterthought. That is the standard any enterprise AI agent needs to meet before it touches production workflows.

Third, this is knowledge work. Security research has always required expert human judgment. That more than 100 coordinated AI agents can now do it reliably at scale signals which kind of work is next: not just repetitive tasks, but diagnostic, analytical work that previously required deep expertise.

What This Means for Business

If your instinct is still that AI agents are only useful for low-stakes tasks like drafting emails or summarizing documents, stories like MDASH should shift that frame. The technology is moving into territory where autonomous AI systems find things that might take human experts months to discover.

For business owners and operations leaders, the practical takeaway is this: the coordination model matters as much as the models themselves. Deploying one general-purpose AI tool and asking it to do everything is not the same as designing a system where specialized agents work in sequence with defined roles and quality checks.

That is the architecture that produces results like MDASH’s. And it is the same architecture underpinning serious enterprise AI deployments across industries right now.

Microsoft plans to open MDASH to enterprise customers through a private preview in June. Whether or not security automation is relevant to your business, the underlying design is worth understanding. It is one of the clearer public examples of what capable, trustworthy agentic AI looks like in practice.

If you are thinking through how agentic AI could work inside your business, not as a single chatbot but as coordinated AI agents handling real workflows, book a discovery call with the Enterprise DNA team. We help businesses design and deploy AI agent systems that actually move the needle.

Source

Microsoft Security Blog
