GitHub's AI Agent Crisis: Why Microsoft Turned to AWS

There’s an odd new metric floating around enterprise tech circles: GitHub’s monthly commit rate. Not because anyone cares about version control at scale, but because that number is a proxy for how fast AI agents are actually being deployed in the real world.

The answer, it turns out, is very fast. Fast enough to break things.

Microsoft confirmed last week that it is routing GitHub traffic through Amazon Web Services after AI coding agents overwhelmed the platform’s own infrastructure. GitHub, the world’s largest code hosting platform, logged nine separate incidents in May 2026 alone. June availability has tracked well below the 99.9 percent uptime threshold that enterprise customers pay for in their service agreements.

For a company whose cloud infrastructure business is Azure, the optics of using a direct competitor’s capacity are striking. But the numbers explain the decision quickly.

The Scale Is Staggering

GitHub is now processing roughly 275 million commits per week. That sounds abstract until you consider the growth curve behind it: AI agent pull requests grew from four million per month in September 2025 to 17 million per month by March 2026 — a 325 percent increase in six months.

The overall commit trajectory is even more dramatic. GitHub commits are on pace to reach 14 billion in 2026. In 2025, the total was approximately one billion. That is a 14x increase in a single year, almost entirely driven by automated agents writing, testing, and submitting code without human intervention.
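
As a quick sanity check, the growth figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not an official calculation: the variable names are illustrative, and annualizing the weekly commit rate assumes the current pace holds for all 52 weeks.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage growth from old to new."""
    return (new - old) / old * 100


# Figures quoted in the article.
agent_prs_sep_2025 = 4_000_000      # agent pull requests per month
agent_prs_mar_2026 = 17_000_000     # agent pull requests per month
commits_per_week = 275_000_000
commits_2025 = 1_000_000_000        # approximate 2025 total

pr_growth_pct = percent_increase(agent_prs_sep_2025, agent_prs_mar_2026)

# Assumption: the current weekly rate holds for a full 52-week year.
annualized_commits = commits_per_week * 52
yoy_multiple = annualized_commits / commits_2025

print(f"Agent PR growth: {pr_growth_pct:.0f}%")
print(f"Annualized commits: {annualized_commits / 1e9:.1f} billion")
print(f"Multiple of 2025 total: {yoy_multiple:.1f}x")
```

Under that assumption the weekly rate annualizes to about 14.3 billion commits, which lines up with the 14 billion projection and the 14x multiple.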

April 2026 was the worst month on record for availability. Session start failure rates reached 84 percent and briefly touched 97.5 percent at their worst. Agent wait times reached 54 minutes. Developers found themselves queuing to use a platform that was, until recently, reliably instant.

GitHub COO Kyle Daigle has said the company expects meaningful improvement by September 2026 — but improvement is not yet visible in the availability data.

Why Existing Infrastructure Couldn’t Keep Up

GitHub’s core platform was built on a Ruby on Rails monolith at a time when the primary workload was humans pushing code. That architecture has a well-understood vulnerability: tightly coupled services mean a single failure cascades across the whole system. An authentication database under load doesn’t just slow down authentication — it simultaneously breaks GitHub Actions, Copilot completions, and the web UI.
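
The cascade described above can be illustrated with a toy dependency graph. This is a hypothetical sketch: the service names and the `DEPENDS_ON` map are assumptions made for illustration and do not describe GitHub's actual architecture.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it calls directly.
DEPENDS_ON = {
    "web_ui": ["auth_db"],
    "actions": ["auth_db"],
    "copilot_completions": ["auth_db"],
    "auth_db": [],
}


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a dependency to its callers.
    callers: dict[str, list[str]] = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(service)

    # Breadth-first walk outward from the failed service.
    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted


print(sorted(impacted_by("auth_db", DEPENDS_ON)))
```

In this toy model a failure in the shared authentication store takes down every service that calls it, while a failure in a leaf service such as `web_ui` affects nothing else. That asymmetry is the coupling problem in miniature.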

The explosive growth of AI agent workloads exposed this at exactly the wrong time. Agents make more API calls per interaction than humans do, operate continuously rather than during business hours, and generate orders of magnitude more automated triggers — tests, linters, deploys — per pull request than a developer would manually kick off.

This is not unique to GitHub. Multiple major AI and cloud companies are simultaneously struggling to meet AI compute demand using only their own infrastructure. Google will pay SpaceX $920 million per month from October 2026 for access to 110,000 NVIDIA GPUs. Goldman Sachs estimates $7.6 trillion in cumulative AI capital expenditure will be required from 2026 to 2031 just to keep pace with projected demand.

The GitHub situation is just the version of this problem that developers feel directly, in real time, when they are trying to ship software.

What This Means for Business

There are several things worth taking from this story if you are a business leader thinking about AI.

The adoption is real, not theoretical. When AI agents are generating 17 million pull requests per month and pushing GitHub past its capacity, this is not a hype cycle; it is a production workload. Businesses that have been “exploring” AI deployment for the past 18 months are behind the curve. The leading adopters are already running agents at volumes that break enterprise infrastructure.

Reliability is now a first-order concern for AI deployments. Enterprises signing SLAs with software platforms need to be asking different questions than they did three years ago. What happens to availability when agent traffic grows 4x? What is the vendor’s infrastructure roadmap for agentic workloads? GitHub’s enterprise customers found out the hard way that 99.9 percent SLAs were written for human-scale usage patterns.
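
To make the SLA figure concrete, it helps to convert an uptime percentage into the downtime it actually permits. The sketch below assumes a 30-day measurement window, which is an assumption here; real agreements define their own windows and exclusions.

```python
def downtime_budget_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in the window at a given uptime SLA."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# Assumption: a 30-day window; actual contracts may measure differently.
for sla in (99.0, 99.9, 99.99):
    budget = downtime_budget_minutes(sla)
    print(f"{sla}% uptime allows about {budget:.1f} minutes of downtime per 30 days")
```

At 99.9 percent, the budget is roughly 43 minutes per 30-day window, so even a single prolonged incident can use up the whole month's allowance.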

The infrastructure bill is going to be much larger than anyone planned. The fact that Microsoft needs AWS capacity, and that Google is paying SpaceX nearly a billion dollars a month for compute, tells you something important about how poorly the industry anticipated agentic AI’s infrastructure demands. Businesses building internal AI agent workflows should budget aggressively for compute and infrastructure — and build in contingency for demand that outpaces forecasts.

Cross-vendor collaboration is becoming normal, not embarrassing. A year ago, Microsoft using Amazon infrastructure for a flagship product would have been unthinkable. Today it is a pragmatic engineering decision driven by the scale of the problem. Businesses should take the same approach: the right infrastructure is the one that works, regardless of which logo is on the data center.

GitHub’s situation is ultimately a good problem to have. It means AI agents are actually being used, at scale, by real engineering teams. But it is also a reminder that infrastructure is not a detail. It is the constraint that determines whether real-world AI deployment delivers on its promises.

For businesses building AI workflows today, the lesson is simple: plan for your AI usage to grow much faster than you expect, make reliability a contract-level requirement, and do not let the infrastructure questions be an afterthought.

Source

TechTimes
