OpenAI and Broadcom Unveil Jalapeño AI Inference Chip

OpenAI and Broadcom revealed Jalapeño on June 24 — a custom AI chip designed in 9 months targeting 50% lower inference cost than Nvidia GPUs.

Enterprise DNA | via OpenAI

OpenAI and Broadcom unveiled “Jalapeño” today — OpenAI’s first custom AI inference chip, built from scratch over nine months. The announcement marks the clearest signal yet that the major AI labs are serious about owning their own silicon, not just renting it from Nvidia.

This matters to anyone running AI in their business. If OpenAI can run its models on cheaper, purpose-built chips, that cost pressure eventually flows downstream — to API pricing, and to the cost of deploying AI agents across your operations.

What Jalapeño Actually Is

Jalapeño is a custom ASIC (application-specific integrated circuit) designed specifically for large language model inference: the part of AI where the model actually runs and generates responses. That is different from training, the workload Nvidia’s H100s and B200s are best known for.

OpenAI designed the chip around what it knows about its own models: how tokens flow, what memory access patterns look like, how parallelism works at scale. The result is a chip that can do one thing — run LLMs fast and efficiently — rather than a general-purpose GPU that can do many things adequately.

Broadcom handles the silicon implementation, manufacturing, and chip packaging. Celestica, a contract manufacturer, is handling board and rack system integration. The partnership splits the work: OpenAI contributes the architecture, Broadcom contributes the manufacturing expertise.

Engineering samples are already running target production workloads in the lab, including GPT-5.3-Codex-Spark. OpenAI says early testing shows “performance per watt substantially better than current state-of-the-art,” a direct shot at Nvidia’s power-hungry data-center GPUs.

Initial deployment is targeted for the end of 2026, with expansion planned in subsequent years.

Why This Is a Bigger Deal Than It Looks

The AI chip market has been effectively a one-company market. Nvidia’s H100s and B200s are the default choice for training and inference at scale. Nvidia has had pricing power to match — its chips have commanded premium prices throughout the current AI build-out, and the company’s gross margins have reflected that.

Other companies have tried to challenge Nvidia. Google has its TPUs. Amazon has Trainium for training and Inferentia for inference. Meta builds its own hardware. But none of these alternatives has meaningfully loosened Nvidia’s hold on the market.

Jalapeño is different because it is OpenAI’s. The company running the most-used AI services in the world is now building the infrastructure those services will run on. That creates several downstream effects:

Cost pressure on Nvidia. Even if Jalapeño deployment stays internal to OpenAI, the existence of a credible alternative chips away at Nvidia’s negotiating leverage. Other cloud providers will cite Jalapeño when renegotiating their own chip contracts.

Reduced API costs over time. If OpenAI can serve its models at lower cost per token using its own chips, there is room to lower API pricing. That is good news for businesses building on top of OpenAI’s models.

Signal to the industry. Other frontier labs will accelerate their own silicon programs. Anthropic, Google DeepMind, and xAI have all been exploring custom chips. OpenAI shipping a working product validates the approach and raises the stakes.

Talent and supply chain dynamics. OpenAI built this chip in nine months, which is extraordinarily fast by semiconductor standards. That signals the company has significant hardware engineering talent and strong supply chain relationships — assets that take years to build and that competitors will note.

What This Means for Business

If you are using AI in your operations today, the direct impact of Jalapeño on your costs is zero right now. The chip ships at the end of 2026 at the earliest, and it will take time to scale to the volumes that shift OpenAI’s per-token economics meaningfully.

But the medium-term picture is more interesting. Here is what to watch:

AI inference gets cheaper. The historical pattern with custom silicon is that purpose-built chips deliver 2x to 5x better cost-efficiency than general-purpose GPUs for their specific workload. OpenAI’s stated target of 50% lower inference cost sits at the low end of that range, since halving cost is a 2x gain. If Jalapeño lands nearer the high end, OpenAI’s inference costs drop much further, and competitive pressure should pass some of that through to customers.

Agent economics improve. The main constraint on deploying AI agents at scale is cost per query. As inference gets cheaper, running dozens of agents continuously becomes more viable for smaller businesses. The economics that today only work for large enterprises start to work for mid-market companies.

Vendor concentration risk decreases. Nvidia’s supply shortages and pricing power have created real uncertainty over the past two years for businesses relying on AI services. A world with more chip diversity is a more stable one for anyone depending on AI infrastructure.

The cloud AI market gets more competitive. If OpenAI is building its own chips, other hyperscalers will push to differentiate on their AI infrastructure too. Google already has TPUs. Microsoft and Amazon are investing in their own silicon. The result is a more competitive market where buyers have more options.

The AI compute stack is growing up. Jalapeño is evidence that the frontier labs are thinking about long-term infrastructure control, not just building on top of commodity hardware. For businesses, that eventually means better economics — though the timeline is measured in years, not months.
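
To make the arithmetic behind these points concrete, here is a rough back-of-envelope sketch. The 2x to 5x efficiency range and the 50% cost target come from this article; the fleet size, query volume, and price per query are hypothetical placeholders, not OpenAI pricing, and the function names are invented for illustration.

```python
"""Back-of-envelope inference economics. All fleet inputs are assumptions."""


def cost_reduction(multiple: float) -> float:
    """Fraction of cost saved when cost-efficiency improves by `multiple`x."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return 1.0 - 1.0 / multiple


def multiple_from_reduction(reduction: float) -> float:
    """Cost-efficiency multiple implied by a fractional cost reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def monthly_fleet_cost(
    agents: int,
    queries_per_agent_per_day: int,
    cost_per_query: float,
    days: int = 30,
) -> float:
    """Monthly inference spend for a fleet of agents (hypothetical model)."""
    return agents * queries_per_agent_per_day * cost_per_query * days


if __name__ == "__main__":
    # Hypothetical mid-market deployment: 24 agents, 500 queries each per day,
    # at an assumed 2 cents per query. None of these figures are from OpenAI.
    baseline = monthly_fleet_cost(24, 500, 0.02)
    print(f"Baseline monthly spend: ${baseline:,.0f}")

    # The article's 2x-5x range, assuming savings pass through in full.
    for multiple in (2.0, 5.0):
        saved = cost_reduction(multiple)
        print(
            f"{multiple:.0f}x efficiency -> {saved:.0%} lower cost, "
            f"${baseline * (1 - saved):,.0f} per month"
        )
```

A 2x gain halves the bill and a 5x gain cuts it by 80%, but only if the savings pass through in full. In practice, how much reaches API customers depends on pricing decisions that nobody outside OpenAI can predict.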

The Bigger Picture

The story here is not that OpenAI built a chip. The story is that OpenAI is building a full stack: models, training infrastructure, inference chips, and increasingly its own deployment pathways. Each layer it controls is a layer where it can optimize costs, improve performance, and reduce dependence on vendors who may have competing interests.

For business owners, the practical implication is this: the cost of running AI agents is going to keep falling. The economics that feel challenging today will look different in 2027 and 2028. If you are evaluating AI investment now, factor in that the per-unit cost of AI inference is on a structural downward curve — and announcements like Jalapeño are what that curve looks like in practice.

If you want to understand how falling inference costs change the business case for AI agents and voice AI employees in your specific context, that conversation is one worth having now — before your competitors have already worked it out.

Book a discovery call with the Enterprise DNA team to explore what AI workforce economics mean for your business.

Source

OpenAI