Enterprise DNA

Next.js for AI Apps: What Engineers Actually Found

A practitioner reaction to Next.js for AI apps, covering real latency, cost surprises, edge case failures, and what teams pair it with in production.

Sam McKay

The pitch for Next.js as an AI app foundation is straightforward. You get server components, server actions, edge runtime support, and a routing model that already feels built for streaming chat interfaces. On paper it is the obvious default for a team that wants to ship a GPT wrapper, a RAG tool, or an internal copilot without bolting together five different services.

Once you actually build one of these in production, the picture gets more textured. This article is a reaction to what the developer community has been reporting across Reddit threads, Hacker News discussions, and practitioner write-ups over the past year. The goal is to give you an honest read on where Next.js genuinely delivers, where it pushes back, and which teams end up happy versus frustrated.

What Practitioners Expected vs What They Got

The dominant expectation on r/LocalLLaMA and r/nextjs was that Next.js would handle the full stack cleanly. Drop in an API route or server action, wire it to OpenAI or a self-hosted model, stream tokens back, render markdown, and call it a day. Many engineers coming from a Streamlit or Flask background saw Next.js as a way to ship a polished UI without learning a second framework.

What most people actually found is that the framework handles the “shell” of an AI app very well, but the actual AI plumbing requires a lot of glue. Streaming responses through the Vercel AI SDK works, but you still need to manage conversation state, tool calling, retries, timeouts, and rate limits. Several HN commenters described their first AI app as “60 percent Next.js, 40 percent custom orchestration code.”
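
To make that glue concrete, here is a minimal sketch of one piece of it: retrying a model call with a per-attempt timeout and exponential backoff. The helper name `withRetry` and its options are hypothetical and do not come from the Vercel AI SDK or any other library. The timeout only takes effect if the wrapped call honors the `AbortSignal` it is given.

```typescript
// Hypothetical helper, not part of any SDK: retry an async call with a
// per-attempt timeout and exponential backoff between attempts.
type RetryOptions = {
  attempts: number;    // total attempts, including the first
  timeoutMs: number;   // time budget for each attempt
  baseDelayMs: number; // first backoff delay; doubles after each failure
};

async function withRetry<T>(
  call: (signal: AbortSignal) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    const controller = new AbortController();
    // Abort the attempt when its time budget runs out.
    const timer = setTimeout(() => controller.abort(), opts.timeoutMs);
    try {
      return await call(controller.signal);
    } catch (err) {
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
    // Back off before the next attempt, but not after the last one.
    if (attempt < opts.attempts - 1) {
      const delay = opts.baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

In practice the `call` argument would wrap a `fetch` to the model provider and pass the signal through, so a timed-out attempt is actually cancelled rather than left running.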

A second surprise was deployment cost. Engineers who started on the free tier quickly learned that AI workloads, which keep connections open for streaming, burn through serverless execution time and bandwidth in ways a typical CRUD app never does. One thread on r/nextjs had a developer reporting a 4x increase in their Vercel bill after moving from a static docs site to a chat app with moderate traffic.

Where Next.js Genuinely Delivers

The streaming story is the strongest part of the experience. Server components, server actions, and the Vercel AI SDK compose well enough that you can get a token-by-token chat interface running in an afternoon. Practitioners consistently report first-token latency in the 300 to 800ms range when calling OpenAI directly, and the framework adds minimal overhead on top.
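
If you want to check first-token latency in your own app rather than rely on reported ranges, a rough sketch like the one below can help. It times the arrival of the first chunk on any `ReadableStream`, such as a streaming `fetch` response body. The function name `measureFirstChunk` is an assumption made for illustration, and the first chunk is only a proxy for the first token, since a chunk may carry several tokens.

```typescript
// Hypothetical helper: time how long a stream takes to deliver its first
// chunk and its full contents. Uses only Web-standard APIs.
async function measureFirstChunk(
  stream: ReadableStream<Uint8Array>,
): Promise<{ firstChunkMs: number; totalMs: number; text: string }> {
  const start = performance.now();
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let firstChunkMs = -1; // stays -1 if the stream is empty
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (firstChunkMs < 0) firstChunkMs = performance.now() - start;
    // stream: true keeps multi-byte characters intact across chunk borders.
    text += decoder.decode(value, { stream: true });
  }
  text += decoder.decode(); // flush any buffered bytes
  return { firstChunkMs, totalMs: performance.now() - start, text };
}
```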

The App Router model is a real win for AI products. You can colocate your route, your server action that calls the model, and your React component that renders the streamed response. The mental model is simpler than the old pages router for this kind of work. A frontend engineer with no backend experience can ship a working AI endpoint in a few hours, and the community has produced enough starter templates that the early ramp is fast.
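
As a rough illustration of how small that endpoint can be, the sketch below streams text back using only the Web-standard `Request`, `Response`, and `ReadableStream` types that App Router route handlers accept and return. The `fakeModel` generator is a stand-in invented for this example; a real handler would call a model provider there. In a project, the `POST` function would be exported from a route file such as `app/api/chat/route.ts` (the path is an assumption).

```typescript
// Stand-in for a model client (hypothetical): yields one word at a time.
async function* fakeModel(prompt: string): AsyncGenerator<string> {
  for (const word of `You said: ${prompt}`.split(" ")) {
    yield word + " ";
  }
}

// In a real project this would be exported as POST from a route file.
async function POST(req: Request): Promise<Response> {
  const { prompt } = (await req.json()) as { prompt: string };
  const encoder = new TextEncoder();
  const tokens = fakeModel(prompt);

  // pull() runs each time the consumer is ready for more data, so tokens
  // are forwarded as they are produced instead of being buffered.
  const body = new ReadableStream<Uint8Array>({
    async pull(controller) {
      const { done, value } = await tokens.next();
      if (done) controller.close();
      else controller.enqueue(encoder.encode(value));
    },
  });

  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```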

Edge runtime support is the other genuine strength. Running your AI route on the edge gives you lower cold start times, typically 50 to 150ms, and keeps latency consistent for global users. Teams building consumer-facing tools with users spread across regions have reported meaningful improvements versus deploying to a single region.
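
Opting a route into the edge runtime is a one-line route segment config. The sketch below shows the shape, with a trivial handler that sticks to Web-standard APIs. The handler body is a placeholder for illustration, not a recommended endpoint.

```typescript
// Route segment config: asks Next.js to run this handler on the Edge runtime.
export const runtime = "edge";

export async function GET(): Promise<Response> {
  // Only Web-standard APIs are used here, so nothing depends on
  // Node-only modules that the edge runtime may not provide.
  return new Response(JSON.stringify({ ok: true }), {
    headers: { "Content-Type": "application/json" },
  });
}
```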

For RAG-style apps where you fetch context, embed it, and then call a model, the pattern is well documented. Pinecone, Supabase pgvector, and Turbopuffer all have clean Next.js integration guides. Practitioners building internal knowledge bases and document Q&A tools have generally had a smooth experience here.
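
The retrieval step in that pattern can be sketched without any vendor at all. The example below ranks documents by cosine similarity against a query embedding and assembles a prompt from the top matches. It is an in-memory illustration under stated assumptions: the `Doc` type, the function names, and the prompt wording are all hypothetical, and a production app would delegate the search to a vector store such as the ones named above.

```typescript
// Hypothetical document shape: text plus a precomputed embedding.
type Doc = { id: string; text: string; embedding: number[] };

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the k documents most similar to the query, best match first.
function topK(query: number[], docs: Doc[], k: number): Doc[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}

// Assemble a prompt that numbers each retrieved source.
function buildPrompt(question: string, context: Doc[]): string {
  const sources = context.map((d, i) => `[${i + 1}] ${d.text}`).join("\n");
  return `Answer using only the sources below.\n\n${sources}\n\nQuestion: ${question}`;
}
```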

Finally, the developer experience for the UI side is excellent. If your AI app needs rich rendering of model output, with code blocks, tables, citations, and tool result cards, React plus a library like react-markdown or streamdown gives you a lot of control. The HN comment section under most “AI app architecture” posts is full of engineers saying they would pick Next.js again purely for the rendering layer.

Where It Falls Short

The cost story is where most of the friction lives. Serverless functions on Vercel charge per invocation and per execution time, and a streaming AI request that keeps a connection alive for 15 to 30 seconds consumes far more billed execution time than a typical API call. Engineers in r/nextjs have reported monthly bills jumping from under $50 to several hundred dollars once their app crossed a few hundred daily active users. One thread had a small team paying $1,200 a month for what they described as a “trivial chat app.”
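
A quick back-of-the-envelope calculation shows why the bill moves so much. The sketch below compares monthly execution time for a short API call and a streamed reply at the same traffic level. Every input is an illustrative assumption: 500 requests a day, 200 ms for a short call, and 20 seconds for a streamed reply (inside the 15 to 30 second range above). It also assumes the platform bills wall-clock duration; platforms that bill only active CPU time would show a much smaller gap.

```typescript
// All inputs are illustrative assumptions, not measured or published figures.
function monthlyExecutionMs(
  requestsPerDay: number,
  msPerRequest: number,
  daysPerMonth: number = 30,
): number {
  return requestsPerDay * msPerRequest * daysPerMonth;
}

const requestsPerDay = 500;
const crudMs = monthlyExecutionMs(requestsPerDay, 200); // short API call
const streamingMs = monthlyExecutionMs(requestsPerDay, 20_000); // 20 s streamed reply

// Same traffic, 100 times the execution time.
console.log(streamingMs / crudMs); // 100
```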

Timeouts on slow model calls are another pain point. If you are calling a model that takes 8 to 20 seconds to respond, serverless timeouts become a real concern. The limits cited in these threads are 10 seconds on Vercel Hobby and 60 seconds on Pro. The exact ceilings depend on plan and configuration and have changed over time, but the effect is the same: longer agentic workflows will not run on serverless without workarounds. Several teams have moved their AI routes to dedicated Node servers or to Cloudflare Workers with longer limits.
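
One possible workaround, sketched here under assumptions rather than taken from any of the threads, is to run a long workflow in steps and stop before the platform limit, returning state that a later invocation can resume from. The names `runWithinBudget` and `StepResult` are hypothetical. Note that the loop only checks the clock between steps, so a single step that outlasts the remaining budget would still overrun.

```typescript
// Hypothetical types: each step returns the new state and whether work is done.
type StepResult<S> = { state: S; done: boolean };

// Run steps until the workflow finishes or the time budget is used up.
// `now` is injectable so the loop can be exercised with a fake clock.
async function runWithinBudget<S>(
  initial: S,
  step: (state: S) => Promise<StepResult<S>>,
  budgetMs: number,
  now: () => number = Date.now,
): Promise<StepResult<S>> {
  const deadline = now() + budgetMs;
  let current: StepResult<S> = { state: initial, done: false };
  while (!current.done && now() < deadline) {
    current = await step(current.state);
  }
  // done === false means: persist the state and resume in a new invocation.
  return current;
}
```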

State management is harder than it looks. AI apps are stateful in ways traditional web apps are not. You need to manage conversation history, tool call results, retries, partial failures, and user interrupts. The community has converged on patterns involving a messages array, a useChat hook, and persistent storage in Postgres or Redis, but there is no canonical Next.js way to do this. Practitioners consistently report that the framework gives you primitives but does not give you a recipe.
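
To show what those primitives look like once you assemble them yourself, here is a minimal sketch of conversation state as a pure reducer over a messages array. It handles user turns, streamed assistant chunks, and a stop event that covers both normal completion and a user interrupt. The types and the `applyEvent` name are hypothetical and are not the shape used by the `useChat` hook. Persistence to Postgres or Redis would sit outside this function.

```typescript
// Hypothetical state shape for a chat UI.
type Message = { role: "system" | "user" | "assistant"; content: string };
type ChatState = { messages: Message[]; streaming: boolean };
type ChatEvent =
  | { type: "user"; content: string }
  | { type: "delta"; text: string } // one streamed chunk of assistant text
  | { type: "stop" }; // stream finished or user interrupted

// Pure reducer: returns a new state and never mutates the old one.
function applyEvent(state: ChatState, event: ChatEvent): ChatState {
  switch (event.type) {
    case "user":
      return {
        messages: [...state.messages, { role: "user", content: event.content }],
        streaming: false,
      };
    case "delta": {
      const last = state.messages[state.messages.length - 1];
      if (state.streaming && last && last.role === "assistant") {
        // Extend the assistant message that is currently being streamed.
        const updated: Message = { ...last, content: last.content + event.text };
        return { messages: [...state.messages.slice(0, -1), updated], streaming: true };
      }
      // First chunk of a new assistant reply.
      return {
        messages: [...state.messages, { role: "assistant", content: event.text }],
        streaming: true,
      };
    }
    case "stop":
      // Keep whatever partial text arrived before the stop.
      return { ...state, streaming: false };
  }
}
```

Because the reducer is pure, the same function can run on the client for rendering and on the server when replaying stored events.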

Edge runtime also has limits that bite. Several Node libraries used in AI pipelines, including some PDF parsers, certain vector database clients, and many LangChain utilities, do not run on the edge. The community has produced edge-compatible alternatives, but the experience of hitting a “this package is not edge compatible” error mid-build is common enough that it shows up in nearly every serious Next.js AI thread.

The other underrated problem is observability. When a model call fails or returns something weird, the default Next.js error handling gives you little to work with. Teams end up building custom logging into every model call, capturing prompt, response, token count, latency, and user context. Without it, debugging production AI issues is nearly impossible.
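
A logging wrapper of that kind does not need to be elaborate. The sketch below wraps any model call and records the fields listed above on both success and failure. Everything here is an assumption made for illustration: the `ModelCallLog` shape, the `loggedCall` name, and the `sink` callback, which in a real app might write to Postgres or an analytics tool.

```typescript
// Hypothetical log record capturing the fields teams say they need.
type ModelCallLog = {
  userId: string;
  prompt: string;
  response: string | null;
  error: string | null;
  latencyMs: number;
  totalTokens: number | null;
};

type ModelResult = { text: string; totalTokens: number };

// Wrap a model call so every outcome, success or failure, is logged.
async function loggedCall(
  userId: string,
  prompt: string,
  callModel: (prompt: string) => Promise<ModelResult>,
  sink: (entry: ModelCallLog) => void,
): Promise<ModelResult> {
  const start = Date.now();
  try {
    const result = await callModel(prompt);
    sink({
      userId,
      prompt,
      response: result.text,
      error: null,
      latencyMs: Date.now() - start,
      totalTokens: result.totalTokens,
    });
    return result;
  } catch (err) {
    sink({
      userId,
      prompt,
      response: null,
      error: err instanceof Error ? err.message : String(err),
      latencyMs: Date.now() - start,
      totalTokens: null,
    });
    throw err; // logging must not swallow the failure
  }
}
```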

What Practitioners Pair It With

The community has settled on a fairly consistent stack around Next.js for AI apps. For the model layer, OpenAI and Anthropic are the most common starting points, with a growing share of teams self-hosting Llama 3, Qwen, or DeepSeek models for cost or privacy reasons. For vector storage, Supabase with pgvector is the most common pick for small projects, with Pinecone and Weaviate showing up in larger deployments.

For orchestration, the split is between LangChain and a more minimal “just write the function” approach. The HN crowd has been increasingly skeptical of LangChain, with one widely upvoted comment describing it as “a framework that solves problems you would not have if you just called the API directly.” Many engineers report using the Vercel AI SDK for the streaming primitives and then writing their own tool-calling and agent logic on top.
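
For a sense of what “just write the function” means for tool calling, here is a minimal sketch of an agent loop. The model is asked for a turn, any requested tool is run, the result is fed back, and the loop ends on a final answer or a step cap. The types and the `runAgent` name are hypothetical; real provider responses have richer shapes, and the transcript here is simplified to plain strings.

```typescript
// Hypothetical shapes for a model turn and a tool.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn =
  | { kind: "tool"; call: ToolCall }
  | { kind: "final"; text: string };
type Tool = (args: Record<string, unknown>) => Promise<string>;

// Ask the model, run requested tools, and repeat until a final answer.
async function runAgent(
  ask: (transcript: string[]) => Promise<ModelTurn>,
  tools: Record<string, Tool>,
  maxSteps: number,
): Promise<string> {
  const transcript: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const turn = await ask(transcript);
    if (turn.kind === "final") return turn.text;

    const tool = tools[turn.call.name];
    // Report unknown tools back to the model instead of crashing.
    const output = tool
      ? await tool(turn.call.args)
      : `error: unknown tool ${turn.call.name}`;
    transcript.push(`${turn.call.name} -> ${output}`);
  }
  // The step cap prevents a confused model from looping forever.
  throw new Error(`no final answer after ${maxSteps} steps`);
}
```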

For deployment, the split between Vercel and self-hosting has widened. Vercel remains the path of least resistance, and most production Next.js AI apps still run there. But teams that hit cost or timeout walls have moved to Cloudflare Pages, Railway, Fly.io, or a plain Node server on AWS. The pattern that recurs in threads is “start on Vercel, migrate off once the bill hurts.”

For observability, LangSmith, Helicone, and custom logging into Postgres or a dedicated analytics tool are the common picks. Several teams have built their own dashboards on top of Supabase just to track per-user token costs.
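
The per-user cost tracking behind those dashboards comes down to a simple aggregation. The sketch below sums token usage into a dollar figure per user. The `Usage` and `Pricing` shapes are hypothetical, and the prices used in any call are placeholders that you would replace with your provider's current rates.

```typescript
// Hypothetical usage row, for example one per logged model call.
type Usage = { userId: string; inputTokens: number; outputTokens: number };

// Prices are parameters on purpose: real rates vary by provider and model.
type Pricing = { usdPerMillionInput: number; usdPerMillionOutput: number };

// Total spend per user across all usage rows.
function costByUser(rows: Usage[], pricing: Pricing): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    const cost =
      (row.inputTokens / 1_000_000) * pricing.usdPerMillionInput +
      (row.outputTokens / 1_000_000) * pricing.usdPerMillionOutput;
    totals.set(row.userId, (totals.get(row.userId) ?? 0) + cost);
  }
  return totals;
}
```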

Who It Fits Best

Next.js is a strong fit for small teams, typically 1 to 5 engineers, building a product where the AI component is a feature, not the whole product. If you are a SaaS company adding a copilot or an AI-powered search box, the framework gives you everything you need and the integration cost is low. The same applies to internal tools and prototypes, where the time-to-first-demo advantage is hard to beat.

It is also a good fit for teams that already know React. The HN and Reddit signal is consistent here. Engineers who came to Next.js from a Vue or Svelte background were less enthusiastic, and several chose to use SvelteKit or Remix instead for the same kind of AI workload. The lock-in to the React mental model is real, and not everyone wants it.

It is a weaker fit for teams building heavy agentic systems, long-running workflows, or anything that needs to coordinate multiple model calls with persistence. The serverless timeouts and stateless nature of the typical deployment push back hard here. Teams in this category tend to end up on a dedicated Python backend with FastAPI, or on a self-hosted Node service, and use Next.js only for the UI layer.

It is also a weaker fit for teams that are not ready to build a fair amount of custom infrastructure. If you want a turnkey AI app builder, Next.js is the wrong level of abstraction. Tools like Flowise, Langflow, or even a managed service from a vendor will get you to a working product faster, at the cost of less control.

Common Replacements and Adjacent Choices

The most common alternative in practitioner discussions is SvelteKit. Engineers who have used both tend to describe SvelteKit as lighter, faster, and less ceremonious for AI work, though the ecosystem of AI-specific tooling is much smaller. Remix is the other frequent mention, particularly for teams that prefer its loader/action model over server components.

For the backend side, FastAPI on Python is the most common replacement when the AI workload gets serious. Teams describe moving the model orchestration logic to Python while keeping Next.js for the UI. The boundary tends to be “Next.js calls a FastAPI endpoint that does the model work and streams back.”
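
That boundary can be sketched as a thin proxy handler: it forwards the request body to the Python service and passes the upstream response body straight through so chunks reach the browser as they arrive. The backend URL, the `makeProxyHandler` name, and the 502 fallback are assumptions made for this example. The fetch function is injected so the handler can be exercised without a running backend.

```typescript
// Hypothetical backend address; in practice this would come from configuration.
const BACKEND_URL = "http://localhost:8000/chat";

// Build a handler around a fetch implementation (injectable for testing).
function makeProxyHandler(fetchImpl: typeof fetch) {
  return async function handler(req: Request): Promise<Response> {
    const upstream = await fetchImpl(BACKEND_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: await req.text(),
    });
    if (!upstream.ok || !upstream.body) {
      return new Response("upstream error", { status: 502 });
    }
    // Handing over the upstream body forwards it as a stream
    // instead of buffering the whole reply first.
    return new Response(upstream.body, {
      status: 200,
      headers: {
        "Content-Type": upstream.headers.get("Content-Type") ?? "text/plain",
      },
    });
  };
}

// In a route file, this handler would be exported as POST.
const POST = makeProxyHandler(fetch);
```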

A few teams have gone further and replaced Next.js entirely with a custom React app on top of a Python backend, on the grounds that the AI logic is where most of the complexity lives, and they would rather not have a framework mediating every request.

The honest read from the community is that Next.js is a great UI framework and a decent enough backend framework for simple AI workloads, but it is not yet a complete answer for serious AI infrastructure. The shape that keeps recurring in production is “Next.js for the frontend, something else for the heavy lifting.”

A Practical Takeaway

If you are starting an AI app today and your team knows React, Next.js is a reasonable default. You will get a polished UI, decent streaming support, and a path to a working product in days rather than weeks. Budget for the cost surprises, plan for the timeout limits, and expect to write a fair amount of orchestration code yourself.

If your AI workload involves long agentic loops, heavy tool use, or strict cost controls, start by separating the AI backend from the UI, even if that means giving up some of the “all in Next.js” simplicity. The teams that have been happiest in the threads are the ones that treated Next.js as the view layer and built the rest deliberately.

If you’re working through which tools belong in your stack, book a 60-minute Omni Audit: https://calendly.com/sam-mckay/discovery-call