Build AI Agents in 2026: A Practical Tutorial

A hands-on walkthrough for building AI agents in 2026, covering setup, first working example, key settings, and real workflow patterns.

Sam McKay

What an AI Agent Actually Is

Strip away the marketing and an AI agent is a loop. You give a language model a goal, a set of tools, and a way to observe what happened when it tried something. The model picks an action, the action runs against real systems, the result comes back, and the model decides what to do next. That loop continues until the model decides the goal is met, or until you cut it off.

The pieces are not new. What changed is that language models got good enough at structured reasoning that you can hand them a function schema, let them choose which function to call, parse the result, and feed it back in. The agent is the orchestration layer around that loop. In practice that means a runtime that handles message history, tool definitions, retry logic, and termination conditions.

Three things separate a real agent from a chatbot with a plugin bolted on. First, the model is in control of the action sequence, not a fixed state machine. Second, the agent has access to tools that mutate state, not just retrieve it. Third, the loop has a budget, measured in steps, tokens, or wall-clock time, and the agent has to operate within it.
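The third point, the budget, is the easiest to make concrete. Here is a minimal sketch of a budget tracker covering steps, tokens, and wall-clock time. The class name, field names, and default limits are assumptions for illustration, not part of any framework.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Budget:
    """Tracks the three limits named above: steps, tokens, wall-clock time."""
    max_steps: int = 10
    max_tokens: int = 50_000
    max_seconds: float = 60.0
    steps: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens_used: int) -> None:
        """Record one completed step and the tokens it consumed."""
        self.steps += 1
        self.tokens += tokens_used

    def exhausted(self) -> Optional[str]:
        """Return the name of the first limit that was hit, or None."""
        if self.steps >= self.max_steps:
            return "steps"
        if self.tokens >= self.max_tokens:
            return "tokens"
        if time.monotonic() - self.started >= self.max_seconds:
            return "time"
        return None
```

Call charge after each model response and check exhausted before starting the next step. Returning the name of the limit, rather than a bare boolean, makes it easier to log why a run was stopped.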

Most production agents today run on top of a few common primitives. A chat model endpoint with function calling. A vector store for retrieval. A tool registry where each tool is a typed function with a JSON schema. A memory layer that persists across turns. A planner that decides whether to act, ask, or stop.

You can build this yourself in a few hundred lines of Python, or you can use a framework. The frameworks worth knowing in 2026 are LangChain for tool and model abstractions, LangGraph for graph-based orchestration, CrewAI for role-based multi-agent setups, AutoGen from Microsoft for conversational agent loops, and the OpenAI Assistants API for managed threads and tool execution. OpenAI has announced that the Assistants API is being deprecated in favor of the Responses API, so check its current status before you build on it. Each makes different tradeoffs around control, observability, and how much you write yourself.

Setup and Authentication

The fastest path to a working agent is the OpenAI Python SDK with function calling. It assumes you have an OpenAI account and an API key. Install the SDK and set your key as an environment variable.

Run pip install openai in your project environment, then export OPENAI_API_KEY with your key value. If you are working in a notebook, set the variable in your shell profile, or before you launch the notebook server, so the kernel inherits it and it persists across sessions.

For a more agent-shaped setup, install LangChain with pip install langchain langchain-openai langchain-community. This gives you the agent executor, tool abstractions, and a swappable model layer. LangGraph installs separately with pip install langgraph and is what you reach for when your agent needs branching logic or human-in-the-loop checkpoints.

Authentication is the same pattern across most providers. Anthropic uses ANTHROPIC_API_KEY. Google uses GOOGLE_API_KEY or a service account JSON file pointed to by GOOGLE_APPLICATION_CREDENTIALS. For local models through Ollama, no key is needed, but you do need the Ollama daemon running on port 11434.

A practical setup that holds up in production looks like this. Keep keys in a .env file loaded with python-dotenv. Never commit the file. Add .env to .gitignore on day one. For team work, use a secrets manager like AWS Secrets Manager, HashiCorp Vault, or Doppler, and inject keys at runtime rather than baking them into images.
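A small sketch of that pattern, assuming keys arrive through environment variables whether they come from a .env file or a secrets manager. The helper name require_key is hypothetical. The python-dotenv import is optional here, so the code still runs when the package is not installed.

```python
import os

try:
    # python-dotenv, if installed, copies entries from .env into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # Without it, keys must already be present in the environment.
    pass


def require_key(name: str) -> str:
    """Return the named secret from the environment or fail with a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or your secrets manager")
    return value
```

Calling require_key("OPENAI_API_KEY") once at startup means a missing key fails immediately with a readable message instead of surfacing as an authentication error several steps into a run.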

If you are using the OpenAI Assistants API specifically, you also need to think about thread storage. Threads are server-side by default, which means conversation history lives on OpenAI infrastructure. For regulated workloads, run the agent loop yourself with the Chat Completions API and manage history in your own database.
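If you manage history yourself, the storage side can be very small. The sketch below uses SQLite from the standard library as a stand-in for whatever database you actually run. The table layout and the function names are assumptions for illustration.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the messages table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "thread_id TEXT NOT NULL, "
        "payload TEXT NOT NULL)"
    )
    return conn


def append_message(conn: sqlite3.Connection, thread_id: str, message: dict) -> None:
    """Store one chat message (as JSON) at the end of a thread."""
    conn.execute(
        "INSERT INTO messages (thread_id, payload) VALUES (?, ?)",
        (thread_id, json.dumps(message)),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, thread_id: str) -> list:
    """Return a thread's messages in insertion order, ready to send to the model."""
    rows = conn.execute(
        "SELECT payload FROM messages WHERE thread_id = ? ORDER BY id",
        (thread_id,),
    )
    return [json.loads(row[0]) for row in rows]
```

Before each model call you would pass load_history(conn, thread_id) as the message list, then append the reply. The conversation never has to live anywhere except your own database.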

First Working Example

Here is a minimal agent that can call two tools and reason about which to use. The example uses the OpenAI SDK directly so you can see every moving part.

Define two tools as Python functions. One fetches a weather forecast for a city. The other sends an email through a stub function. Each function gets a JSON schema describing its parameters. Pass those schemas to the model as tools when you create the chat completion.
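One way those two tools and their schemas could look. The function names, parameter names, and canned return values are assumptions for illustration. The schema layout follows the Chat Completions function-calling format, where each entry has type "function" and a JSON Schema under parameters.

```python
def get_weather(city: str) -> str:
    """Stub: a real version would call a forecast service."""
    return f"Forecast for {city}: 18C, light rain"


def send_email(to: str, subject: str, body: str) -> str:
    """Stub: reports what would be sent instead of sending anything."""
    return f"Email queued for {to} with subject '{subject}'"


TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email with a subject and plain-text body.",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string", "description": "Recipient address"},
                    "subject": {"type": "string"},
                    "body": {"type": "string"},
                },
                "required": ["to", "subject", "body"],
            },
        },
    },
]
```

With the OpenAI SDK, this list goes in as the tools argument of client.chat.completions.create. Keeping each schema name identical to the Python function name makes the later dispatch step a plain dictionary lookup.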

The model returns either a normal assistant message or a tool_calls payload. When you see tool_calls, execute the matching function locally, append the result to the message history as a tool message, and call the model again. Repeat until the model returns a plain message with no tool calls.

A runnable version of this is around 80 lines of Python. The structure is a while loop with a step counter, a dispatch table that maps tool names to functions, and a message list that grows on each iteration. Set a max_steps guard at 10 so a misbehaving agent cannot loop forever.

To run it, send a prompt like “Check the weather in Berlin and email the summary to [email protected]”. The model will call the weather tool, read the result, decide it has what it needs, call the email tool, then return a final summary. You will see two tool calls in the trace and one final assistant message.

The same pattern works with LangChain but with less boilerplate. You define tools with the @tool decorator, pass them to create_openai_tools_agent, wrap that in AgentExecutor, and call .invoke on the executor with your input. The executor handles the loop, the dispatch, and the step limit for you.

Key Settings That Matter

The dials most people ignore are the ones that determine whether your agent is reliable or a liability.

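Temperature is the first one. For tool-calling agents, set it to 0 where the model lets you. You want choices about which tool to call to be as repeatable as possible, and a temperature of 0 gets you close to deterministic even if it does not guarantee it. Higher temperatures make the agent more creative, which sounds good until it starts inventing tool names or hallucinating parameters.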

The step limit matters more than people think. A typical agent task completes in 3 to 8 steps. If you see 15, something is wrong: the tools are too granular, the prompt is unclear, or the model is stuck. Set max_steps to a number just above your expected range, around 10 to 12 for most workflows, and log when the agent hits it.

Token budgets are the next dial. Each step re-sends the full message history, so the cumulative input tokens you pay for grow roughly quadratically with step count. Cap the total tokens per run at something like 50,000 for a research task and 10,000 for a simple lookup. Use a model with a long context window for the planner, but route individual tool calls to cheaper models when possible.
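The growth is easy to check with arithmetic. This sketch assumes a fixed base of prompt tokens and a constant number of tokens added to the history per step, which is a simplification since real steps vary in size. The function name and the example numbers are illustrative.

```python
def cumulative_input_tokens(steps: int, base: int, per_step: int) -> int:
    """Total input tokens across a run where every call re-sends the history.

    Call k (1-indexed) sends base + (k - 1) * per_step tokens, so the total is
    steps * base + per_step * steps * (steps - 1) / 2.
    """
    return steps * base + per_step * steps * (steps - 1) // 2


# With a 1,000-token base and 500 tokens added per step:
five_steps = cumulative_input_tokens(5, base=1000, per_step=500)    # 10,000
ten_steps = cumulative_input_tokens(10, base=1000, per_step=500)    # 32,500
```

Under these assumptions, doubling the steps from 5 to 10 more than triples the input tokens. That is why a run that drifts past its expected step count blows through a token cap so quickly.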

The system prompt is the dial most worth your time. Be explicit about what the agent should and should not do. Specify the output format. List the tools by name and describe when to use each one. Tell the agent what to do when it does not know. A vague system prompt produces an agent that invents capabilities.

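Parallel tool calls are supported by most current models and worth enabling. If the agent needs weather for three cities, it should call the weather tool three times in one turn, not across three turns. On the OpenAI API this is controlled by parallel_tool_calls, which is on by default for models that support it. Set it explicitly on the request if you want to be sure of the behavior.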
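On your side of the loop, a turn with several tool calls means you can run them concurrently and return one tool message per call. Here is a sketch using a thread pool from the standard library. The weather stub, the dispatch table, and the function names are assumptions, and the tool call dictionaries mirror the Chat Completions shape.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def get_weather(city):
    return f"Forecast for {city}: 18C"  # stub tool


DISPATCH = {"get_weather": get_weather}


def run_one(tool_call):
    """Execute a single tool call and wrap the result as a tool message."""
    fn = DISPATCH[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": fn(**args)}


def run_tool_calls(tool_calls, max_workers=4):
    """Run all tool calls from one assistant turn concurrently.

    pool.map returns results in input order, so each tool message lines up
    with the call that produced it.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, tool_calls))
```

Threads suit tools that wait on network or disk. Each result must carry the tool_call_id of the call it answers, since that is how the model matches results to requests.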

Structured outputs are another dial worth setting. When the agent returns a final answer, force it into a schema, for example a Pydantic model. This eliminates a whole class of parsing bugs downstream. Use response_format with a json_schema object on OpenAI, or tool_choice set to a specific function name when you want the agent to always end with a structured call.
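A sketch of both halves: the response_format payload you would send, and a check on what comes back. To stay dependency-free it uses a standard-library dataclass with a hand-written check as a stand-in for a Pydantic model. The schema name, the fields, and the parse_final helper are assumptions for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class FinalAnswer:
    summary: str
    email_sent: bool


# Payload for the response_format parameter (OpenAI json_schema form).
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "final_answer",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "email_sent": {"type": "boolean"},
            },
            "required": ["summary", "email_sent"],
            "additionalProperties": False,
        },
    },
}


def parse_final(raw: str) -> FinalAnswer:
    """Parse the model's final message and reject anything off-schema."""
    data = json.loads(raw)
    if (
        not isinstance(data, dict)
        or not isinstance(data.get("summary"), str)
        or not isinstance(data.get("email_sent"), bool)
    ):
        raise ValueError("final answer does not match the expected schema")
    return FinalAnswer(summary=data["summary"], email_sent=data["email_sent"])
```

Validating on your side even when the API enforces the schema is a deliberate choice here. It keeps downstream code safe if you later switch to a model or provider that does not enforce it.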

Finally, logging. Log every message in the loop, every tool call, every tool result, and the final outcome. Without this trace, debugging an agent that misbehaves once a week is impossible. LangSmith, LangChain’s tracing tool, handles this out of the box. If you build the loop yourself, write each step to a structured log file or push it to an observability platform.
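If you build the loop yourself, a JSON Lines log is a reasonable starting point, with one record per event and a shared run id. The class name, field names, and event labels below are assumptions for illustration. Swap the stream for a file handle or a client for your observability platform.

```python
import json
import time
import uuid


class RunLogger:
    """Writes one JSON object per line for every event in an agent run."""

    def __init__(self, stream, run_id=None):
        self.stream = stream                      # any object with .write(str)
        self.run_id = run_id or uuid.uuid4().hex  # ties all events of a run together
        self.seq = 0

    def log(self, event: str, **fields) -> None:
        """Append one event, e.g. 'model_message', 'tool_call', 'tool_result'."""
        self.seq += 1
        record = {"run_id": self.run_id, "seq": self.seq,
                  "ts": time.time(), "event": event}
        record.update(fields)
        self.stream.write(json.dumps(record) + "\n")
```

A typical run would call log("tool_call", name="get_weather", arguments={"city": "Berlin"}) before dispatch and log("tool_result", ...) after it. The sequence number lets you reconstruct the exact order of a run even when timestamps collide.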

Where It Shines

Agents are genuinely good at tasks where the steps are predictable in shape but variable in content. Pulling structured data from a list of URLs. Triaging incoming emails by intent. Summarizing a folder of documents against a rubric. Generating test cases from a spec. These are workflows where a human would do the same five things in slightly different orders each time.

The second strong use case is research and synthesis. Give the agent a search tool, a fetch tool, and a note-taking tool, and ask it to compile a brief on a topic. It will iterate, follow links, and produce a structured output that would take a person an hour. The agent does not get bored, does not skim, and does not forget what it found two pages ago.

The third is internal tooling. Wrapping a messy internal API behind an agent that knows when to call which endpoint is often faster than building a proper UI. The agent becomes the interface. This works best when the underlying API is well-documented and the user prompts are within a known distribution.

Customer support triage is a fourth. An agent that reads a ticket, looks up the customer, checks the knowledge base, and drafts a reply is