What DeepSeek actually is

DeepSeek is a Chinese AI research and product company that has released both open-weight language models and a hosted inference API. The two flagship models behave differently and serve different purposes, which is the first thing worth understanding before you write any code.

DeepSeek-V3 is a Mixture of Experts model. It has a very large total parameter count, but only a fraction of those parameters activate on any given token. The practical effect is frontier-tier output quality at a fraction of the inference cost of a dense model of comparable size. V3 handles general chat, code generation, summarization, translation, and structured extraction. Because it answers directly rather than reasoning at length first, response latency is low enough for interactive use.

DeepSeek-R1 is a reasoning model. It produces a visible chain of thought before its final answer, which makes it slower and more expensive per request but materially better at math, multi-step logic, planning, and code debugging. There are also distilled R1 variants that trade some reasoning quality for speed, useful when you want R1-style behavior on smaller infrastructure.

The API surface is intentionally familiar. The base URL is https://api.deepseek.com and the request shape matches OpenAI’s /v1/chat/completions endpoint. If you have used OpenAI, Together, Groq, or any OpenAI-compatible provider before, the learning curve is essentially zero: you use the same SDK, the same JSON schema, and the same streaming protocol.

This compatibility is not accidental. DeepSeek is targeting developers who already have tooling built around the OpenAI spec, and the company benefits from every existing integration working out of the box. The trade-off is that DeepSeek-specific features, like the separate reasoning content field, are layered on top of the standard schema rather than replacing it.

Setup and authentication

You need three things before you can make a successful call: an account, an API key, and a way to send HTTP requests.

Step one is creating an account at platform.deepseek.com. Registration requires an email address and standard verification. The interface is functional rather than polished. Once logged in, navigate to the API Keys section in the left sidebar and generate a new key. Give it a descriptive name so you can identify and revoke it later if needed.

Treat this key like a database password. Store it in an environment variable, a secrets manager, or a .env file that is gitignored. Do not paste it into source code, do not commit it to a repo, and do not log it. If you leak it, revoke it immediately from the dashboard and generate a replacement.

Step two is funding the account. DeepSeek operates on prepaid credits rather than monthly invoicing. New accounts typically receive a small free allocation, enough for several hundred test calls. Beyond that, you top up through the billing dashboard using a card. Pricing is published per million tokens, separately for input and output, and the rates are among the lowest of any major provider. For most workloads, expect to pay a small fraction of what OpenAI or Anthropic charge for equivalent capability.

Step three is choosing your client. Any OpenAI-compatible SDK works. In Python, install the official openai package and point it at DeepSeek’s base URL. In Node, the same applies with the openai npm package. For raw HTTP testing, curl works fine and gives you the clearest view of what is actually being sent.

The environment setup looks like this:

export DEEPSEEK_API_KEY="sk-..."

Then install the client library:

pip install openai

That is the entire installation footprint. No custom SDK to learn, no proprietary protocol to reverse-engineer.
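The key-handling advice above can be sketched in a few lines of Python. This is a minimal illustration rather than part of any SDK: load_api_key and redact are hypothetical helper names, and the only thing taken from this guide is the DEEPSEEK_API_KEY environment variable.

```python
import os


def load_api_key(env_var="DEEPSEEK_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or load it from a "
            "secrets manager before starting the application."
        )
    return key


def redact(key):
    """Return a log-safe form of the key showing only the last four characters."""
    return "****" + key[-4:] if len(key) > 4 else "****"
```

Failing at startup with a readable error is usually easier to debug than an authentication failure on the first request, and logging only the redacted form keeps the key out of log files.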

First working example

The smallest useful call is a single-turn chat completion. The following Python script sends a prompt to DeepSeek-V3 and prints the response:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain what a Mixture of Experts model is in two sentences."}
    ],
    temperature=0.7,
    max_tokens=200
)

print(response.choices[0].message.content)

The model identifier deepseek-chat routes to the current V3 build. For reasoning tasks, swap it for deepseek-reasoner, which routes to R1. The SDK does not know which models exist on which provider, so it will not validate the model name for you. If you mistype it, the API will return a 400 error with a useful message.

If you prefer curl, the equivalent request looks like:

curl https://api.deepseek.com/v1/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Both return standard JSON with a choices array. The response shape is identical to OpenAI’s, which means any parser, logger, or wrapper you have already written will work without modification. The usage field reports prompt tokens, completion tokens, and total tokens, which is what you need for cost tracking.
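To show how the usage field feeds cost tracking, here is a small sketch. The Pricing class and estimate_cost function are hypothetical names, and the rates are placeholders chosen for easy arithmetic, not DeepSeek's actual prices; substitute the numbers from the pricing page.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Per-million-token prices in USD (placeholder values are supplied below)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(prompt_tokens, completion_tokens, pricing):
    """Return the estimated cost of one call in USD."""
    return (
        prompt_tokens / 1_000_000 * pricing.input_per_million
        + completion_tokens / 1_000_000 * pricing.output_per_million
    )


# Placeholder rates for illustration only; look up the real ones before relying on this.
example_pricing = Pricing(input_per_million=1.0, output_per_million=2.0)

# With a real response you would read the counts from response.usage:
#   usage = response.usage
#   estimate_cost(usage.prompt_tokens, usage.completion_tokens, example_pricing)
cost = estimate_cost(prompt_tokens=1_500, completion_tokens=500, pricing=example_pricing)
print(f"{cost:.6f}")  # prints 0.002500
```

Keeping input and output rates separate matters because the two are priced differently, and long prompts with short answers cost very differently from the reverse.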

For streaming, set stream=True and iterate:

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Write a haiku about databases"}],
    stream=True
)

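for chunk in stream:
    # Skip chunks that carry no choices or no text delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # end the line once the stream finishes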

Streaming is delivered through server-sent events and works the same way as OpenAI’s. This matters for user-facing interfaces, where perceived latency counts for more than total throughput.
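In practice you often want the complete text as well as the live output, for logging or for storing the conversation. The sketch below accumulates the deltas while printing them. collect_stream and fake_chunk are hypothetical names, and fake_chunk only exists so the example runs without a network call; with a real client you would pass the stream object directly.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Print text deltas as they arrive and return the complete response text."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks that carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # deltas can be None or empty
            print(text, end="", flush=True)
            parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for an SDK chunk, used here so the sketch runs offline."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


full_text = collect_stream([fake_chunk("Rows "), fake_chunk(None), fake_chunk("sleep.")])
```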

Key settings that matter

The default values work for casual use, but production workloads benefit from tuning.

Temperature controls randomness. Values near zero produce focused, nearly deterministic output suitable for extraction, classification, and code generation. Values around 0.7 are a common choice for chat and feel natural. Values above 1.0 introduce creative variance but also raise hallucination risk. For any task where you parse the output programmatically, keep temperature low.

Max tokens caps the response length. Set it explicitly when you need predictable costs or when downstream parsers expect bounded output. The default leaves room for very long answers, which is fine for chat but wasteful for classification tasks where the answer should be one word. A response that hits the cap is cut off mid-answer and reports a finish_reason of length, so leave some headroom.

Top-p is an alternative sampling strategy that limits the cumulative probability mass of the token distribution. Most users should leave it at the default and adjust temperature instead. Mixing the two without understanding the math produces unpredictable results.
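One way to keep these sampling settings consistent is to define them once per task type. The presets below are a sketch: the task names and values are illustrative starting points, not recommendations from DeepSeek, and sampling_params is a hypothetical helper.

```python
# Hypothetical presets; tune the values against your own workload.
SAMPLING_PRESETS = {
    "extraction": {"temperature": 0.0, "max_tokens": 300},
    "classification": {"temperature": 0.0, "max_tokens": 5},
    "chat": {"temperature": 0.7, "max_tokens": 800},
}


def sampling_params(task):
    """Return a copy of the preset for a task, falling back to the chat preset."""
    return dict(SAMPLING_PRESETS.get(task, SAMPLING_PRESETS["chat"]))


# Usage with the client from the first example:
#   client.chat.completions.create(
#       model="deepseek-chat", messages=messages, **sampling_params("classification")
#   )
```

Returning a copy means a caller can override one value for a single request without changing the shared preset. Top-p is left out on purpose, in line with the advice to adjust temperature and leave top-p at its default.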

For reasoning models, there is one more field worth understanding. The deepseek-reasoner endpoint returns its reasoning separately from the final answer, in a reasoning_content field that sits alongside the usual content field on the message. This lets you inspect the chain of thought for debugging, evaluation, or display purposes. If you are building a tutoring application, for example, showing the reasoning alongside the answer is genuinely useful.
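Reading the reasoning separately from the answer might look like the sketch below. It assumes the reasoning is exposed as a reasoning_content attribute on the message, which is how DeepSeek's documentation describes it; verify this against the current API reference. split_reasoning is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the example runs offline.

```python
from types import SimpleNamespace


def split_reasoning(message):
    """Return (reasoning, answer) for one chat completion message.

    Assumes the reasoning is exposed as a `reasoning_content` attribute next to
    `content`. Returns None for the reasoning when the attribute is missing,
    which is what a non-reasoning model would produce.
    """
    reasoning = getattr(message, "reasoning_content", None)
    return reasoning, message.content


# Stand-in objects; a real call would pass response.choices[0].message instead.
r1_message = SimpleNamespace(reasoning_content="Check divisors up to 4.", content="Yes.")
v3_message = SimpleNamespace(content="Yes.")

print(split_reasoning(r1_message))
print(split_reasoning(v3_message))
```

Using getattr with a default keeps the same code path working for both model identifiers, so switching between deepseek-chat and deepseek-reasoner does not require a separate parser.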

Frequency penalty and presence penalty are available but rarely matter. They nudge the model away from repeating itself, which is occasionally useful for long-form generation but irrelevant for short technical responses.

Stop sequences let you cut off generation at a specific string. This is useful when you want the model to produce structured output and signal completion with a marker like ###END###.
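A request that uses a stop sequence could be assembled as follows. This is a sketch under assumptions: build_request and strip_marker are hypothetical helpers, the system prompt wording is illustrative, and stop is the standard parameter from the OpenAI-compatible schema.

```python
END_MARKER = "###END###"


def build_request(user_text):
    """Return keyword arguments for client.chat.completions.create."""
    return {
        "model": "deepseek-chat",
        "messages": [
            {
                "role": "system",
                "content": f"Reply with one JSON object, then write {END_MARKER}.",
            },
            {"role": "user", "content": user_text},
        ],
        "stop": [END_MARKER],  # generation halts when the marker would be produced
        "temperature": 0.0,
    }


def strip_marker(text):
    """Defensive cleanup: drop the marker and anything after it, if present."""
    return text.split(END_MARKER, 1)[0].strip()


# Usage: response = client.chat.completions.create(**build_request("Extract the date."))
```

The cleanup step is a precaution for cases where the marker reaches your code anyway, for example when the same parser also handles output from a provider or cached response that did not apply the stop sequence.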

System prompts carry real weight. DeepSeek models respond well to explicit instructions about format, length, and tone. A vague system prompt produces vague output. A specific one with examples produces consistent output across many calls. Treat the system prompt as part of your code, not as flavor text.
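Treating the system prompt as code can be as simple as keeping it in a versioned constant next to its few-shot examples. The ticket-classification task, the labels, and build_messages below are all invented for illustration.

```python
# Hypothetical task: classify support tickets into one of three labels.
SYSTEM_PROMPT = (
    "You classify support tickets. "
    "Reply with exactly one word: billing, bug, or other."
)

# Few-shot examples as (user input, expected assistant reply) pairs.
FEW_SHOT = [
    ("I was charged twice this month.", "billing"),
    ("The export button crashes the app.", "bug"),
]


def build_messages(ticket):
    """Assemble the system prompt, the examples, and the new ticket in order."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for example_input, example_label in FEW_SHOT:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_label})
    messages.append({"role": "user", "content": ticket})
    return messages
```

Because the prompt and examples live in one place, a change to either shows up in code review and can be tested like any other change.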

The context window is large, which makes DeepSeek suitable for document analysis tasks that would exceed smaller context limits. Check the documented limit for the model you call, and remember that the prompt and the completion share it, so a very long input leaves less room for the answer.

Where it shines

DeepSeek performs strongest in three categories.

Code generation and debugging is the first. The V3 model handles Python, JavaScript, TypeScript, Go, Rust, and SQL well, and the R1 reasoning model excels at diagnosing bugs in unfamiliar code. The pricing makes it economical to run large refactors, generate test suites, or batch-translate code between languages. For a developer working on a side project, the cost difference compared to frontier Western providers is large enough to matter.

Reasoning-heavy tasks are the second. Anything involving multi-step logic, math word problems, planning, or causal analysis benefits from R1’s chain of thought. The visible reasoning also helps when you need to audit why the model reached a particular conclusion, which is increasingly important for regulated industries.

Long document analysis is the third. With a large context window and low per-token pricing, you can feed in entire codebases, legal contracts, or research papers without aggressive chunking. This is a workload where cost compounds quickly on more expensive providers, and DeepSeek’s pricing makes it viable to run these workloads at scale.

For non-English languages, particularly Chinese, DeepSeek is competitive because the training data is balanced across languages rather than English-dominant. If your application serves a multilingual audience, this is worth testing against your current provider.

Where it fails

Honest limitations matter as much as strengths.

Tool use and function calling work but are less mature than the equivalents from OpenAI or Anthropic. The schema is compatible, but the model is less reliable at choosing the right tool from a large set, and multi-turn agentic flows with many tool definitions show more friction. If your application depends on complex agentic behavior, expect to do more prompt engineering.

Vision input is not supported through the standard chat completions endpoint. If you need image understanding, you need a different provider or a multimodal workaround. The open-weight V3 and R1 releases are text-only as well.

Rate limits on free tier accounts are tight. Production workloads require a paid plan with higher throughput, and even paid plans have per-minute caps that you should design around. If you are building a real-time application, test your burst behavior early.
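Designing around per-minute caps usually means retrying with exponential backoff. The helper below is a generic sketch, not a DeepSeek or SDK feature: call_with_backoff is a hypothetical name, and the exception types to retry on are left to the caller.

```python
import random
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    Delays grow as base_delay * 2**attempt plus random jitter. The last
    failure is re-raised so the caller sees the real error.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


# Usage with the openai package (assumes version 1.x, which defines RateLimitError):
#   import openai
#   call_with_backoff(lambda: client.chat.completions.create(...),
#                     retry_on=(openai.RateLimitError,))
```

The jitter spreads out retries from concurrent workers so they do not all hit the API again at the same moment. The openai client also has a built-in max_retries option, which may be enough for simple cases.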

Latency for the R1 reasoning model is high because the model thinks before answering. A request that takes two seconds on V3 might take fifteen seconds on R1. Use R1 only when you actually need reasoning, and consider streaming the reasoning content to keep users engaged during the wait.

The hosted service occasionally experiences capacity issues during peak hours in Asian time zones. If you need guaranteed uptime, run the open-weight models on your own infrastructure instead. The trade-off is operational complexity and GPU cost.

Finally, the open-weight releases are not identical to the hosted API. The hosted models include additional fine-tuning and safety layers that the downloadable weights may lack. If you self-host for safety or compliance reasons, validate that the open-weight behavior matches your requirements.

Practical workflow pattern

The most productive pattern is to treat DeepSeek as a tiered provider rather than a single model.

Use V3 for high-volume, low-complexity tasks. Summarization, classification, extraction, simple code generation, and chat all belong here. The cost per token is low enough that you can afford to be generous with retries and few-shot examples.

Use R1 for tasks where correctness matters more than speed. Complex debugging, math, planning, and any prompt where you would otherwise write a long chain-of-thought instruction yourself. The visible reasoning also doubles as a debugging surface when the answer is wrong.
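The tiering described above can be reduced to a small routing function. The task labels here are hypothetical; the two model identifiers are the ones used earlier in this guide.

```python
# Hypothetical task labels; adapt them to the categories your application uses.
REASONING_TASKS = {"debugging", "math", "planning"}


def choose_model(task):
    """Pick the reasoning model only for tasks that need it."""
    return "deepseek-reasoner" if task in REASONING_TASKS else "deepseek-chat"


print(choose_model("summarization"))  # prints deepseek-chat
print(choose_model("math"))           # prints deepseek-reasoner
```

Defaulting to the cheaper model means an unrecognized task label never silently incurs reasoning-model latency and cost.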

Build a thin abstraction layer over the API. A single function that takes a prompt, model choice, and parameters keeps the rest of your code clean. When you need to switch providers or compare models, you change one file rather than hunting through your codebase.
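A thin abstraction layer might be no more than one function. In this sketch, complete is a hypothetical name and the client is passed in as an argument, which keeps the function easy to test and lets you swap providers by constructing a different client.

```python
def complete(client, prompt, model="deepseek-chat", system=None, **params):
    """Send a single-turn prompt and return the response text.

    `client` is any object with an OpenAI-style chat.completions.create method.
    Extra keyword arguments (temperature, max_tokens, and so on) pass through.
    """
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model=model, messages=messages, **params
    )
    return response.choices[0].message.content


# Usage with the client from the first example:
#   text = complete(client, "Summarize this changelog.", temperature=0.2)
```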

Cache aggressively. DeepSeek’s pricing makes caching less critical than with more expensive providers, but repeated calls with identical prompts still benefit from a simple in-memory or Redis cache. Semantic caching, where you match on embedding similarity rather than exact strings, is worth the complexity for high-traffic applications.
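An exact-match cache can be sketched with a dictionary keyed on a hash of the request. The function names are hypothetical, and call_api stands for whatever function actually performs the request. Semantic caching is not shown here.

```python
import hashlib
import json

_cache = {}  # in-memory only; swap for Redis or similar in a multi-process setup


def cache_key(model, messages, params):
    """Build a stable key from everything that influences the response."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_completion(model, messages, params, call_api):
    """Return a cached response text, calling call_api only on a cache miss."""
    key = cache_key(model, messages, params)
    if key not in _cache:
        _cache[key] = call_api(model, messages, params)
    return _cache[key]
```

Including the model and parameters in the key prevents a response generated at one temperature from being served for a request at another. Exact-match caching suits low-temperature tasks best, since at higher temperatures callers may expect varied answers.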

Log everything. Store the prompt, model, parameters, response, latency, and token counts. Without this data, you cannot optimize costs or debug quality issues later. A simple append-only log file is fine for development, and a proper observability stack is worth it for production.
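For development, the append-only log can be a JSON Lines file with one record per call. The field names and the log_call helper below are illustrative choices, not a prescribed format.

```python
import json
import time


def log_call(path, *, model, params, prompt, response_text, latency_s, usage):
    """Append one JSON record per API call to a JSON Lines file."""
    record = {
        "timestamp": time.time(),
        "model": model,
        "params": params,
        "prompt": prompt,
        "response": response_text,
        "latency_s": round(latency_s, 3),
        "usage": usage,  # prompt, completion, and total token counts as a dict
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


# Usage: measure latency around the call, then log.
#   start = time.perf_counter()
#   response = client.chat.completions.create(...)
#   log_call("calls.jsonl", model="deepseek-chat", params={"temperature": 0.0},
#            prompt=prompt, response_text=response.choices[0].message.content,
#            latency_s=time.perf_counter() - start,
#            usage=response.usage.model_dump())
```

One record per line keeps the file easy to tail, grep, and load into a dataframe later. The model_dump call in the usage comment assumes the 1.x openai package, whose response objects are Pydantic models.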

Evaluate before you scale. Run a small benchmark set against your actual workload before committing to DeepSeek as a primary provider. The pricing advantage is real, but only if quality meets your bar. Twenty representative prompts with hand-scored outputs will tell you more than any leaderboard.
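A small benchmark of this kind needs very little tooling. The sketch below writes model outputs to a CSV with an empty score column for hand scoring, then averages the scores once they are filled in. The function names, column names, and the generate callable are all hypothetical.

```python
import csv


def run_benchmark(prompts, generate, out_path):
    """Run each prompt through generate() and write a CSV ready for hand scoring."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["prompt", "output", "score"])
        for prompt in prompts:
            writer.writerow([prompt, generate(prompt), ""])


def average_score(scored_path):
    """Average the filled-in score column; return None if nothing is scored yet."""
    with open(scored_path, newline="", encoding="utf-8") as f:
        scores = [
            float(row["score"])
            for row in csv.DictReader(f)
            if (row.get("score") or "").strip()
        ]
    return sum(scores) / len(scores) if scores else None
```

Running the same prompt file against two providers and comparing the averages gives a direct, workload-specific comparison.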

For local development, the open-weight models run on consumer hardware in their distilled forms. The full V3 and R1 weights require serious GPU infrastructure, but the smaller variants are accessible to anyone with a recent Mac or a single high-end GPU. Ollama and vLLM both support the model family.

For production, the hosted API is the default choice. Self-hosting makes sense only when you have specific data residency requirements, need to fine-tune on proprietary data, or have already saturated the rate limits of the hosted plan.

Start with one workflow, measure the cost and quality, then expand. DeepSeek rewards iterative adoption rather than big-bang migrations.

