What Perplexity AI actually is
Strip the marketing and Perplexity AI is a retrieval augmented chat product. It takes your question, runs a live web search, pulls in source pages, and feeds the results to a large language model alongside your prompt. The model then writes an answer that is anchored to those sources, with inline citations pointing back to the URLs it used.
There are two products under the same name and they behave differently. The consumer product at perplexity.ai is a web app focused on answer quality, source transparency, and a clean research experience. The developer product is the Sonar API, a hosted endpoint that returns cited answers with a controlled model backend.
The technical difference matters. The web app runs searches, re-ranks, and synthesis on Perplexity’s own pipeline and gives you a polished reading surface with focus modes, follow-up threads, and a Spaces feature for repeatable research templates. The API gives you a single call that returns a text completion with citations and optional structured search results, which is what you build into apps or automations.
The mental model that helps most: think of Perplexity as a research assistant that always opens a browser tab before answering. That single habit is what makes the answers current in a way a stock LLM cannot match.
Setup and authentication
Web app
No setup. Go to perplexity.ai, sign in with Google or email, and you are in. If you want the paid tier (currently called Pro), you pay a monthly subscription and unlock longer reasoning, more file uploads, and access to stronger underlying models. The free tier is enough to learn the product and is what most people should start on.
Sonar API
This is where it gets interesting for builders.
Step one, create an account at perplexity.ai and open Account, then API Keys. Generate a key, copy it, and store it somewhere safe. You will not see it again.
Step two, set it as an environment variable. On macOS or Linux, drop this into your shell profile:
export PERPLEXITY_API_KEY=your_key_here
Then load it in the current session:
source ~/.zshrc
Step three, make a first request. The endpoint is https://api.perplexity.ai/chat/completions and it is OpenAI compatible, which means the same request shape you would send to OpenAI works here. A minimal curl looks like this:
curl -X POST https://api.perplexity.ai/chat/completions -H “Authorization: Bearer $PERPLEXITY_API_KEY” -H “Content-Type: application/json” -d ’{“model”: “sonar”, “messages”: [{“role”: “user”, “content”: “Summarize the latest stable release notes for Python 3.13.”}]}’
You will get back a JSON response with a choices array containing the assistant text, plus a citations array of URLs the answer drew from. There is no SDK lock-in, which is one of the nicer parts of the product.
Choosing a model
At the time of writing, the API exposes a small family. sonar is the default and is fast. sonar-pro is slower, more expensive, and gives better answers on harder research tasks. There is also sonar-reasoning for problems that benefit from explicit chain-of-thought, and sonar-deep-research for multi-step investigations that you would otherwise outsource to a human analyst. Pick the cheapest model that solves your task and only move up when you have a measurable reason.
First working example
Let us walk through a concrete job. Imagine you are evaluating three vendors for a small piece of internal tooling and you need a structured comparison from public sources.
In the web app, open a new thread, type your question, and Perplexity will search, read, and answer with citations. To get a real comparison rather than a generic answer, write a prompt that constrains the output. Something like this works:
Compare Linear, Height, and Asana for a 10-person product team that needs issue tracking plus lightweight roadmapping. Return a table with columns for pricing, public API availability, SSO support, and known limitations. Cite each cell to a source.
You will get a table with footnotes, you can click any footnote to see the source page, and you can ask follow-up questions in the same thread without losing context. This is the basic loop of the product, and it is genuinely useful for the kind of work most teams procrastinate on.
In the API, the equivalent looks like this:
curl -X POST https://api.perplexity.ai/chat/completions -H “Authorization: Bearer $PERPLEXITY_API_KEY” -H “Content-Type: application/json” -d ’{“model”: “sonar”, “messages”: [{“role”: “user”, “content”: “Compare Linear, Height, and Asana for a 10-person product team…”}]}’
The response payload will include choices[0].message.content with the text, and citations as a list of URLs. Parse those out and you can feed the answer into a downstream system, a Notion page, a Slack message, or a small report generator.
The first run is usually a bit rough on prompt structure. Expect to spend 10 to 15 minutes refining the prompt until the output shape stabilizes. That is normal for any LLM-backed tool and is not specific to Perplexity.
Key settings that matter
The product has a number of dials, but most users ignore them and then complain about quality. Here are the ones that actually change outputs.
Focus modes. In the web app, you can narrow the search to Academic, Social, or YouTube. Academic restricts sources to peer-reviewed papers and preprint archives, which is what you want for anything scientific. Social pulls from Reddit, X, and similar, which is genuinely useful for sentiment questions but terrible for factual ones. YouTube is the only way to get video transcripts, which is occasionally the only place a piece of information lives.
Model selection. On Pro, you can switch between the default fast model, a Pro Search mode that runs deeper retrieval, and reasoning models that think longer. Use the cheap one for low-stakes questions and reserve the expensive ones for research you would otherwise pay an analyst to do.
Time filter. Perplexity will default to recent sources, but you can force it to a date range with a phrase like “from the last 7 days” or “from 2024 onwards.” This matters a lot for topics that move quickly, like pricing changes or breaking news. Without a time cue, the model can quietly mix in stale pages from older answers it has seen.
Search domain filter. You can lock the search to specific domains, which is the right move when you trust a small set of sources. For example, restricting to sec.gov and the company’s own investor relations site gives you much cleaner financial answers than an open web search.
System prompt. The API lets you pass a system message. Use it to pin the role, the output format, and the refusal behavior. A short system prompt that says “Answer only from cited sources and return JSON” will dramatically tighten the output shape compared to a bare user prompt.
Recency bias. On the API you can also pass a search_recency_filter set to day, week, month, or year. This is the single most useful setting for news-flavored queries and is the first thing to add when your output is too old.
Temperature. Default is low, which is what you want. Raising temperature on a retrieval product is almost always wrong, because the model will start improvising between sources instead of quoting them.
The general pattern: every setting exists to trade off cost, speed, and source quality. Defaults are reasonable but rarely optimal for a specific workflow. Take an hour, pick a real task you do weekly, and dial the settings until the output is good enough that you would actually use it.
Where it shines
There are a few jobs where Perplexity is genuinely the best tool available, and it is worth being specific.
Fast factual lookup with citations. Anything that used to require opening ten tabs and skimming each one. Vendor comparisons, regulatory summaries, “what changed in library X between versions,” “who are the main competitors to Y in region Z.” You get an answer plus a paper trail, which is what makes it usable in a work context where you might be challenged on the source.
Pre-meeting briefs. Five minutes before a customer call, ask Perplexity to summarize the prospect’s recent news, leadership changes, and product launches. The Pro Search mode is tuned for exactly this and will pull from a wider set of sources than a single Google search.
Lightweight market research. For early-stage scoping, where the alternative is hiring a research assistant, Perplexity is fast enough and cheap enough to replace a first pass. You will not get the depth of a proper report, but you will get 70 percent of the value at one percent of the cost, which is the right trade when you are still deciding whether to investigate a question at all.
News monitoring. Set up a recurring Space or a scheduled API call that watches a topic and returns new developments. This is the closest thing to an RSS feed for people who do not want to maintain one.
Code library research. “What is the current recommended way to do X in framework Y” is a question that ages badly. Perplexity handles it well because the answer is usually one blog post away, and the citations let you verify before you paste code into your project.
Competitive intelligence for small teams. A founder who cannot afford a CI tool can ask weekly questions and get a reasonable digest. The output will not be as good as a paid CI platform, but the price is right.
Where it fails
Honest list of what Perplexity is bad at, because every guide needs one.
Long structured documents. If you upload a 200-page contract and ask for a clause-by-clause summary, you are better off with a model that has the full document in context, like a Claude or GPT-4 class model on the file directly. Perplexity’s file handling is improving but is still primarily a search product, not a document reasoning product.
Anything behind a paywall. It cannot read most subscription-only sources, so any answer that depends on a WSJ article, a paid research report, or a private GitHub repo will be incomplete. The model will sometimes hallucinate the missing piece. Treat its answers as “what the open web says” and nothing more.
Numerically precise aggregation. “What is the total revenue of the top ten SaaS companies in Europe” is the kind of question where the retrieval step finds ten different numbers from ten different pages and the model averages them badly. For anything where the answer needs to be a sum, a mean, or a precise ranking, do the math yourself from the cited sources.
Highly specialized domains. Legal, medical, and tax questions get a confident surface that hides real gaps. The product will happily summarize a court ruling, but it will not tell you it skipped a related case that contradicts the answer. Use it to gather, not to conclude.
Latency-sensitive paths. The web app search round trip is measured in seconds, and the API is faster but not instant. If you need a sub-500ms reply inside a hot user flow, the Sonar API is too slow and you should embed a static model instead.
Anything the model has been trained to refuse. Standard refusal behavior applies. Do not expect it to answer questions a base LLM would refuse, regardless of the search step.
Practical workflow pattern
Here is the workflow that consistently pays off for technical users.
Treat Perplexity as a research front end, not as a final answer machine. The job it does well is compressing a wide reading pass into a short, cited summary. The job it does badly is replacing the reading pass entirely on topics that matter.
A good operating rhythm looks like this. Use the web app for ad hoc questions and exploration. When a question comes up more than twice, convert it into a Space with a saved prompt and a fixed output structure, so the answer is consistent each time. When a Space becomes part of a recurring deliverable, move it to the API and schedule it. The API output goes into a Notion page, a Slack channel, or a Notion database, and the citations are stored alongside the answer so you can audit it later.
A second pattern is to use Perplexity as the first step in a longer pipeline. Send a Perplexity answer to a stronger reasoning model with the prompt “Verify each citation and flag any claim that the cited source does not support.” That second pass is cheap and catches the most common failure mode, which is the model citing a real URL for a claim the URL does not actually make.
A third pattern is to keep a small personal log of prompts that worked. Perplexity is prompt-sensitive in the same way every LLM is, and a personal library of 20 to 30 good prompts is worth more than any new feature the product ships. Treat it like code you would reuse, not like a chat you would have once.
Cost control on the API is straightforward. Start with sonar, use sonar-pro only for tasks where you have measured an improvement, and set a hard monthly budget in your account settings. Most individual users will land in the low double digits per month for personal use, which is the range you would expect for an API at this tier.
One last thing worth saying out loud. The biggest mistake people make with Perplexity is using it the way they would use a normal chatbot, asking vague questions and accepting the first answer. The product rewards precise prompts, source restrictions, and a clear output format. Spend a session tuning those and the rest of the work pays off many times over.
To see how tools like this fit into a complete AI operating layer for your business, book a 60-min Omni Audit — https://calendly.com/sam-mckay/discovery-call