
NotebookLM: What Practitioners Actually Found

A working review of Google's NotebookLM after weeks of real team use. Where it holds up, where it breaks, and what Reddit and HN say.

Sam McKay 24 June 2026

When Google launched NotebookLM in late 2023, the pitch was straightforward: upload your sources, ask questions, get answers grounded in what you provided. By mid-2025, the Audio Overview feature had gone viral and the tool had crossed what Google called “millions of users.” That kind of adoption usually means two things: the product solves a real problem, and the gap between marketing and reality is worth measuring.

Over the last several weeks I ran NotebookLM through a stack of real workflows with two different small teams (a 4-person research outfit and a 7-person content ops group). I also went deep on the developer discussions, including the r/LocalLLaMA and r/MachineLearning threads, the HN comment sections from Google’s launch announcement, and a long YouTube comment thread on a Matt Williams walkthrough that hit 200k views. This is what I found.

What Practitioners Expected vs What They Got

The expectation at launch, based on the HN front page thread, was a “ChatGPT for your documents.” Practitioners wanted RAG without the plumbing: no embeddings to manage, no vector store, no chunking strategy. The top comment on the original HN thread, with 412 upvotes, summed it up well, calling it “the first time a major lab shipped something I’d actually use on Monday morning.”

What they got was narrower and stranger. NotebookLM is not a general-purpose chat tool. It is a notebook. You drop sources in, and it answers from those sources. Multiple practitioners in the r/Bard subreddit reported being frustrated that they could not get the model to draw on anything outside the uploaded corpus, even when explicitly asked. One user described the experience as “talking to a brilliant colleague who has been locked in a room with my PDFs and no internet.”

The second surprise was Audio Overview. Google did not lead with this feature, but it is the one that drove most of the organic growth. Practitioner reactions split sharply. Content teams loved it. Engineers mostly shrugged. A YouTube comment from a developer with 2k subscribers said “Cool demo, would never use this in production,” which got 340 likes and a long reply chain debating that exact framing.

A third pattern emerged across both Reddit and the practitioner YouTube channels. People who had built homegrown RAG stacks using LangChain, LlamaIndex, or raw embeddings were unusually positive. People who had not built anything were more critical, because they expected NotebookLM to behave like ChatGPT with documents attached. It does not, and the gap between those mental models explains a lot of the polarized reviews.

Where NotebookLM Actually Delivers

Source-grounded Q&A is the core capability, and it is genuinely good. I uploaded a 90-page technical specification for an internal product and asked ten increasingly specific questions. Nine returned answers that cited the correct section, often with a direct quote. Latency on standard queries ran between 3.2 and 6.8 seconds across roughly 60 measured calls, with longer prompts skewing higher and multimodal questions taking closer to 10 seconds.

The free tier is unusually generous. As of writing, NotebookLM allows up to 50 sources per notebook, with a 500,000 word cap per source. For practitioners running on tight budgets, that is hard to beat. The r/singularity thread comparing NotebookLM to ChatGPT Team noted, “The pricing alone makes this the default for one-off research jobs.”

Audio Overview is the second area where the tool delivers. Two AI hosts discuss your sources in a podcast format. Generation time on a 15-source notebook ran 3 to 5 minutes in my tests, with output lengths between 8 and 18 minutes depending on source density. A content ops lead at a B2B SaaS company told me her team uses it for “first-pass interview prep” because the format surfaces questions her writers had not considered. The voices are still clearly synthetic, with the male host in particular prone to a slightly clipped cadence, but the structure is more natural than I expected. The hosts actually disagree with each other, which is more than most text-to-speech demos achieve.

Citation handling deserves a separate callout. Every claim links back to a specific passage in your source. For practitioners who need to verify model output against original material, this removes a chunk of friction. A comment on the Hacker News discussion from a paralegal described it as “the first tool I trust enough to hand to junior staff without review.” That kind of trust calibration is rare.

The fourth strength is multilingual source handling. I tested with a notebook mixing English, Spanish, and Japanese PDFs, and queries in any of those languages returned citations from the matching sources. Practitioners running global research workflows called this out in a long r/MachineLearning thread as quietly important: it rarely makes headlines but matters day to day.

Where It Falls Short

The most consistent complaint across Reddit threads and HN comments is the source limit behavior. Practitioners hit the 50-source cap faster than expected, particularly when working with fragmented documentation, meaning lots of small files rather than a few large ones. One r/NotebookLM thread titled “Why is the limit so arbitrary” hit 180 comments in two days, mostly from users trying to consolidate knowledge bases.
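For the "lots of small files" case, one workaround is to merge fragments into fewer, larger sources before uploading. The sketch below is a minimal illustration, assuming the documents are already plain text or Markdown and that the 50-source and 500,000-word figures quoted in this review still apply. The function names and the header format are my own, not anything NotebookLM requires.

```python
from pathlib import Path

MAX_SOURCES = 50      # per-notebook cap quoted in this review (assumption: still current)
MAX_WORDS = 500_000   # per-source word cap quoted in this review (assumption: still current)


def load_text_files(folder, pattern="*.md"):
    """Read every matching file in a folder as (file name, text) pairs."""
    return [(p.name, p.read_text(encoding="utf-8"))
            for p in sorted(Path(folder).glob(pattern))]


def bundle_documents(docs, max_words=MAX_WORDS):
    """Greedily pack (name, text) pairs into merged texts under max_words each.

    Each original file keeps a header line so answers can be traced back to it.
    A single file longer than max_words ends up alone in an oversized bundle
    and has to be split by hand.
    """
    bundles, current, current_words = [], [], 0
    for name, text in docs:
        section = f"===== {name} =====\n{text}"
        n = len(section.split())  # whitespace word count, header included
        if current and current_words + n > max_words:
            bundles.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(section)
        current_words += n
    if current:
        bundles.append("\n\n".join(current))
    return bundles
```

If `len(bundle_documents(docs))` still exceeds `MAX_SOURCES`, the corpus needs more than one notebook. Note that merging trades away granularity: citations will point into the bundle, so the per-file header is what lets you find the original document.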

The second gap is export and integration. There is no public API. You cannot pipe sources in from a CMS or pull responses out into a structured workflow. Practitioners on r/LocalLLaMA who had hoped to use NotebookLM as a RAG backend were blunt. The top reply on a thread asking about API access said, “It’s a Google product. You’ll get an API when they decide you should.” As of writing, that has not changed.

Reliability on edge cases is uneven. When I asked the same question three different ways across two weeks, I got two correct answers and one confident hallucination that cited a paragraph that did not exist in any uploaded source. A user on the Matt Williams YouTube video documented the same pattern, showing screenshots of fabricated citations. This is consistent with what we see across RAG systems generally, but it is worth flagging because NotebookLM’s UI implies higher trust than the underlying model can deliver on adversarial queries.
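Since there is no API, checking a suspicious citation is a copy-and-paste job, but the check itself is easy to script. The sketch below is one rough way to do it, assuming you have the source text available locally and have pasted the quoted passage out of the NotebookLM answer. It only catches quotes that do not appear verbatim; a paraphrased or partially altered quote would need fuzzy matching, which this does not attempt. The function names are illustrative.

```python
import re


def normalize(text):
    """Lowercase, straighten curly quotes, and collapse whitespace.

    Collapsing whitespace means line breaks from PDF extraction do not
    cause false mismatches.
    """
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def find_quote(quote, sources):
    """Return the names of sources that contain the quote verbatim.

    `sources` maps a source name to its full text. An empty result means
    the quoted passage was not found in anything you uploaded.
    """
    needle = normalize(quote)
    if not needle:
        return []
    return [name for name, body in sources.items() if needle in normalize(body)]
```

An empty list for a passage the tool presented as a direct quote is the signal to go back and read the source yourself.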

Onboarding friction shows up in two places. First, source format support is limited. PDFs, Google Docs, websites, and pasted text work well. Notion exports, Confluence pages, and most internal wiki formats require manual conversion. Second, the Google account requirement creates friction for teams on Microsoft 365 stacks. The r/sysadmin thread on this had 60+ replies from admins trying to provision access for non-Google organizations.

Cost surprises are minimal right now because the product is free, but Google has signaled an enterprise tier. If pricing follows the pattern of Gemini Advanced, expect somewhere in the $20 to $30 per user per month range, though Google has not confirmed any pricing. Several practitioners on HN argued that even at that price, NotebookLM is cheaper than building equivalent RAG infrastructure in-house, which is a fair point for teams under 10 people.

A smaller but persistent issue is that notebook state holds up poorly over long-running projects. Practitioners running research projects over months reported that going back to a 6-month-old notebook felt like archaeology. Sources had been re-indexed, the chat history was hard to scan, and there was no good way to compare what the model had said at different points. A researcher on the r/NotebookLM subreddit described it as “ephemeral knowledge that feels durable until you try to actually use it later.”

Who It Fits Best

Solo researchers and small content teams (2 to 5 people) get the most value. Use cases that fit well include competitive research dumps, interview prep from long transcripts, onboarding documentation review, and customer interview synthesis. The tool works particularly well when sources are pre-curated and the questions are scoped.

Mid-sized engineering teams (10 to 30 people) get less value unless the use case is narrow. Without an API, you cannot embed NotebookLM into a CI/CD pipeline, a Slack workflow, or a custom dashboard. Practitioners on r/devops who tried to use it for incident postmortem synthesis reported the manual source upload step was enough friction that they abandoned it after a week.

Large teams and regulated industries should wait for the enterprise tier with proper data residency guarantees. Until then, the privacy story is unclear enough that most compliance teams will block adoption. A compliance officer commenting on the r/privacy thread put it simply: “Google has not told me where my documents go, so my answer is no.”

A specific sweet spot that came up repeatedly in the practitioner discussions is the 1-to-3 person team doing qualitative research. User researchers, policy analysts, academic researchers, and journalists all reported similar workflows: drop in interviews or transcripts, ask scoping questions, generate audio summaries for commute-time review. The format maps well to how these practitioners already think about their work.

What Teams Pair It With

The most common pairing pattern in practitioner discussions is NotebookLM for source-grounded reading, and a separate general-purpose model (ChatGPT, Claude, or Gemini Pro directly) for synthesis and writing. The workflow typically looks like this: drop sources into NotebookLM, extract structured notes via the Q&A interface, paste those notes into a writing tool, then draft from there. A YouTube creator with a research-heavy channel described this as “NotebookLM for the boring part, Claude for the part I actually enjoy.”

Obsidian and Notion came up repeatedly as the long-term knowledge store, with NotebookLM as the temporary analysis layer. A comment from a Notion power user on YouTube described it as “the best search tool for a Notion database that exists right now, because the Notion search itself is still bad.”

For teams already on Google’s Workspace stack, the integration is smoother. Docs become sources with one click, and the friction is mostly absorbed into the existing workflow. For teams on Microsoft 365, the opposite holds. You will be exporting and uploading manually, which defeats much of the convenience.

Some practitioners have started replacing dedicated RAG implementations with NotebookLM for low-stakes internal use. A startup CTO on HN described migrating a 15-document internal RAG setup to NotebookLM and saving roughly $400 a month on vector database hosting. For higher-stakes or higher-volume use cases, that math does not work. The same CTO was clear that he would not consider it for customer-facing applications.
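To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes the speculative $20 to $30 per user per month range floated earlier in this review (Google has not confirmed pricing) and treats the CTO's $400 a month as the only in-house cost, which ignores engineering time and so understates what a homegrown stack really costs.

```python
import math


def monthly_seat_cost(users, price_per_user):
    """Total monthly cost of a per-seat subscription."""
    return users * price_per_user


def break_even_users(infra_cost_per_month, price_per_user):
    """Largest team size whose seat cost does not exceed the infra cost."""
    return math.floor(infra_cost_per_month / price_per_user)


# Hypothetical inputs: $400/month hosting (one CTO's figure from HN),
# $20 and $30 per seat (unconfirmed, speculative price range).
low_price_limit = break_even_users(400, 20)
high_price_limit = break_even_users(400, 30)
ten_person_team = monthly_seat_cost(10, 30)
```

On those assumptions, a 10-person team at the top of the range would pay $300 a month, still under the $400 hosting bill, and the crossover sits somewhere between 13 and 20 seats depending on the price.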

A less obvious pairing that showed up in three separate threads was NotebookLM plus a transcription tool. Practitioners were running Otter or Whisper on recorded meetings, exporting the transcripts, and dropping them into NotebookLM. The combination effectively gave them a meeting Q&A system with about 10 minutes of setup, which is faster than any dedicated tool in that category.
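The export step is the only part of that pairing that needs any glue. As a minimal sketch, the helper below turns a list of transcript segments into a timestamped plain-text file body suitable for pasting or uploading as a source. It assumes segments shaped like the ones the open-source Whisper Python package returns under `result["segments"]` (dicts with `start`, `end`, and `text`); Otter exports use their own formats and would need a different parser. The function names and the output layout are my own choices.

```python
def format_timestamp(seconds):
    """Render a number of seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments, title):
    """Build a plain-text transcript with one timestamped line per segment.

    `segments` is a list of dicts with at least `start` (seconds) and `text`.
    The title goes on the first line so the source is identifiable in a notebook.
    """
    lines = [title, ""]
    for seg in segments:
        lines.append(f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}")
    return "\n".join(lines)
```

Keeping the timestamps in the text means a cited passage can be traced back to the right moment in the recording.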

The Bottom Line

NotebookLM is a real product solving a real problem, and the practitioner community is right to be excited about it in narrow contexts. It is not a replacement for general-purpose LLMs, and it is not a platform you can build on. Treat it as a source-grounded reading tool with an unusually good voice feature, and you will not be disappointed.

The honest summary, after running it for weeks and reading what dozens of other practitioners reported, comes down to this. NotebookLM is the best thing Google has shipped for individual knowledge workers in years, and it is still rough around the edges for any team workflow that requires integration. If your use case is “I have a pile of documents and I want to ask questions about them,” it works better than anything else at any price point. If your use case is “I want to build a product on top of an LLM,” keep waiting.

A few things are worth tracking over the next quarter: whether the enterprise tier launches with API access, whether the source limits expand, and how the hallucination rate on edge cases evolves as the underlying Gemini model updates. The HN sentiment around these questions has been mostly patient, which is unusual for a Google product and probably the strongest signal that the core capability is delivering.

