AI & Automation, HLT

"The new world of AI-generated... ...everything" – A NoteBook LM Podcast Experiment

Still stoked and shocked by the rapid development of generative AI, especially in the field of synthetic voices, we recently took NoteBook LM's podcast feature for a test drive. The results, quickly available as a compact audio file, were somewhat jaw-dropping. But only for a couple of minutes.

For those of you who haven't heard about one of the latest hypes in generative AI yet: NotebookLM, developed by Google and first released in 2023 based on the company's Gemini/Bard system, is basically a research, document analysis, and note-taking service. It allows you to create summaries and explanations and will also answer specific questions based on your input.

In September 2024, Google introduced a new feature: the NotebookLM Audio Overview. It promises to create summaries in the form of conversational podcasts, something the developers call "engaging Deep Dive discussions".

This is the feature we tested. On the content of this very blog. And here's the result:

When we first played this "podcast" at an internal meeting – for a couple of minutes, not until the end – most team members were impressed and had big smiles on their faces. It all sounded so natural, positive, clever, and entertaining. And it had been so easy to create!

However, after a longer listening session with a more critical mindset, we started to notice substantial shortcomings.

Here's a quick list:

  • The podcast ignores several important articles and topics on our website; e.g., data-driven journalism is not mentioned once. Yes, the AI hosts announce that they want to focus on AI. Still, it seems like a poor editorial choice considering our much broader "technology for journalism" statement and the fact that the blog has been around for 12 years.

  • At some point, when the "deep dive" revolves around immersive tech, there's talk of "going fully VR" and people who might wear headsets all the time, when in reality this vision of immersive tech is rather dated, and it's certainly not pursued in any of our current projects.

  • While the AI hosts seem to "understand" some focus topics really well (e.g. AI and automation, AI and language technology, AI and disinformation), they also jump from one article or dossier to the next in a way that doesn't make much sense. Why don't they discuss the sign language avatar project in one go? Why don't they have a dedicated section on deepfakes? Why can't they focus? Why do they repeat arguments and comments over and over again? (Answer: because they're stochastic parrots, and there's no human in the loop.)

  • Some interruptions are highly random and repetitive ("right", "yeah", "for sure"); in some cases, they're inelegantly cut off ("righ!"), reinforcing what could be described as a slightly uncanny robot conversation vibe.

  • Especially towards the end, the AI hosts tend to sound like pseudo-experts, lapsing into (repeated) platitudes like "it's not a magic wand!", "it's not a silver bullet!", "that's the million dollar question!", or: "With great power comes great responsibility!"

There's one section where they talk about "the new world of AI-generated... ...everything", and it's almost as if they're accidentally referring to their own synthetic "deep dive", which is, at the end of the day, a wonder of engineering, a mixed bag, and quite a bit of smoke and mirrors.

To be fair, the podcast was produced in no time, with rigid default settings. Come 2025, with a couple more add-ons like Wondercraft that allow for manual edits and tweaks, we'll probably hear more sophisticated results. Or even a "deep dive" that is truly convincing.

Authors
Alexander Plaum
Mirko Lorenz