PADSE: A New Audio Forensics Approach to a Pressing Problem
It's easier than ever to manipulate or synthetically generate speech: a wide range of tools is available, from voice conversion systems to advanced text-to-speech models. People have tried to build detectors that can spot all forms of manipulated audio, from shallowfakes to deepfakes. But covering such complex analysis for a wide diversity of content with a single tool is bound to weaken detection accuracy. That's why PADSE, our new audio forensics project, explores a different approach.
Speech is a highly powerful means of communication, as it conveys not only information but also emotions and social cues. As soon as we hear a familiar voice, we tend to recognize and trust it. At the same time, audio content is particularly difficult to analyze and verify because manipulations are often subtle and hard to detect. This makes spoken language especially well-suited for disinformation attacks.
The three-year research project PADSE aims to develop an innovative detection system for potentially manipulated and (partially) synthetic audio content that serves journalists, fact-checkers, and legal units. The novelty: placing individual speakers and their speech characteristics at the center of the detection system.
What does that mean? Well, there's a multi-layered approach. Using speaker profiles, consistency checks, manipulation and synthesis detection, as well as context and provenance analysis, PADSE enables speaker-specific verification. The speakers in focus are public figures, such as politicians and industry leaders, whose manipulated audio content would have a severe societal impact.
The expected outcome is a significant improvement over generic approaches: more reliable and faster detection of even highly sophisticated speech-based disinformation, focused on a defined set of high-impact people.
But that's not all: One of the use cases in PADSE is to support media organizations in protecting their corporate identity by simplifying the detection of false information spread through the misuse of brand voices, such as those of news anchors. In doing so, PADSE contributes to strengthening information integrity in the digital media landscape.
For all acronym lovers
In case you were still wondering, PADSE is short for the German "Personenzentrierte, Audio-/sprachbasierte Deepfake- und Shallowfake Erkennung", which translates to "person-centered audio-/speech-based deepfake and shallowfake detection".
The project is funded by the German Federal Ministry of Research, Technology and Space (BMFTR) under the call "Trust in Democracy and State: Detect and Fend Digital Disinformation". Take a look here if you want to learn more about our other related projects.
Who is PADSE?
Together with our team, two research partners complete the consortium:
Fraunhofer IDMT's team will play a central role in PADSE, not only by coordinating the overall project but also by contributing their expertise in audio forensics. With extensive experience in detecting manipulated and synthetic audio content, they bring essential know-how to the consortium.
The third partner is the Webis Group of the Bauhaus-Universität Weimar. They are leading researchers in the fields of natural language processing and information retrieval, with a focus on authorship and style analysis, AI-generated text detection and text reuse.
Our role in the project
We take on key responsibilities in requirements analysis, use case development, user evaluation, data provision, user-centered product development, and communication. Our goal is to contribute to the development of a trustworthy detection system for manipulated or synthetic speech, designed for seamless integration into journalistic workflows.
Your role in the project
If you want more information about the PADSE project, would like to collaborate, or want to chip in with your own use case for synthetic speech detection, please get in touch with anna.schild@dw.com or ruben.bouwmeester@dw.com.
