Deepfake Speech
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Mitigating Proxy-to-Wild Domain Gap in Deepfake Speech
arXiv:2606.07494v1 Announce Type: new Abstract: Recent neural audio codec-based speech generation (CodecFake) produces highly realistic audio, posing a challenge to existing deepfake countermeasure models. While using codec resynthesized speech (CoRS) as proxy data improves performance, it often suffers from limited generalization.
ExpSpeech-Net: Multimodal Fusion of Expression and Speech for Deepfake Detection
Announce Type: new Abstract: Deepfake videos are increasingly challenging the credibility of online content. Many existing detection methodology relies on complex, resource-intensive models, which limit their practical use. The study introduces the ExpSpeech-Net deepfake detection (SqN-R-DFD) model, which utilizes SqueezeNet and RNN (Recurrent Neural Network) as its backbone, providing a lightweight and efficient deepfake detection framework that simultaneously analyzes facial expressions...
AUDDT: A Unified Benchmark Toolkit for Audio and Speech Deepfake Detectors
arXiv:2509.21597v2 Announce Type: replace-cross Abstract: With the prevalence of artificial intelligence (AI)-generated content, such as audio deepfakes, a large body of recent work has focused on developing deepfake detection techniques. However, existing benchmarks employ a narrow set of datasets, leaving detector generalization to real-world conditions uncertain. In this paper, we systematically review 31 existing audio deepfake datasets and present an open-source benchmarking toolkit...
CodecFake+: Codec-Based Resynthesized Data as a Proxy for Detecting CodecFake Speech
Announce Type: replace Abstract: With the rapid advancement of neural audio codecs, codec-based speech generation (CoSG) systems have become highly powerful. Unfortunately, CoSG also enables the creation of highly realistic deepfake speech, making it easier to mimic an individual's voice and spread misinformation. We refer to this emerging deepfake speech generated by CoSG systems as CodecFake.
The First Environmental Sound Deepfake Detection Challenge: Benchmarking Robustness, Evaluation, and Insights
Announce Type: replace Abstract: Recent progress in audio generation has made it increasingly easy to create highly realistic environmental soundscapes, which can be misused to produce deceptive content, such as fake alarms, gunshots, and crowd sounds, raising concerns for public safety and trust. While deepfake detection for speech and singing voice has been extensively studied, environmental sound deepfake detection (ESDD) remains underexplored. To advance ESDD, the first edition of the...
FoeGlass: Simple In-Context Learning Is Enough for Red Teaming Audio Deepfake Detectors
arXiv:2606.05101v1 Announce Type: new Abstract: Audio deepfake detection (ADD) models are critical for countering the malicious use of text-to-speech (TTS) models. Evaluating and strengthening ADD models requires developing datasets that span the space of generated audio and highlight high-error regions.
Escaping the Linearity Trap: Manifold Detours for Black-Box Adversarial Attacks on Singing Audio Deepfake Detection
arXiv:2605.30366v1 Announce Type: new Abstract: Recent Singing Voice Synthesis (SVS) advances enable highly realistic but potentially malicious AI covers, making singing voice deepfake detection (SVDD) crucial. Self-Supervised Learning (SSL)-based detectors achieve state-of-the-art performance by fine-tuning speech SSL backbones to capture singing-specific spoof artifacts.
AI voice scams can clone your family’s voice
It's your son's voice. He says he's been in a car accident. He's about to be arrested.
AI voice scams can clone your family’s voice
It's your son's voice. He says he's been in a car accident. He's about to be arrested.
BioLip: Language-Generalizable Lip-Sync Deepfake Detection via Biomechanical Constraint Violation Modeling
Announce Type: replace Abstract: Existing lip-sync deepfake detectors rely on pixel artifacts or audio-visual correspondence, and both fail under generator or language shift because the features they learn are tied to the training distribution. We take a different approach. Authentic lip motion is constrained by tissue mechanics and neuromuscular bandwidth; current generators typically do not impose these constraints, producing trajectories with elevated variance in velocity, acceleration,...