KIT's Submission to Cross-Lingual Voice Cloning in IWSLT 2026

arXiv CS, Monday 08 June 2026, 04:00 UTC. By Seymanur Akti, Alexander Waibel


arXiv:2606.07240v1

Abstract: Cross-lingual voice cloning aims to generate speech in a target language while preserving speaker identity from a source-language reference. This task is central to speech translation and is the focus of the IWSLT 2026 Cross-Lingual Voice Cloning track. A key challenge is maintaining intelligibility and naturalness in the presence of accent variation and domain-specific vocabulary. We build on a multilingual text-to-speech model, FishAudio-S2-Pro, and introduce language tag prompting to improve language control and reduce accent leakage. We further apply reinforcement learning (RL) fine-tuning for task adaptation and observe improvements in intelligibility. Finally, we propose a reference-conditioned lexical matching method that improves pronunciation of domain-specific terms when lexical overlap is present. Results show that language prompting provides the largest gains, while lexical matching yields consistent improvements on matched subsets.
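The abstract does not specify the tag format or the matching procedure, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes a hypothetical bracketed tag such as "[de]" prepended to the target text, and treats "lexical overlap" as words of at least four letters that appear in both the reference transcript and the target text. The function names, the tag syntax, and the length threshold are all assumptions made for illustration.

```python
import re


def add_language_tag(text: str, lang: str) -> str:
    """Prepend a hypothetical language tag (format assumed, not from the paper)."""
    return f"[{lang}] {text}"


def lexical_overlap(reference_text: str, target_text: str, min_len: int = 4) -> set[str]:
    """Return lowercase words (letters only) shared by both texts.

    The min_len threshold is an assumed heuristic to skip short function words.
    """
    def tokenize(s: str) -> set[str]:
        # [^\W\d_]+ matches runs of Unicode letters only
        return {w.lower() for w in re.findall(r"[^\W\d_]+", s) if len(w) >= min_len}

    return tokenize(reference_text) & tokenize(target_text)


# Source-language reference transcript and target-language text (made-up examples)
reference = "We fine-tune the Transformer encoder on LibriSpeech."
target = "Wir trainieren den Transformer Encoder auf LibriSpeech."

prompt = add_language_tag(target, "de")
shared = lexical_overlap(reference, target)

print(prompt)
print(sorted(shared))
```

In this toy pair, the shared terms are the domain-specific words that appear unchanged in both languages, which is the kind of "matched subset" the abstract says benefits from lexical matching. How the matched terms are then used to condition synthesis is not described in the abstract and is not modeled here.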


Originally published by arXiv CS.
