GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model

arXiv CS Tuesday 09 June 2026, 04:00 UTC By Haoyang Li, Xuyi Zhuang, Azmat Adnan, Ye Ni, Wei Rao, Shreyas Gopal, Eng Siong Chng, Boon Siew Han, Yuanjin Zheng 1 min read

Key Points

Announce Type: replace-cross Abstract: Language Model (LM)-based generative modeling has emerged as a promising direction for TSE, offering potential for improved generalization and high-fidelity speech. We propose GenTSE, a two-stage decoder-only generative LM for TSE: Stage-1 predicts coarse semantic tokens, and Stage-2 generates fine acoustic tokens. Separating semantics and acoustics stabilizes decoding and yields more accurate target speech.

arXiv:2512.20978v2 Announce Type: replace-cross Abstract: Language Model (LM)-based generative modeling has emerged as a promising direction for TSE, offering potential for improved generalization and high-fidelity speech. We propose GenTSE, a two-stage decoder-only generative LM for TSE: Stage-1 predicts coarse semantic tokens, and Stage-2 generates fine acoustic tokens. Separating semantics and acoustics stabilizes decoding and yields more accurate target speech. Both stages use continuous SSL or codec embeddings, offering richer context than discretized-prompt methods. To reduce exposure bias, we employ a Frozen-LM Conditioning training strategy that conditions the LMs on predicted tokens from earlier checkpoints to reduce the gap between teacher-forcing training and autoregressive inference. We further apply DPO to better align outputs with perceptual preferences. Experiments on Libri2Mix show that GenTSE surpasses previous LM-based systems in speech quality, intelligibility, and speaker consistency.

GenTSE (ORG) TSE (ORG) Frozen-LM Conditioning (ORG)

Originally published by arXiv CS Read original →

GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model

Related Stories

Link between poverty and access to nature | Letter

The Last Evolution, by John W Campbell Jr. (1932)

Genetically modified worms can now produce and deliver drugs inside a living body, scientists say

Indonesia Landslides Devastated Endangered Orangutans, Study Finds