Speech

Related Articles from SNS

Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning

arXiv:2606.01905v1 Announce Type: cross Abstract: Objective: Laryngectomees depend on an electromechanical device to generate electrolaryngeal (EL) speech. Compared with normal speech, EL speech suffers from severe distortion, limited phonetic variation, unnatural prosody, and temporal shifts, degrading naturalness and intelligibility. Although sequence-to-sequence (seq2seq) voice conversion (VC) based EL-speech-to-normal-speech conversion (EL2SP) is promising, substantial mismatches between...

arXiv CS 8d ago

Benchmarking Speech-to-Speech Translation Models

arXiv:2606.03241v1 Announce Type: new Abstract: Speech-to-speech translation (S2ST) has advanced rapidly, but offline evaluation lacks a unified protocol: studies report non-overlapping metric subsets, preventing direct comparisons. We introduce COMPASS, a unified and reproducible benchmarking framework integrating 46 metrics across eight dimensions, and deploy it on 1,248 model-language configurations from FLEURS and CVSS, spanning cascaded and end-to-end architectures over ten language...

arXiv CS 7d ago

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

Announce Type: cross Abstract: Speech translation systems increasingly span speech-to-text translation (S2TT), speech-to-speech translation (S2ST), offline translation, and streaming generation, producing outputs that differ in modality, speech realization, and timing behavior. Existing evaluation practices assess important aspects such as translation quality, speech quality, and temporal quality, but these aspects are often evaluated under separate protocols, making it difficult to compare...

arXiv CS 9d ago

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

Announce Type: replace-cross Abstract: Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unrelated characteristics. Despite rapid progress in Speech Large Language Models (Speech LLMs), systematic evaluation of this capability remains challenging, as existing benchmarks are fragmented across isolated editing tasks. To bridge this gap, we introduce SpeechEditBench, a bilingual multi-attribute benchmark for instruction-guided speech editing.

arXiv CS 6d ago

‘I’m not going to allow you to continue’: Mayor interrupts nonbinary teen’s Pride speech

The teenager had the opportunity to give their full speech later on Monday. The mayor of Cambridge, Ontario is under fire after she intervened to stop a nonbinary teen's Pride speech. On Monday, Sophie Mills, 17, attended the city of Cambridge's Pride flag-raising event at city hall and was asked to speak during the ceremony, CBC News reports. Mills’ speech referenced a...

The Independent World 5d ago

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

arXiv:2606.01804v1 Announce Type: cross Abstract: Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unrelated characteristics. Despite rapid progress in Speech Large Language Models (Speech LLMs), systematic evaluation of this capability remains challenging, as existing benchmarks are fragmented across isolated editing tasks. To bridge this gap, we introduce SpeechEditBench, a bilingual multi-attribute benchmark for...

arXiv CS 8d ago

Automatic Labelling of Speech Translation Errors

arXiv:2606.06047v1 Announce Type: new Abstract: Errors in speech translations reduce trustworthiness of Speech Translation (ST) systems and can have serious consequences. Yet currently there is no established methodology for evaluating confidence and quality estimation of speech translations. To initiate progress in this direction, we propose Speech Translation Error Labelling (STEL).

arXiv CS 5d ago

Universal Speech Content Factorization

arXiv:2603.08977v2 Announce Type: replace-cross Abstract: We propose Universal Speech Content Factorization (USCF), a simple and invertible linear method for extracting a low-rank speech representation in which speaker timbre is suppressed while phonetic content is preserved. USCF extends Speech Content Factorization, a closed-set voice conversion (VC) method, to an open-set setting by learning a universal speech-to-content mapping via least-squares optimization and deriving speaker-specific...

arXiv CS 1d ago

TLDR: Compressing Audio Tokens for Efficient Autoregressive Text-to-Speech

arXiv:2606.09019v1 Announce Type: new Abstract: Codec-based autoregressive (AR) speech language models have achieved strong text-to-speech (TTS) quality by modeling speech as sequences of discrete audio tokens with large pretrained backbones. However, this token-level formulation creates a structural efficiency bottleneck: speech-token sequences are much longer than text sequences, requiring the AR backbone to perform causal computation at every token position and maintain a KV cache that...

arXiv CS 1d ago

ImmersiveTTS: Environment-Aware Text-to-Speech with Multimodal Diffusion Transformer and Domain-Specific Representation Alignment

arXiv:2605.30965v1 Announce Type: cross Abstract: Recent advancements in text-guided audio generation have yielded promising results in diverse domains, including sound effects, speech, and music. However, jointly generating speech with environmental audio remains challenging due to the inherent disparities in their acoustic patterns and temporal dynamics. We propose ImmersiveTTS, an environment-aware text-to-speech (TTS) model that generates natural speech seamlessly integrated within...

arXiv CS 9d ago