Home › Knowledge Base › ASR

ASR

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs

arXiv:2606.05846v1 Announce Type: new Abstract: Automatic Speech Recognition (ASR) has become a key technology for human--AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse language pairs. Existing approaches primarily improve CS-ASR performance through synthetic CS speech generation or pair-specific fine-tuning on limited bilingual datasets.

arXiv CS 5d ago

Efficient Punctuation Restoration via Weighted Lookahead Scoring Method for Streaming ASR Systems

arXiv:2606.05179v1 Announce Type: new Abstract: Punctuation restoration improves ASR (Automatic Speech Recognition) readability. However streaming ASR requires online decisions with limited future context. In streaming ASR, the system predicts punctuation incrementally, which makes generation-based approaches prone to latency and alignment failures under boundary-wise evaluation.

arXiv CS 5d ago

Few-Shot Synthetic Accented Speech for ASR Fine-Tuning: What Helps and When?

arXiv:2604.27273v2 Announce Type: replace Abstract: Synthetic accented speech is a promising way to improve automatic speech recognition (ASR) when real accented recordings are scarce. We ask what makes such data useful for ASR fine-tuning: target-accent phoneme edits that expose the recognizer to accent-specific pronunciations, or random phoneme perturbations that act as augmentation in phoneme space. In a few-shot TTS pipeline, we compare LLM-generated accent edits with matched-rate random...

arXiv CS 1d ago

Training-Free Intelligibility-Guided Observation Addition for Noisy ASR

Announce Type: replace-cross Abstract: Automatic speech recognition (ASR) degrades severely in noisy environments. Although speech enhancement (SE) front-ends effectively suppress background noise, they often introduce artifacts that harm recognition. Observation addition (OA) addressed this issue by fusing noisy and SE enhanced speech, improving recognition without modifying the parameters of the SE or ASR models.

arXiv CS 1d ago

SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation

arXiv:2606.02548v1 Announce Type: new Abstract: Word Error Rate (WER) is the dominant metric for automatic speech recognition (ASR), but it can overestimate errors when references and hypotheses encode the same words in different scripts. This issue is common in multilingual settings where ASR models may emit romanized text.

arXiv CS 8d ago

MURMUR: An Efficient Inference System for Long-Form ASR

arXiv:2606.01483v1 Announce Type: new Abstract: Long-form automatic speech recognition (ASR) requires both high accuracy and low latency, but existing systems force a trade-off between the two. Chunk-based pipelines process audio in parallel windows for low latency, but lose cross-chunk context and need brittle heuristics to align speakers and timestamps at boundaries. Long-context ASR models resolve everything in a single pass for better accuracy, but are an order of magnitude slower.

arXiv CS 8d ago

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

arXiv:2510.04593v3 Announce Type: replace-cross Abstract: Large language models (LLMs) have demonstrated promising performance in both automatic speech recognition (ASR) and text-to-speech (TTS) systems, gradually becoming the mainstream approach. However, most current approaches address these tasks separately rather than through a unified framework.

arXiv CS 1d ago

WAXAL-NET: Finetuned Edge ASR Across 19 African Languages

arXiv:2606.02375v1 Announce Type: new Abstract: We evaluate whether compact domain-specialized ASR models can outperform massively multilingual foundation models for conversational African speech across 19 languages in the WAXAL corpus. Fine-tuned edge models achieve a macro-averaged WER of $38.0\%$ compared to $64.9\%$ for the best zero-shot baseline, a $26.9$ percentage-point reduction using models $3-40\times$ smaller. Results confirm that domain specialization dominates scale for...

arXiv CS 8d ago

Navigating the Reality Gap: On-Device Continual Adaptation of ASR for Clinical Telephony

arXiv:2512.16401v5 Announce Type: replace Abstract: Automatic Speech Recognition (ASR) can significantly reduce documentation burden in clinical workflows, but standard models degrade sharply in real-world telephony settings where noisy audio, dialectal variation, and strict data residency constraints prevent cloud-based adaptation. We study this "reality gap" using Gram Vaani: a telephonic Hindi corpus spanning rural healthcare and agricultural helplines, as the closest available proxy for...

arXiv CS 8d ago

BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language

arXiv:2606.03504v1 Announce Type: new Abstract: We present BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan, with no prior publicly available ASR resources. The corpus contains 10,060 validated utterances in native Nastaliq script, derived from Mozilla Common Voice recordings. We fine-tune OpenAI Whisper-small on this corpus and report a Word Error Rate (WER) of 30.07% on a held-out validation set of 538...

arXiv CS 7d ago