Word Error Rate

Related Articles from SNS

Beyond WER: A Paired Acoustic Stress Test for Ambient Clinical Scribes

Abstract: Ambient clinical scribes increasingly combine Automatic Speech Recognition with Large Language Models to automate documentation. However, traditional metrics like Word Error Rate mask systemic safety degradation. We present a paired acoustic stress test to isolate the causal impact of noise on clinical reasoning.

arXiv CS 5d ago

SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation

arXiv:2606.02548v1. Abstract: Word Error Rate (WER) is the dominant metric for automatic speech recognition (ASR), but it can overestimate errors when references and hypotheses encode the same words in different scripts. This issue is common in multilingual settings where ASR models may emit romanized text.

arXiv CS 8d ago
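
For readers unfamiliar with the metric these abstracts refer to, the following is a minimal sketch of the standard WER computation: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration; none of them come from the listed papers. The last example hints at the script-mismatch problem described in the SN-WER abstract, without implementing that paper's normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no casing, punctuation, or script normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))

# Illustrative script mismatch: the same two words written in Latin and
# Devanagari script count as two substitutions, so plain WER is 1.0.
print(word_error_rate("namaste duniya", "नमस्ते दुनिया"))
```

Because the comparison is exact string equality per word, any difference in script, casing, or punctuation is scored as a full error, which is why evaluation pipelines usually normalize both sides before scoring.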

Overcoming Decoder Inconsistencies in Whisper for Dravidian and Low-Resource Languages

Abstract: Multilingual ASR models such as Whisper perform well on high-resource languages but exhibit substantially higher Word Error Rates (WER) for Dravidian languages compared to Indo-Aryan ones. Through linguistic and dataset analysis, we show that Dravidian languages have longer words, higher vocabulary diversity, and lower repetition, resulting in sparse token distributions and frequent character-level substitution errors. Baseline fine-tuning further reveals decoder...

arXiv CS 1d ago

RAS: a Reliability Oriented Metric for Automatic Speech Recognition

arXiv:2604.24278v3. Abstract: Automatic speech recognition systems often produce confident yet incorrect transcriptions under noisy or ambiguous conditions, which can be misleading for both users and downstream applications. Standard evaluation based on Word Error Rate focuses solely on accuracy and fails to capture transcription reliability. We introduce an abstention-aware transcription framework that enables ASR models to explicitly abstain from uncertain segments.

arXiv CS 5d ago

BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language

arXiv:2606.03504v1. Abstract: We present BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan, with no prior publicly available ASR resources. The corpus contains 10,060 validated utterances in native Nastaliq script, derived from Mozilla Common Voice recordings. We fine-tune OpenAI Whisper-small on this corpus and report a Word Error Rate (WER) of 30.07% on a held-out validation set of 538...

arXiv CS 7d ago

Evaluation of Automatic Speech Recognition Using Generative Large Language Models

arXiv:2604.21928v3. Abstract: Automatic Speech Recognition (ASR) is traditionally evaluated using Word Error Rate (WER), a metric that is insensitive to meaning. Embedding-based semantic metrics are better correlated with human perception, but decoder-based Large Language Models (LLMs) remain underexplored for this task. This paper evaluates their relevance through three approaches: (1) selecting the best hypothesis between two candidates, (2) computing semantic...

arXiv CS 9d ago

The Smallest Brain You Can Build: A Perceptron in Python

A perceptron is the smallest brain you can build. One yes-or-no answer comes out. That is the whole thing.

Hacker News 2d ago
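
As a rough illustration of the "one yes-or-no answer" idea in the entry above, here is a minimal perceptron sketch in plain Python. It is not the linked article's code: the function names `predict` and `train`, the learning rate, the epoch count, and the AND-gate training data are all assumptions chosen for this example.

```python
def predict(weights, bias, inputs):
    """Return 1 (yes) if the weighted sum plus bias is positive, else 0 (no)."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train(samples, labels, epochs=20, learning_rate=1.0):
    """Classic perceptron learning rule: nudge weights toward each mistake."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in zip(samples, labels):
            # error is +1, 0, or -1 depending on the direction of the mistake
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical training data: the logical AND of two binary inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train(samples, labels)
print([predict(weights, bias, s) for s in samples])
```

AND is linearly separable, so the learning rule settles on weights that classify all four inputs correctly; a single perceptron cannot do the same for XOR.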

Neural decoding of speech using deep neural ensembles

Speech brain-computer interfaces (BCIs) can restore rapid communication to people with paralysis, but decoding errors still limit performance. In recent brain-to-text decoding competitions, deep ensemble methods, which combine predictions from multiple independently trained decoders, have delivered striking accuracy improvements and account for the largest gains over baseline approaches. However, these methods have not previously been tested in real-time, require substantial computational...

bioRxiv 6d ago

Your Multimodal Speech Model Says I Have a Face for Radio

arXiv:2605.30472v1. Abstract: As large neural models have become better at language tasks, researchers are increasingly building multi- and omnimodal models that handle more modalities of data. One example is the expansion of speech recognition models to audio-visual data for noise mitigation and multimodal subtitling. While performance and bias have been studied extensively in the single-modality regime, it is unknown how new modalities affect this, even though they...

arXiv CS 9d ago