Navigating the Reality Gap: On-Device Continual Adaptation of ASR for Clinical Telephony

arXiv CS, Tuesday 02 June 2026, 04:00 UTC. By Darshil Chauhan, Adityasinh Solanki, Vansh Patel, Kanav Kapoor, Ritvik Jain, Aditya Bansal, Pratik Narang, Dhruv Kumar

Key Points

arXiv:2512.16401v5 (announce type: replace)

Abstract: Automatic Speech Recognition (ASR) can significantly reduce documentation burden in clinical workflows, but standard models degrade sharply in real-world telephony settings, where noisy audio, dialectal variation, and strict data residency constraints prevent cloud-based adaptation. We study this "reality gap" using Gram Vaani, a telephonic Hindi corpus spanning rural healthcare and agricultural helplines, as the closest available proxy for clinical speech under strict on-device constraints. We show that a robust multilingual model (IndicWav2Vec) degrades from 11.59% word error rate (WER) on standard clean Hindi to 41.71% WER on this proxy telephony data. We evaluate a progression of on-device adaptation regimes under realistic constraints, from full fine-tuning to parameter-efficient LoRA and stream-based continual learning, across multiple baselines, datasets, and seeds. Focusing on continual learning, our central finding is a critical interaction between Experience Replay (ER) and Elastic Weight Consolidation (EWC, parameterized by regularization strength λ). We show that standard positive EWC (λ > 0) can oppose replay-driven updates, limiting adaptation. Reversing the sign of EWC's strength (λ < 0) suggests that EWC can act as a directional control signal under ER-guided adaptation: negative λ reinforces replay-driven plasticity, while a scheduled λ enables phase-dependent control of stability and plasticity. Across evaluations on multiple datasets, we find that multi-domain replay provides a strong foundation for adaptation, while EWC modulates stability-plasticity dynamics without altering final performance. These results show that effective on-device adaptation depends on understanding how data-driven and parameter-level learning signals interact, rather than on choosing methods in isolation.
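To make the interaction between replay and EWC concrete, the following is a minimal sketch, not the authors' implementation. It assumes the textbook diagonal-Fisher form of the EWC penalty, (λ/2) · Σ F_i (θ_i − θ*_i)², which the abstract does not spell out. The function names, the toy numbers, and the linear λ schedule are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def ewc_penalty(theta: Sequence[float], theta_star: Sequence[float],
                fisher: Sequence[float], lam: float) -> float:
    """Assumed textbook EWC term: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * sum(f * (t - s) ** 2
                           for t, s, f in zip(theta, theta_star, fisher))


def ewc_gradient(theta: Sequence[float], theta_star: Sequence[float],
                 fisher: Sequence[float], lam: float) -> List[float]:
    """Gradient of the penalty above: lam * F_i * (theta_i - theta*_i)."""
    return [lam * f * (t - s) for t, s, f in zip(theta, theta_star, fisher)]


def scheduled_lambda(step: int, total_steps: int,
                     lam_start: float, lam_end: float) -> float:
    """Hypothetical linear schedule from lam_start to lam_end (an assumption)."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return lam_start + frac * (lam_end - lam_start)


def combined_step(theta: Sequence[float], replay_grad: Sequence[float],
                  theta_star: Sequence[float], fisher: Sequence[float],
                  lam: float, lr: float) -> List[float]:
    """One gradient-descent step on (replay loss + EWC penalty)."""
    g_ewc = ewc_gradient(theta, theta_star, fisher, lam)
    return [t - lr * (g + e) for t, g, e in zip(theta, replay_grad, g_ewc)]


# Toy setup: the anchor theta_star is the pre-adaptation solution, and the
# replay gradient is chosen so that descending it moves theta away from it.
theta = [1.0, -0.5]
theta_star = [0.0, 0.0]
fisher = [2.0, 1.0]
replay_grad = [-1.0, 0.5]

# lam > 0: the penalty gradient points against the replay update.
positive = combined_step(theta, replay_grad, theta_star, fisher, lam=1.0, lr=0.1)
# lam < 0: the penalty gradient points the same way as the replay update.
negative = combined_step(theta, replay_grad, theta_star, fisher, lam=-1.0, lr=0.1)
```

In this toy case the positive-λ step pulls the first parameter back toward the anchor even though the replay gradient pushes it away, while the negative-λ step moves both parameters further in the replay direction. A schedule such as `scheduled_lambda(step, total_steps, -1.0, 1.0)` would start in the plasticity-favoring regime and end in the stability-favoring one. Whether the paper uses this direction or this shape of schedule is not stated in the abstract.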

Originally published by arXiv CS.
