Home Knowledge Base LRS3

LRS3

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Assessing True Generalisability of Audio-Visual Speech Recognisers

Announce Type: cross Abstract: Current Audio-Visual Speech Recognition (AVSR) models achieve near-perfect performance on the standard LRS3 benchmark, raising concerns of adaptive overfitting. To systematically assess true generalisability, we construct a highly controlled, unseen evaluation set subsampled from the massive MultiVSR dataset. Unlike standard out-of-distribution benchmarks, our subset strictly matches the acoustic, visual, and demographic distributions of the LRS3 test set.

arXiv CS 2d ago

Phoneme-Level Visual Speech Recognition via Point-Visual Fusion and Language Model Reconstruction

Announce Type: replace Abstract: Visual Automatic Speech Recognition (V-ASR) is a challenging task that involves interpreting spoken language solely from visual information, such as lip movements and facial expressions. This task is notably challenging due to the absence of auditory cues and the visual ambiguity of phonemes that exhibit similar visemes-distinct sounds that appear identical in lip motions. Existing methods often aim to predict words or characters directly from visual cues,...

arXiv CS 8d ago

M2S-AVSR: Modality-aware Multi-view Self-supervised Representation for Robust Audio-Visual Speech Recognition

arXiv:2606.05763v1 Announce Type: cross Abstract: Audio-Visual Speech Recognition (AVSR) enhances speech recognition robustness by leveraging visual cues, while real-world scenarios remain challenging due to viewpoint variation, audio distortion, and visual occlusion, which degrade modality quality and increase audio-visual asynchrony. In this paper, we propose a novel Modality-aware Multi-view Self-supervised representation framework for robust Audio-Visual Speech Recognition (M2S-AVSR)....

arXiv CS 5d ago

M2S-AVSR: Modality-aware Multi-view Self-supervised Representation for Robust Audio-Visual Speech Recognition

arXiv:2606.05763v2 Announce Type: replace-cross Abstract: Audio-Visual Speech Recognition (AVSR) enhances speech recognition robustness by leveraging visual cues, while real-world scenarios remain challenging due to viewpoint variation, audio distortion, and visual occlusion, which degrade modality quality and increase audio-visual asynchrony. In this paper, we propose a novel Modality-aware Multi-view Self-supervised representation framework for robust Audio-Visual Speech Recognition...

arXiv CS 2d ago