Hybrid Autoregressive-Diffusion Model for Real-Time Sign Language Production

arXiv CS, Wednesday 03 June 2026, 04:00 UTC. By Maoxiao Ye, Xinfeng Ye, Mano Manoharan

arXiv:2507.09105v4

Abstract: Earlier Sign Language Production (SLP) models typically relied on autoregressive decoding, which naturally preserves temporal causality but suffers from error accumulation at inference time. More recent diffusion-based approaches improve generation quality through iterative denoising, yet their sequence-level refinement process introduces substantial latency. To address this trade-off, we propose HybridSign, a hybrid autoregressive-diffusion model for low-latency sign language production that combines causal frame generation with flow-based diffusion refinement. A Multi-Scale Pose Representation module captures fine-grained articulator features, while a Confidence-Aware Causal Attention mechanism leverages joint-level confidence scores to improve robustness under noisy 2D pose observations. Experiments on PHOENIX14T and How2Sign show that HybridSign consistently achieves the best quality-efficiency trade-off among the compared baselines. On the How2Sign test split, it reaches BLEU-1/4 scores of 30.12/6.48 and DTW of 3.89, while reducing time-to-first-frame to 5.90 s and increasing throughput to 10.17 FPS under a 60-frame evaluation protocol.
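The DTW figure in the abstract refers to dynamic time warping, a distance between a generated pose sequence and a reference sequence that tolerates differences in timing (lower is better). As a rough illustration of what such a metric measures, here is a minimal Python sketch of the classic DTW alignment cost. The function names, the flattened-coordinate frame layout, and the Euclidean per-frame distance are assumptions made for this example; the paper's actual evaluation protocol (joint normalization, averaging over the warping path, and so on) is not described in the abstract and may differ.

```python
import math
from typing import Sequence

# Assumed layout: one frame is a flat list of joint coordinates, e.g. [x0, y0, x1, y1, ...].
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two pose frames of equal dimensionality."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of coordinates")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_cost(pred: Sequence[Frame], ref: Sequence[Frame]) -> float:
    """Classic dynamic time warping cost, O(len(pred) * len(ref)) time and memory."""
    if not pred or not ref:
        raise ValueError("both sequences must contain at least one frame")
    n, m = len(pred), len(ref)
    # cost[i][j] is the minimal accumulated distance aligning pred[:i] with ref[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(pred[i - 1], ref[j - 1])
            # Extend the cheapest of: advance pred only, advance ref only, advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

In this sketch, two identical sequences have a cost of zero, and so does a prediction that simply holds one pose for an extra frame, because the warping path absorbs the timing difference. That tolerance is why DTW is often preferred over frame-by-frame error when comparing pose sequences.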

Originally published by arXiv CS.
