Seamless Interaction
Related Articles from SNS
Enroll-on-Wakeup: A First Comparative Study of Target Speech Extraction for Seamless Interaction in Real Noisy Human-Machine Dialogue Scenarios
arXiv:2602.15519v3. Abstract: Target speech extraction (TSE) typically relies on pre-recorded high-quality enrollment speech, which disrupts user experience and limits feasibility in spontaneous interaction. In this paper, we propose Enroll-on-Wakeup (EoW), a novel framework where the wake-word segment, captured naturally during human-machine interaction, is automatically utilized as the enrollment reference. This eliminates the need for pre-collected speech to...
DuplexOmni: Real-Time Listening, Seeing, Thinking, and Speaking for Full-Duplex Interaction
arXiv:2606.09186v1. Abstract: Human interaction is continuous, multimodal, and full-duplex by nature. Although recent omni models have made substantial progress in unified speech, vision, and text modeling, combining seamless real-time interaction with complex reasoning and tool use remains challenging. We present DuplexOmni, a method for real-time multimodal full-duplex interaction.
Don't Pause: Streaming Video-Language Synchrony for Online Video Understanding
arXiv:2606.06991v1. Abstract: Online Video Large Language Models (Video-LLMs) have advanced toward seamless human-AI interaction through frame-by-frame processing and proactive responding. However, a critical challenge remains in streaming scenarios: existing models typically pause video perception while generating responses, breaking real-time video-language synchrony and causing stutters. To address this, we introduce a novel paradigm for online video understanding:...
TALKPLAY: Multimodal Music Recommendation with Large Language Models
Abstract: We present TALKPLAY, a novel multimodal music recommendation system that reformulates recommendation as a token generation problem using large language models (LLMs). By leveraging the instruction-following and natural language generation capabilities of LLMs, our system effectively recommends music from diverse user queries while generating contextually relevant responses. While pretrained LLMs are primarily designed for text modality, TALKPLAY extends their...
The iPhone's Last Stand
Apple fans would, for years and years, sneer at Microsoft’s penchant for talking about products that may or may not ship, deriding them as vaporware. After Apple’s bungled 2024 launch of Apple Intelligence and new Siri, however, vaporware is fair game, and just in time for this Article. Project Solara Last week, at its annual Build developer conference, Microsoft put forth a vision for a new ecosystem of hardware devices under the banner of Project Solara: The concept —...
DyaPlex: Full-Duplex Speech-Motion Model for Dyadic Interaction
Abstract: We present DyaPlex, a streaming, full-duplex speech-and-motion model designed for dyadic interaction. To capture the continuous and reciprocal nature of human communication, this full-duplex capability empowers the agent to simultaneously perceive and generate both speech and physical motion in a streaming fashion. At its core, our method leverages the strong priors of a foundational full-duplex speech model and integrates a novel motion pathway, thereby...
A Unified Framework for Probabilistic Dynamic-, Trajectory- and Vision-based Virtual Fixtures
arXiv:2506.10239v3. Abstract: Probabilistic Virtual Fixtures (VFs) enable the adaptive selection of the most suitable haptic feedback for each phase of a task, based on learned or perceived uncertainty. While keeping the human in the loop remains essential, for instance, to ensure high precision, partial automation of certain task phases is critical for productivity. We present a unified framework for probabilistic VFs that seamlessly switches between manual fixtures,...
OrthoPhys: Physically Plausible Video Generation with Orthogonal-View Geometry Guidance
Abstract: Recent progress in video generation has led to substantial improvements in visual fidelity, yet ensuring physically consistent motion remains a fundamental challenge. Intuitively, this limitation can be attributed to the fact that real-world object motion unfolds in three-dimensional space, while video observations provide only partial, view-dependent projections of such dynamics. To address these issues, we propose OrthoPhys, a two-stage framework that...
Orange Lab: Lowering Barriers to Data Mining through Embedded Interactive Workflows
arXiv:2606.09239v1. Abstract: While visual programming of data analysis workflows has become an important vehicle for the democratization of data science, such systems remain largely confined to standalone applications and offer limited support for transitioning their visual analytics solutions into interactive web environments. As a result, data analysis pipelines are difficult to share, embed, and adapt into user-facing analytical tools. We present Orange Lab, a web-based...
Locality-Aware Automatic Differentiation on the GPU for Mesh-Based Computations
arXiv:2509.00406v3. Abstract: We present a GPU-based system for automatic differentiation (AD) of functions defined on triangle meshes, designed to exploit the locality and sparsity in mesh-based computation. Our system evaluates derivatives using per-element forward-mode AD, confining all computation to registers and shared memory and assembling global gradients, sparse Jacobians, and sparse Hessians directly on the GPU. By avoiding global computation graphs,...