Emotional Expressivity Control

Related Articles from SNS

Task-Vector Arithmetic for Emotional Expressivity Control in Language-Model-Based Text-to-Speech

arXiv:2606.05367v1. Announce Type: new. Abstract: We investigate whether task-vector arithmetic, successful for cross-speaker emotional intensity control in modular text-to-speech (TTS), transfers to large-scale TTS systems built on language-model backbones with in-context learning (LM-TTS). Through a systematic elimination study over four progressively narrower operands on Qwen3-TTS-12Hz-1.7B - model weights via LoRA fine-tuning, continuous codec embeddings, discrete codec tokens, and the...

arXiv CS 5d ago

Sparse Autoencoders for Interpretable Emotion Control in Text-to-Speech

arXiv:2606.01479v1. Announce Type: new. Abstract: Integrating large language models (LLMs) into text-to-speech (TTS) systems has improved speech expressiveness, yet interpretable emotional control remains challenging. Existing approaches primarily rely on external conditioning or global activation steering, offering limited insight into the internal representations underlying emotional control.

arXiv CS 8d ago

PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation

arXiv:2503.14295v3. Announce Type: replace. Abstract: Recent advancements in audio-driven talking face generation have made great progress in lip synchronization. However, current methods often lack sufficient control over facial animation such as speaking style and emotional expression, resulting in uniform outputs. In this paper, we focus on improving two key factors: lip-audio alignment and emotion control, to enhance the diversity and user-friendliness of talking videos.

arXiv CS 5d ago

Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey

Announce Type: replace. Abstract: Recent progress in multimodal large language models (MLLMs) is reshaping video translation from a cascaded pipeline of automatic speech recognition, machine translation, text-to-speech, and lip synchronization into a unified multimodal reasoning and generation problem. High-quality video translation requires not only semantic fidelity, but also temporal alignment, speaker consistency, and emotional expressiveness across visual, acoustic, and linguistic...

arXiv CS 8d ago

Resonant Minds: Closed-Loop Social Avatars with Theory of Mind

Announce Type: new. Abstract: Creating lifelike digital humans with genuine social intelligence requires unifying cognitive reasoning and multimodal generation within a coherent framework. Current approaches treat these as separate tasks: Large Language Models excel at dialogue but lack embodied expression, while diffusion-based talking head models achieve visual fidelity but ignore social cognition.

arXiv CS 5d ago

Top 10 AI Tools That Will Transform Your Content Creation in 2025

Looking to level up your content creation game in 2025? You're in the right place! The digital landscape has evolved dramatically, and AI tools have become essential for creators who want to stay ahead of the curve.

TechCrunch 524d ago

LifeSide: Benchmarking Agents as Lifelong Digital Companions

Announce Type: new. Abstract: Lifelong digital companions must integrate cross-session cues, continually update their understanding of users, and adapt to shifting privacy boundaries. Existing evaluations fail to capture this, testing memory recall and short-term empathy in isolation. To bridge this gap, we introduce LifeSide, a benchmark centered on multi-session Memory-Emotion-Environment loops.

arXiv CS 6d ago

Empirical Modeling of Therapist-Client Dynamics in Psychotherapy Using LLM-Based Assessments

arXiv:2602.12450v2. Announce Type: replace. Abstract: Psychotherapy is a primary treatment for many mental health conditions, yet the interplay among therapist behaviors, client responses, and the therapeutic relationship is difficult to study at scale, as process research has relied on labor-intensive human coding. We develop and validate a computational framework for modeling therapist-client interaction, using large language models (LLMs) to measure therapist behaviors (empathy,...

arXiv CS 1d ago

Cybernetic Android Avatar "Yui": System Integration, Field Deployment, and Evaluation

arXiv:2606.08099v1. Announce Type: new. Abstract: Remote communication technologies have become widely used; however, supporting a sense of shared physical space and conveying rich non-verbal cues remain challenging in many social interaction scenarios. This study presents "Yui," a full-body cybernetic android avatar designed to integrate operator-side immersive teleoperation with interlocutor-side human-like social signaling. Yui combines a 55-degree-of-freedom full-body mechanism with a...

arXiv CS 1d ago

Where did language come from? Nobody really knows, but the theories are fascinating

Lisa Lock, Scientific Editor; Andrew Zinin, Lead Editor. Humans are the only species known to use fully symbolic language: a system capable of expressing abstract ideas, imaginary worlds and endless combinations of meaning. But how did we get there?

Phys.org 1d ago