Emotional Expressivity Control
Related Articles from SNS
Task-Vector Arithmetic for Emotional Expressivity Control in Language-Model-Based Text-to-Speech
arXiv:2606.05367v1. Announce type: new. Abstract: We investigate whether task-vector arithmetic, successful for cross-speaker emotional intensity control in modular text-to-speech (TTS), transfers to large-scale TTS systems built on language-model backbones with in-context learning (LM-TTS). Through a systematic elimination study over four progressively narrower operands on Qwen3-TTS-12Hz-1.7B - model weights via LoRA fine-tuning, continuous codec embeddings, discrete codec tokens, and the...
Sparse Autoencoders for Interpretable Emotion Control in Text-to-Speech
arXiv:2606.01479v1. Announce type: new. Abstract: Integrating large language models (LLMs) into text-to-speech (TTS) systems has improved speech expressiveness, yet interpretable emotional control remains challenging. Existing approaches primarily rely on external conditioning or global activation steering, offering limited insight into the internal representations underlying emotional control.
PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation
arXiv:2503.14295v3. Announce type: replace. Abstract: Recent advancements in audio-driven talking face generation have made great progress in lip synchronization. However, current methods often lack sufficient control over facial animation such as speaking style and emotional expression, resulting in uniform outputs. In this paper, we focus on improving two key factors: lip-audio alignment and emotion control, to enhance the diversity and user-friendliness of talking videos.
Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey
Announce type: replace. Abstract: Recent progress in multimodal large language models (MLLMs) is reshaping video translation from a cascaded pipeline of automatic speech recognition, machine translation, text-to-speech, and lip synchronization into a unified multimodal reasoning and generation problem. High-quality video translation requires not only semantic fidelity, but also temporal alignment, speaker consistency, and emotional expressiveness across visual, acoustic, and linguistic...
Resonant Minds: Closed-Loop Social Avatars with Theory of Mind
Announce type: new. Abstract: Creating lifelike digital humans with genuine social intelligence requires unifying cognitive reasoning and multimodal generation within a coherent framework. Current approaches treat these as separate tasks: Large Language Models excel at dialogue but lack embodied expression, while diffusion-based talking head models achieve visual fidelity but ignore social cognition.
Top 10 AI Tools That Will Transform Your Content Creation in 2025
Looking to level up your content creation game in 2025? You're in the right place! The digital landscape has evolved dramatically, and AI tools have become essential for creators who want to stay ahead of the curve.
LifeSide: Benchmarking Agents as Lifelong Digital Companions
Announce type: new. Abstract: Lifelong digital companions must integrate cross-session cues, continually update their understanding of users, and adapt to shifting privacy boundaries. Existing evaluations fail to capture this, testing memory recall and short-term empathy in isolation. To bridge this gap, we introduce LifeSide, a benchmark centered on multi-session Memory-Emotion-Environment loops.
Empirical Modeling of Therapist-Client Dynamics in Psychotherapy Using LLM-Based Assessments
arXiv:2602.12450v2. Announce type: replace. Abstract: Psychotherapy is a primary treatment for many mental health conditions, yet the interplay among therapist behaviors, client responses, and the therapeutic relationship is difficult to study at scale, as process research has relied on labor-intensive human coding. We develop and validate a computational framework for modeling therapist-client interaction, using large language models (LLMs) to measure therapist behaviors (empathy,...
Cybernetic Android Avatar "Yui": System Integration, Field Deployment, and Evaluation
arXiv:2606.08099v1. Announce type: new. Abstract: Remote communication technologies have become widely used; however, supporting a sense of shared physical space and conveying rich non-verbal cues remain challenging in many social interaction scenarios. This study presents "Yui," a full-body cybernetic android avatar designed to integrate operator-side immersive teleoperation with interlocutor-side human-like social signaling. Yui combines a 55-degree-of-freedom full-body mechanism with a...
Where did language come from? Nobody really knows, but the theories are fascinating
Lisa Lock (Scientific Editor), Andrew Zinin (Lead Editor). Humans are the only species known to use fully symbolic language: a system capable of expressing abstract ideas, imaginary worlds and endless combinations of meaning. But how did we get there?