VQ-VAE

Related Articles from SNS

Can Language Models Learn to Listen?

arXiv:2308.10897v2. We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words. Given an input transcription of the speaker's words with their timestamps, our approach autoregressively predicts a response of a listener: a sequence of listener facial gestures, quantized using a VQ-VAE. Since gesture is a language component, we propose treating the quantized atomic motion elements...

arXiv CS 5d ago
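
To illustrate what "quantized using a VQ-VAE" means in the abstract above, here is a minimal sketch of the vector-quantization step alone: each continuous frame vector is replaced by the index of its nearest codebook entry, which turns a motion sequence into discrete tokens that a language model can predict. This is not the paper's implementation. The function name `quantize`, the toy codebook, and the array shapes are all assumptions made for illustration, and the learned encoder, decoder, and training losses of a real VQ-VAE are omitted.

```python
import numpy as np

def quantize(frames, codebook):
    """Map each frame vector to its nearest codebook entry (hypothetical helper).

    frames:   array of shape (T, D), one continuous vector per time step
    codebook: array of shape (K, D), one vector per discrete code
    Returns the code indices, shape (T,), and the quantized vectors, shape (T, D).
    """
    # Squared Euclidean distance between every frame and every code: shape (T, K)
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

# Toy codebook (assumption): code k is the vector [k, k, k, k]
codebook = np.arange(8, dtype=float)[:, None] * np.ones((1, 4))
# Toy "encoder outputs": slightly perturbed copies of codes 3, 1, 3, 7
frames = codebook[[3, 1, 3, 7]] + 0.1

tokens, quantized = quantize(frames, codebook)
print(tokens.tolist())
```

In a trained VQ-VAE the codebook is learned jointly with the encoder and decoder; the lookup itself is the same nearest-neighbor search shown here.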

EEGDancer: Dynamic Emotion Latent Space Masked Modeling with Reinforcement Learning for EEG Continuous Emotion Prediction

arXiv:2606.05855v1. Continuous electroencephalography (EEG) emotion prediction aims to model the temporal evolution of human emotional states from EEG signals. Unlike conventional discrete emotion recognition, continuous prediction requires capturing long-range temporal dependencies and coherent emotional dynamics.

arXiv CS 5d ago

T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences

arXiv:2406.00636v2. In this paper, we address the challenging problem of long-term 3D human motion generation. Specifically, we aim to generate a long sequence of smoothly connected actions from a stream of multiple sentences (i.e., a paragraph). Previous long-term motion generation approaches were mostly based on recurrent methods, using previously generated motion chunks as input for the next step.

arXiv CS 2d ago

Planning-aligned Token Compression for Long-Context Autonomous Driving

arXiv:2606.07464v1. Monolithic vision-action models represent an emerging paradigm in autonomous driving. However, this architecture produces token sequences that quickly exceed real-time computational budgets when encoding extended temporal context for complex interactions. While approaches like linear transformers and external memory try to make the context lightweight, token compression is most compatible with the architecture as it requires no backbone...

arXiv CS 2d ago

C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification

Background: Respiratory sound classification plays a critical role in the clinical identification of pulmonary pathologies. However, its performance is often hindered by the limited size, severe noise, and class imbalance of real-world auscultation datasets. Although conventional audio augmentation techniques are easy to implement, they may inadvertently distort subtle pathological characteristics.

arXiv CS 8d ago

Latent Anchor-Driven Test Generation for Deep Neural Networks

arXiv:2606.04310v1. Deep Neural Networks (DNNs) are increasingly being deployed in security-critical and safety-sensitive applications, which makes rigorous testing essential to identify and mitigate model weaknesses. Existing DNN testing approaches explore either the input space or a learned latent space.

arXiv CS 6d ago