
VQ

Related Articles from SNS

HyperVQ: Enabling Hyperprior Entropy Modeling for VQ-Based Generative Image Compression

Abstract: Vector Quantization (VQ) based generative image compression has achieved remarkable perceptual quality. However, existing VQ codecs suffer from two fundamental limitations. First, they lack efficient content-adaptive entropy modeling and rely on static frequencies, leading to low coding efficiency.

arXiv CS 8d ago

VQ-Atom: Semantic Discretization of Local Atomic Environments for Molecular Representation Learning

Abstract: Large language models succeed by combining large-scale pretraining with meaningful discrete tokens. In molecular machine learning, SMILES is widely used as a token representation, but it is primarily a linearization format for molecular graphs rather than a semantic decomposition of chemistry. We propose VQ-Atom, a semantic tokenization framework that assigns discrete atom-level tokens based on local chemical environments via vector quantization.

arXiv CS 1d ago

MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging

arXiv:2605.30904v1 Abstract: Most visual tokenizers for image generation are bifurcated into two families with complementary limitations: continuous VAEs offer high-fidelity reconstruction but suffer from dense, entangled latents that are poorly suited for semantic control, whereas discrete VQ-based models enable autoregressive generation yet struggle with gradient sparsity, unstable training, and codebook collapse. In this work, we introduce MergeTok, a unified tokenizer...

arXiv CS 9d ago

Graph is a Natural Regularization: Revisiting Vector Quantization for Graph Representation Learning

arXiv:2508.06588v3 Abstract: Vector Quantization (VQ) has recently emerged as a promising approach for learning compressed and discrete representations for graph-structured data. However, a fundamental challenge, i.e., codebook collapse, remains underexplored in the graph domain, significantly limiting the expressiveness and generalization of graph tokens. In this paper, we present an empirical study and observe that codebook collapse consistently occurs when training...

arXiv CS 8d ago

Can Language Models Learn to Listen?

arXiv:2308.10897v2 Abstract: We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words. Given an input transcription of the speaker's words with their timestamps, our approach autoregressively predicts a response of a listener: a sequence of listener facial gestures, quantized using a VQ-VAE. Since gesture is a language component, we propose treating the quantized atomic motion elements...

arXiv CS 5d ago

HiTokSR: A Coarse-to-Fine Tokenizer with Hierarchical Codebooks for High-Fidelity Real-World Image Super-Resolution

arXiv:2606.01157v1 Abstract: Vector-quantized (VQ) generative models have shown promising results in real-world image super-resolution (Real-ISR). However, existing methods typically rely on a monolithic latent space that entangles low-frequency structures with high-frequency textures. This entanglement forces a single codebook to capture a combinatorially complex set of structure-texture pairings, which constrains representational capacity and limits codebook utilization.

arXiv CS 8d ago

LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection

arXiv:2606.04050v1 Abstract: Existing quantization methods are fundamentally limited by rigid, integer-based bit-widths (e.g., 2, 3-bit), resulting in a "deployment gap" where Large Language Models cannot be optimally fitted to specific memory budgets. To bridge this gap, we introduce LiftQuant, a novel framework that enables continuous bit-width control for true Pareto-optimal deployment. The core innovation is a "lift-then-project" mechanism which approximates...

arXiv CS 6d ago

C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification

Abstract: Background: Respiratory sound classification plays a critical role in the clinical identification of pulmonary pathologies. However, its performance is often hindered by the limited size, severe noise, and class imbalance of real-world auscultation datasets. Although conventional audio augmentation techniques are easy to implement, they may inadvertently distort subtle pathological characteristics.

arXiv CS 8d ago

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

Abstract: Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audio tokenizers rely on quantization, clustering, or codec reconstruction, assigning tokens locally, so sequence consistency, compactness, length control, termination, and edit similarity are rarely optimized directly.

arXiv CS 1d ago

Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space

Abstract: We present Echo, a proof-of-concept audio system built around a single 25 M-parameter ViT encoder. The encoder is pretrained with a JEPA objective and then specialised by stages to carry speaker identity, phonetic content, and dynamic source routing in the same 512-dimensional latent space, with no per-task fine-tuning at deployment. Light heads handle diarization (ArcFace + VBx) and dynamic source separation (null-target K-set prediction).

arXiv CS 8d ago