mel-spectrograms

Related Articles from SNS

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

arXiv:2606.03455v1 Announce Type: cross Abstract: Recently, diffusion models operating on VAE latents or mel-spectrograms have become the dominant paradigm for zero-shot TTS. Although these compressed representations improve generation efficiency, they inevitably suffer from information loss and non-end-to-end training. Theoretically, directly modeling raw waveforms circumvents these issues; however, this direction remains underexplored and is often deemed difficult due to the extremely long...

arXiv CS 7d ago

MeanVC 2: Robust Low-Latency Streaming Zero-Shot Voice Conversion

arXiv:2606.09050v1 Announce Type: cross Abstract: Streaming zero-shot voice conversion (VC) has become increasingly popular due to its potential for real-time applications. The recently proposed MeanVC achieves lightweight streaming zero-shot VC, but it has several limitations: its chunk-wise autoregressive denoising doubles the effective training sequence length, conversion quality degrades under small-chunk settings, and its timbre encoder directly relies on reference mel-spectrograms,...

arXiv CS 1d ago

C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification

Announce Type: new Abstract: Background: Respiratory sound classification plays a critical role in the clinical identification of pulmonary pathologies. However, its performance is often hindered by the limited size, severe noise, and class imbalance of real-world auscultation datasets. Although conventional audio augmentation techniques are easy to implement, they may inadvertently distort subtle pathological characteristics.

arXiv CS 8d ago

MyGardenBird: A Machine-Learning-Ready Bird Sound Dataset for Twelve Common Malaysian Birds

Announce Type: new Abstract: Bioacoustic datasets from tropical regions remain limited, in part due to the absence of reproducible workflows for aggregating recordings from public archives. We present MyGardenBird, a curated dataset of bird vocalisations representing twelve common species across Peninsular Malaysia and the Indo-Malayan region. Recordings were sourced from Xeno-canto and processed through species-level filtering, manual spectrogram segmentation, and quality control...

arXiv CS 2d ago

DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion

arXiv:2601.09239v5 Announce Type: replace Abstract: Speech tokenizers are a key building block of fully discrete Speech LLMs. Existing tokenizers either prioritize semantic encoding, fuse semantic content with acoustic style inseparably, or achieve incomplete semantic-acoustic disentanglement.

arXiv CS 8d ago