Perceptually Guided

Related Articles from SNS

CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding

Announce Type: new Abstract: Neural audio codecs are a key component of speech processing pipelines, compressing audio into discrete tokens for downstream modeling. However, existing codecs struggle to balance reconstruction quality with token efficiency, often encoding perceptually irrelevant information such as background noise and recording artifacts at the expense of linguistically and acoustically meaningful content. We reframe audio tokenization as a selective information bottleneck...

arXiv CS 6d ago

BareWave: Waveform-Native Flow-Matching Text-to-Speech

arXiv:2606.09048v1 Announce Type: cross Abstract: Removing intermediate representations and separately trained decoding stages has become an important direction in generative modeling. In text-to-speech, however, high-quality systems are still commonly built through an intermediate acoustic representation before waveform synthesis. In this work, we present BareWave, a fully waveform-native framework for direct text-to-wave generation in flow-matching TTS.

arXiv CS 1d ago

P$^2$-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization

arXiv:2606.03376v2 Announce Type: replace Abstract: Hallucination has recently garnered significant research attention in Large Vision-Language Models (LVLMs). Direct Preference Optimization (DPO) aims to learn directly from the corrected preferences provided by humans, thereby addressing the hallucination issue. Despite its success, this paradigm has yet to specifically target the perceptual bottleneck in attended regions or address insufficient Visual Robustness against image degradation.

arXiv CS 6d ago

P$^2$-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization

arXiv:2606.03376v1 Announce Type: new Abstract: Hallucination has recently garnered significant research attention in Large Vision-Language Models (LVLMs). Direct Preference Optimization (DPO) aims to learn directly from the corrected preferences provided by humans, thereby addressing the hallucination issue. Despite its success, this paradigm has yet to specifically target the perceptual bottleneck in attended regions or address insufficient Visual Robustness against image degradation.

arXiv CS 7d ago

HiRQA: Hierarchical Ranking and Quality Alignment for Opinion-Unaware Image Quality Assessment

arXiv:2508.15130v3 Announce Type: replace Abstract: Despite significant progress in no-reference image quality assessment (NR-IQA), dataset biases and reliance on subjective labels continue to hinder their generalization performance. We propose HiRQA (Hierarchical Ranking and Quality Alignment), a self-supervised, opinion-unaware framework that offers a hierarchical, quality-aware embedding through a combination of ranking and contrastive learning. Unlike prior approaches that depend on...

arXiv CS 7d ago

PRISM: Rethinking Atmospheric Scattering Reconstruction as a Unified Understanding and Restoration Model for Real-world Dehazing

arXiv:2604.07048v2 Announce Type: replace Abstract: Real-world image dehazing (RID) aims to remove haze-induced degradation from real scenes. This task remains challenging due to non-uniform haze distribution, spatially varying color shifts, and the scarcity of paired real hazy-clean data. In PRISM, we propose Proximal Scattering Atmosphere Reconstruction (PSAR), a physically structured framework that jointly reconstructs the clear scene and scattering variables under the atmospheric...

arXiv CS 7d ago

INTACT: Ego-Guided Typed Sparse Evidence Retrieval for Heterogeneous Collaborative Perception

arXiv:2606.04437v1 Announce Type: new Abstract: Collaborative perception extends the perceptual range of autonomous vehicles by sharing information across agents, but heterogeneous sensors and perception models make intermediate feature fusion difficult to deploy at scale. Existing heterogeneous collaboration methods typically follow a translation-first paradigm: collaborator features must be aligned, adapted, or projected into an ego-compatible space before fusion. Such...

arXiv CS 6d ago

TetraFuse: A Synergistic Four-Dimensional Dynamic Fusion Framework for Efficient and Robust Medical Image Classification

Accurate and robust classification of medical pathology images is pivotal for computer-aided diagnosis. However, the deployment of deep learning models in high-throughput clinical screening faces a fundamental challenge: the trade-off between diagnostic accuracy and computational efficiency. Current lightweight architectures, while reducing parameter complexity through grouped convolutions, often lead to cross-channel information isolation and diminished representational capacity.

bioRxiv 4d ago

IEA: Amateur-Friendly Conversational Image Editing Agent via Three Stages of Multitask Alignment

Announce Type: new Abstract: Current image editing software often hinges on fixed filters or expert tuning, leaving a gap between amateur users' intent and outcomes. Creations by generative models may contain artifacts, implausible details, or stylistic drift away from photorealism and offer little insight into why an edit was made. We propose IEA, a conversational Image Editing Agent that learns to operate parameterized tools in an explicit, interpretable action space.

arXiv CS 1d ago

Relational Aesthesis in Permacomputing Practice: Building a Solar Powered Website from Reclaimed Materials

Announce Type: new Abstract: Permacomputing is a nascent concept and community of practice concerned with developing alternative computing systems grounded in principles of resilience, reuse, sufficiency, and ecological limits. However, research engaging with permacomputing remains in an early stage of development, raising concerns about whether permacomputing can move beyond reflective critique to become a meaningful alternative practice.

arXiv CS 9d ago