
Adaptive Semantic Disentanglement


Related Articles from SNS

Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

arXiv:2605.31266v1 Announce Type: new Abstract: The layout-to-image (L2I) task enables fine-grained control over image generation via object categories and spatial layouts. However, existing L2I methods yield fragmented and distorted generations under few-shot atypical settings. We term this failure representation fragmentation, arising from a granularity mismatch that entangles semantic identity with visual details.

arXiv CS 9d ago

COMBINER: Composed Image Retrieval Guided by Attribute-based Neighbor Relations

Announce Type: new Abstract: Composed Image Retrieval (CIR) represents a challenging retrieval task that targets locating specific images through multimodal inputs. Despite recent progress in CIR techniques, prior approaches often overlook cases where images appear visually alike yet differ in attributes, potentially undermining both multimodal feature fusion and similarity modeling. To mitigate this limitation, we design a unified representation of cross-modal features based on attribute...

arXiv CS 6d ago

GaMi: Geometry-Agnostic Material Identification via Cross-Modal Subtractive Disentanglement

arXiv:2605.30818v1 Announce Type: new Abstract: Non-contact material identification enables adaptive interaction for embodied intelligence yet faces challenges from geometry-induced variations (e.g., orientation, shape, distance) and single-modality ambiguities. In this paper, we present GaMi, a multimodal material identification system integrating mmWave and acoustic sensing to robustly operate under unconstrained geometric conditions. By leveraging the insight of shared geometric...

arXiv CS 9d ago

Disentangled Fine-Grained Prototype Learning for Incomplete Image-Tabular Classification

Announce Type: new Abstract: The missing-modality problem poses a significant challenge in image-tabular multimodal learning across a wide range of multimedia applications, including product understanding, recommendation systems, and medical diagnosis. This challenge is particularly pronounced when the two modalities are highly heterogeneous, as images and tabular attributes differ substantially in their semantic granularity and data distributions. Existing methods learn modality-invariant...

arXiv CS 5d ago

PHASE: Physiology-Aware Hyperspectral Reconstruction via Object-to-Human Domain Adaptation

arXiv:2511.13020v2 Announce Type: replace Abstract: Although hyperspectral imaging offers unparalleled non-invasive physiological insight, its bulky hardware, slow acquisition, and regulatory burden severely limit its clinical availability. A natural workaround is to reconstruct hyperspectral information from ubiquitous RGB or CASSI measurements. However, existing paradigms, developed for object-centric scenes, rely on reflectance-based feature alignment, assuming that spectral similarity...

arXiv CS 7d ago

HiTokSR: A Coarse-to-Fine Tokenizer with Hierarchical Codebooks for High-Fidelity Real-World Image Super-Resolution

arXiv:2606.01157v1 Announce Type: new Abstract: Vector-quantized (VQ) generative models have shown promising results in real-world image super-resolution (Real-ISR). However, existing methods typically rely on a monolithic latent space that entangles low-frequency structures with high-frequency textures. This entanglement forces a single codebook to capture a combinatorially complex set of structure-texture pairings, which constrains representational capacity and limits codebook utilization.

arXiv CS 8d ago

Adaptive Causal Alignment for High-Confidence Adversarial Training

Announce Type: new Abstract: Inverse adversarial training leverages high-confidence predictions to stabilize robust learning, yet we uncover a critical paradox: high confidence often stems from overfitting to non-causal background correlations rather than intrinsic object semantics. Our investigation reveals that visual context functions as a dual-natured signal, serving as either a necessary supportive prior or a spurious confounder. This insight renders existing blind suppression strategies flawed, as...

arXiv CS 7d ago

Internalizing Temporal Consistency in Video Object-Centric Learning without Explicit Regularization

arXiv:2605.31508v1 Announce Type: new Abstract: Video Object-Centric Learning (OCL) aims to represent objects as slot vectors and maintain their consistency across frames. Slot-Slot Contrastive (SSC) loss has become the cornerstone for state-of-the-art (SOTA) video OCL methods. While highly effective, SSC relies on one-to-one object correspondence across frames and introduces an extra loss.

arXiv CS 9d ago