\textit{static

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Internalizing Temporal Consistency in Video Object-Centric Learning without Explicit Regularization

arXiv:2605.31508v1 Announce Type: new Abstract: Video Object-Centric Learning (OCL) aims to represent objects as slot vectors and maintain their consistency across frames. Slot-Slot Contrastive (SSC) loss has become the cornerstone for state-of-the-art (SOTA) video OCL methods. While highly effective, SSC relies on one-to-one object correspondence across frames and introduces an extra loss.

arXiv CS 9d ago

Haptic Sorter: A Unified Planning Framework for Online Shape Estimation and Real-Time Pose Inference

arXiv:2605.31352v1 Announce Type: new Abstract: Robotic manipulation usually assumes that the shape and pose of the object are known to the robot prior to motion planning. However, precise geometric information is not always available in practice, and pose inference suffers from sensor uncertainties and view occlusion. In this work, we propose a unified model-based geometric framework integrating robotic haptic perception, modeling, and manipulation planning.

arXiv CS 9d ago

Enhancing the Socioeconomic Understanding of Foundation Models with Urban Mobility

Announce Type: new Abstract: Foundation models have recently been applied to urban socioeconomic prediction using POI text, satellite imagery, and geospatial descriptions. However, these models mostly rely on static attributes of individual places, while ignoring the mobility patterns that reveal how places are functionally connected. To address this gap, we explore whether mobility networks can elicit the geospatial capabilities of foundation models by explicitly encoding connectivity among...

arXiv CS 8d ago

Topology-Aware Layer Pruning for Large Vision-Language Models

arXiv:2604.16502v2 Announce Type: replace Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in natural language understanding and reasoning, while recent extensions that incorporate visual inputs enable them to process multimodal information. Despite these advances, Large Vision-Language Models (LVLMs) incur substantial computational and memory costs, hindering deployment in resource-constrained scenarios. Existing layer pruning methods typically rely on local...

arXiv CS 5d ago

CP4D: Compositional Physics-aware 4D Scene Generation

arXiv:2606.09187v1 Announce Type: new Abstract: 4D generation (i.e., dynamic 3D generation) has recently emerged as a rapidly growing research frontier due to its powerful spatiotemporal modeling capabilities. However, despite notable advances, existing approaches typically fail to capture the underlying physical principles, producing results that are both physically inconsistent and visually implausible. To overcome this limitation, we present CP4D, a novel paradigm for...

arXiv CS 1d ago