DINOv3

Related Articles from SNS

SemDINO: A DINOv3-Driven Network for Cross-Temporal Semantic Alignment in Change Detection

arXiv:2606.09772v1 Announce Type: new Abstract: Semantic change detection (SCD) aims to simultaneously locate land-cover changes and identify semantic categories before and after transition. However, existing methods suffer from insufficient cross-temporal alignment, weak multi-scale representation, and poor robustness to pseudo-changes caused by illumination, season, and registration noise. To address these issues, we propose a novel end-to-end semantic change detection network named...

arXiv CS 1d ago

DaX: Learning General Pathology Representations Across Scales

arXiv:2606.06983v1 Announce Type: cross Abstract: Computational pathology requires visual representations that transfer across diverse clinical endpoints and remain robust to variation in magnification, staining, scanner type, slide preparation, and input resolution. We present DaX, a pathology vision foundation model that adapts DINOv3-style self-supervised learning to whole-slide histopathology. DaX is initialized from natural-image DINOv3 weights and incorporates continuous magnification...

arXiv CS 2d ago

FAF-CD: Frequency-Aware Fusion for Change Detection under Imperfect Multimodal Remote Sensing

arXiv:2606.03114v1 Announce Type: new Abstract: Remote sensing change detection for real-world monitoring often relies on imperfect heterogeneous observations, where pre- and post-event images may be asynchronous, cross-sensor, or affected by illumination, seasonal, and modality shifts. This setting is especially challenging for EO-SAR disaster mapping, where nuisance variation can resemble structural damage. We propose FAF-CD, a frequency-aware hybrid framework with a DINOv3-pretrained...

arXiv CS 7d ago

ORACLE-CT: Anatomy-Aware Support Pooling for CT Classification

Announce Type: new Abstract: Abdominal CT disease classification is challenging because each scan is a large 3D volume with many possible findings, while diagnostic evidence is often confined to specific organs or anatomical compartments. Most study-level classifiers aggregate encoder features using anatomy-agnostic pooling or attention, creating a mismatch between localized disease evidence and global evidence aggregation. We propose ORACLE-CT, an encoder-agnostic anatomy-aware aggregation...

arXiv CS 5d ago

Detecting Temporally Localized Manipulations in Authentic Video Streams

arXiv:2606.07090v1 Announce Type: new Abstract: The rapid advancement of video editing and generative artificial intelligence technologies has made realistic video manipulation increasingly accessible. Although existing datasets have significantly advanced research in deepfake detection, object removal, and video inpainting, they do not adequately model scenarios in which a short manipulated segment is inserted into an otherwise authentic video and the original video continues afterward. In...

arXiv CS 2d ago

Spatially Grounded Concept Bottleneck Models via Part-Factorized Attention

arXiv:2606.04364v1 Announce Type: new Abstract: Concept bottleneck models (CBMs) predict a layer of human-named attributes before predicting a class, which makes their decisions auditable. On fine-grained recognition tasks the concept heads are usually free to attend anywhere in the image, so a head named for one body region can be satisfied by evidence on another.

arXiv CS 6d ago

Training-Free Composed Video Retrieval via Visual Representation-Guided Video-LLM Reasoning

arXiv:2606.02321v1 Announce Type: new Abstract: Recent advances in large vision-language models have expanded video retrieval from simple text-based search to more flexible scenarios, where users may specify the desired result through both visual examples and textual instructions. In the CVPR 2026 Reason-Aware Composed Video Retrieval Challenge, the system is required to retrieve a target video according to a reference video and a modification instruction.

arXiv CS 8d ago

Genie 4D: Semantic-Prior-Guided 4D Dynamic Scene Reconstruction

arXiv:2604.09877v2 Announce Type: replace Abstract: At the intersection of computer vision and robotic perception, 4D reconstruction of dynamic scenes connects low-level geometric sensing with high-level semantic understanding. We present Genie 4D, a framework that turns hand-held phone capture into a semantically grounded, action-controllable 4D world model. Genie 4D couples a real-time visual-inertial Gaussian splatting front-end for metric geometry with a feed-forward 4D backbone...

arXiv CS 8d ago

GLINT: Sparsely Gated Vision-Language Alignment for Fine-Grained Radiology Representations

arXiv:2606.03180v1 Announce Type: new Abstract: Vision-language models (VLMs) for radiology have emerged as a scalable paradigm by leveraging image-report pairs naturally produced in clinical workflows. However, this pairing reveals a mismatch in scale: each finding occupies only a small region of the image, yet supervision is provided only at the global image-report level. This poses a central challenge: prior approaches spread weight densely across all patches rather than concentrating on...

arXiv CS 7d ago

HD-DinoMoE: A Class-Aware Hierarchical Dual Mixture-of-Experts Network for Scleral Anomaly Segmentation in Complex Acquisition Scenarios

Announce Type: new Abstract: Traditional Chinese Medicine (TCM) ocular inspection provides empirical cues for assessing scleral surface anomalies, but its clinical use remains subjective and difficult to quantify. To support intelligent and quantifiable ocular inspection, this study presents the TCM-inspired Artificial Intelligence Ocular Auxiliary Diagnosis System (TAO) and focuses on pixel-level scleral surface anomaly segmentation. For clinical and user-acquired images affected by...

arXiv CS 6d ago