Fine-Grained Visual Classification

Related Articles from SNS

ToolFG: Towards Well-Grounded Fine-Grained Image Classification

arXiv:2606.02518v1 Announce Type: new Abstract: Fine-grained image classification (FGIC) has broad applications and has attracted significant research attention. In this paper, we explore a novel paradigm for solving FGIC by proposing ToolFG, the first tool-integrated MLLM-based framework tailored to FGIC. ToolFG enables MLLMs to autonomously and flexibly use external tools during the reasoning process, actively interact with images, and collect verifiable visual cues for...

arXiv CS 8d ago

PAND: Prompt-Aware Neighborhood Distillation for Lightweight Fine-Grained Visual Classification

arXiv:2602.07768v3 Announce Type: replace Abstract: Distilling knowledge from large Vision-Language Models (VLMs) into lightweight networks is crucial yet challenging in Fine-Grained Visual Classification (FGVC), due to the reliance on fixed prompts and global alignment. To address this, we propose PAND (Prompt-Aware Neighborhood Distillation), a two-stage framework that decouples semantic calibration from structural transfer.

arXiv CS 7d ago

Hierarchical Mask-Enhanced Dual Reconstruction Network for Few-Shot Fine-Grained Image Classification

arXiv:2506.20263v2 Announce Type: replace Abstract: Few-shot fine-grained image classification (FS-FGIC) is challenging as it requires distinguishing visually similar subclasses with extremely limited labeled examples. Existing methods suffer from critical limitations: metric-based methods lose spatial information and misalign local features, while reconstruction-based methods underuse hierarchical feature information and lack selective focus on discriminative key regions. We propose the...

arXiv CS 5d ago

Dual Feature Decoupling for Fine-Grained OOD Detection

arXiv:2606.05536v1 Announce Type: new Abstract: Out-of-distribution (OOD) detection is an indispensable technique when applying machine learning models to real-world scenarios. Most existing OOD detection methods have been developed under the idealized assumption of large inter-class distributional differences, while largely overlooking fine-grained tasks characterized by subtle variations, such as medical image classification and vehicle recognition. The high visual similarity among...

arXiv CS 5d ago

MMBU: A Massive Multi-modal Biomedical Understanding Benchmark to Probe the Perception Capabilities of Vision-Language Models

arXiv:2606.06696v1 Announce Type: new Abstract: Vision and language models (VLMs) hold immense promise to transform biomedical imaging workflows, from detecting lesions in chest X-rays to profiling cellular features in microscopy. Realizing this potential, however, requires robust and fine-grained visual perception. Models need to correctly interpret subtle features in images, and they must do so across diverse biomedical modalities, scales, and contexts.

arXiv CS 2d ago

QPredSGG: Hybrid Quantum Predicate Learning for Long-Tailed Scene Graph Generation

arXiv:2606.04689v1 Announce Type: cross Abstract: Scene Graph Generation (SGG) requires relational reasoning over objects and their interactions, but performance is often limited by severe long-tail predicate imbalance. Classical SGG models frequently rely on dataset statistics, leading to biased predictions toward frequent relations rather than fine-grained semantic predicates. Although existing debiasing strategies improve mean recall, predicate classification in current frameworks still...

arXiv CS 6d ago

SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models

arXiv:2605.31597v2 Announce Type: replace Abstract: Measuring structured object understanding in vision foundation models remains challenging due to inconsistent evaluation protocols and limited part-level supervision. Semantic correspondence (SC) evaluates this capability by testing whether object parts can be matched across instances and categories under large variations in appearance, viewpoint, and geometry. To enable a systematic SC evaluation, we introduce SOCO, a new benchmark for...

arXiv CS 8d ago

SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models

arXiv:2605.31597v1 Announce Type: new Abstract: Measuring structured object understanding in vision foundation models remains challenging due to inconsistent evaluation protocols and limited part-level supervision. Semantic correspondence (SC) evaluates this capability by testing whether object parts can be matched across instances and categories under large variations in appearance, viewpoint, and geometry. To enable a systematic SC evaluation, we introduce SOCO, a new benchmark for...

arXiv CS 9d ago

3D Segment Anything Model with Visual Mamba for Diagnosing Placenta Accreta Spectrum

Announce Type: replace Abstract: Placenta Accreta Spectrum (PAS) is a rare but highly dangerous obstetric disease. Early and accurate PAS diagnosis is critical for maternal health. Traditional PAS diagnosis relies on experienced doctors by analyzing the cesarean history and Magnetic Resonance Imaging (MRI) data.

arXiv CS 7d ago

Low-Frequency Shortcuts in Texture-Driven Visual Learning

arXiv:2606.03493v1 Announce Type: new Abstract: Neural networks suffer from shortcut learning, where learned features generalize well to the training set but not to in-distribution (ID) or out-of-distribution (OOD) test sets. Existing studies are all based on a few standard benchmarks, which are shape-driven. Numerous application domains, however, are texture-driven.

arXiv CS 7d ago