Home Knowledge Base PASCAL VOC

PASCAL VOC

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Lighting-Aware Representation Learning under Controllable Lighting Variation

arXiv:2606.06899v1 Announce Type: new Abstract: Variations in illumination remain a major challenge for visual representation learning, as they induce substantial appearance changes both across and within environments. While existing approaches typically address this issue through data augmentations that encourage models to become invariant to lighting changes, such strategies do not explicitly model lighting information during learning. Inspired by theories of human vision, we propose a...

arXiv CS 2d ago

CL-CLIP: CLIP-Based Continual Learning Framework with Cost-Volume Category Decoupling for Object Detection

arXiv:2606.06978v1 Announce Type: new Abstract: Continual Object Detection (COD) requires a detector to acquire new categories over time while preserving previously learned ones. This goal is closely related to open-vocabulary detection, since both settings require reasoning over categories that are not fully covered by the annotations available at the current training stage. Recent CLIP-based open-vocabulary detectors have shown strong zero-shot generalization, and frameworks such as F-ViT...

arXiv CS 2d ago

BareBones: Benchmarking Zero-Shot Geometric Comprehension in VLMs

arXiv:2604.10528v5 Announce Type: replace Abstract: While Vision-Language Models (VLMs) demonstrate remarkable zero-shot recognition capabilities across a diverse spectrum of multimodal tasks, it yet remains an open question whether these architectures genuinely comprehend geometric structure or merely exploit RGB textures and contextual priors as statistical shortcuts. Existing evaluations fail to isolate this mechanism, conflating semantic reasoning with texture mapping and relying on...

arXiv CS 5d ago

BareBones: Benchmarking Zero-Shot Geometric Comprehension in VLMs

Announce Type: replace Abstract: While Vision-Language Models (VLMs) demonstrate remarkable zero-shot recognition capabilities across a diverse spectrum of multimodal tasks, it yet remains an open question whether these architectures genuinely comprehend geometric structure or merely exploit RGB textures and contextual priors as statistical shortcuts. Existing evaluations fail to isolate this mechanism, conflating semantic reasoning with texture mapping and relying on imprecise annotations...

arXiv CS 8d ago