VL

Related Articles from SNS

PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

Abstract: We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies weak...

arXiv CS 7d ago

TimeOmni-VL: Unified Models for Time Series Understanding and Generation

arXiv:2602.17149v2 Abstract: Recent time series modeling faces a sharp divide between numerical generation and semantic understanding, with research showing that generation models often rely on superficial pattern matching, while understanding-oriented models struggle with high-fidelity numerical output. Although unified multimodal models (UMMs) have bridged this gap in vision, their potential for time series remains untapped.

arXiv CS 7d ago

497 - Nationwide VL Separate Account-G (0001313581) (Filer)

Filed: 2026-06-01 AccNo: 0001193125-26-249825 Size: 344 KB

SEC EDGAR Filings 8d ago

497 - Nationwide VL Separate Account-G (0001313581) (Filer)

Filed: 2026-06-01 AccNo: 0001193125-26-249836 Size: 225 KB

SEC EDGAR Filings 8d ago

497 - Nationwide VL Separate Account-G (0001313581) (Filer)

Filed: 2026-06-01 AccNo: 0001193125-26-249827 Size: 343 KB

SEC EDGAR Filings 8d ago

Domain Adaptation with a Single Vision-Language Embedding

Abstract: Domain adaptation has been extensively investigated in computer vision but still requires access to target data at the training time, which might be difficult to obtain in real-world autonomous driving scenarios, especially under rare or adverse conditions. In this paper, we present a new framework for domain adaptation relying on a single Vision-Language (VL) latent embedding instead of full target data. First, leveraging a contrastive language-image...

arXiv CS 8d ago

Claude Code-Driving Scenario Mining for the Argoverse 2 Challenge

Abstract: We present our submission to the CVPR 2026 Argoverse 2 Scenario Mining Challenge. Our system uses a four-stage pipeline: (1) autonomous code generation via a Claude Code agent powered by GLM 5.1, (2) iterative training set screening with Timestamp Balanced Accuracy threshold 0.8 to curate few-shot examples, (3) semantic code review by a separate Claude Code session, and (4) Qwen3-VL scene-level verification to filter false positives.

arXiv CS 1d ago

Video-Rate Streaming Stylization on a Vision-Aware MLLM-Conditioned Edit Diffusion: Asymmetric Batched Inference on a Distilled UNet + MLLM Text Encoder

Abstract: Aggressive distillation of the diffusion U-Net inverts the per-frame bottleneck of real-time text-to-image pipelines: once the denoiser is a 4-step or 1-step distilled student, the text encoder becomes the critical path. This inversion is most acute in vision-aware edit diffusion, where the encoder is a multimodal large language model (MLLM). We study the case of a 0.39B distilled edit U-Net paired with a 2.13B MLLM text encoder (Qwen3-VL) and present a streaming...

arXiv CS 5d ago

Failure-Aware Refinement of Vision-Language Model for Lithography Defect Detection

Abstract: Semiconductor lithography inspection requires reliable detection of small pattern defects such as bridge, burr, pinch, and contamination. In this study, we propose a two-stage vision-language framework that combines initial defect detection with prediction refinement. In the first stage, Qwen3-VL is fine-tuned with LoRA as a vision-language adapter to predict defect counts, defect categories, and normalized bounding boxes from lithography images.

arXiv CS 1d ago

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

Abstract: While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thought. They often struggle to infer unobserved layouts, maintain cross-view consistency, and reason from alternative viewpoints when only limited egocentric observations are available. In this work, we study this problem as thinking with imagination, where a VLM...

arXiv CS 5d ago