Optical Character Recognition

Related Articles from SNS

Rethinking Genomic Modeling Through Optical Character Recognition

arXiv:2602.02014v2 Announce Type: replace Abstract: Recent genomic foundation models largely adopt large language model architectures that treat DNA as a one-dimensional token sequence. However, exhaustive sequential reading is structurally misaligned with sparse and discontinuous genomic semantics, leading to wasted computation on low-information background and preventing understanding-driven compression for long contexts. Here, we present OpticalDNA, a vision-based framework that reframes...

arXiv CS 2d ago

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

arXiv:2606.01393v1 Announce Type: new Abstract: Document parsing and recognition are fundamental capabilities for vision-language models (VLMs) and document processing systems. However, existing Optical Character Recognition (OCR) and document parsing benchmarks are increasingly limited in coverage and difficulty: many focus on common document genres or uniformly sampled pages where modern parsers already perform strongly, while offering limited annotation for expert-domain structures such...

arXiv CS 8d ago

Interfaze: The Future of AI is built on Task-Specific Small Models

arXiv:2602.04101v2 Announce Type: replace Abstract: We present Interfaze, a native hybrid model that fuses task-specific deep neural networks (CNNs and DNNs) directly into a transformer decoder through a shared embedding space. Specialized perceptual encoders handle optical character recognition (OCR) over complex multilingual PDFs, open-vocabulary object and graphical user interface (GUI) detection, and multilingual speech recognition with diarization. Each is exposed through a...

arXiv CS 6d ago

Real-Time Automatic License Plate Recognition Using YOLOv8, SORT Tracking, and Temporal Data Interpolation

Announce Type: new Abstract: The real-time demands of video processing seriously limit the use of Automatic License Plate Recognition (ALPR) in dynamic traffic monitoring settings. High-fidelity recognition under unconstrained conditions, e.g. drastic variations in illumination, acute camera scans, high vehicle speeds, and harsh physical concealment, is a problem that often leads to disjointed tracking paths and poor Optical Character Recognition (OCR) rates. In order to...

arXiv CS 6d ago

ReforMe: Re-Shaping Documents with Contextual Prompting and Layout-Aware Propagation

arXiv:2606.03266v1 Announce Type: new Abstract: Digitizing complex documents with handwritten content, irregular tables, and heterogeneous layouts remains challenging, as traditional Optical Character Recognition (OCR) systems fail to capture writing nuances, author-specific conventions, and document structure, and recent LLM-based approaches lack mechanisms for precise, scalable correction. We present an interactive document digitization system that integrates layout-aware parsing, OCR, and...

arXiv CS 7d ago

Vision Language Model Helps Private Information De-Identification in Vision Data

Announce Type: new Abstract: Visual Language Models (VLMs) have gained significant popularity due to their remarkable capabilities. While various methods exist to enhance privacy in text-based applications, privacy risks associated with visual inputs, such as Protected Health Information (PHI) in medical images, remain largely overlooked. To tackle this problem, two key tasks should be performed: accurately localizing sensitive text and processing it to ensure privacy protection.

arXiv CS 1d ago

ChemQuests: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv papers

arXiv:2505.05232v3 Announce Type: replace Abstract: The rapid expansion of chemistry literature poses significant challenges for researchers seeking to efficiently access domain-specific knowledge. To support advancements in chemistry-focused natural language processing (NLP), we present ChemQuests, a curated dataset of 952 high-quality question-answer (QA) pairs derived from 155 ChemRxiv papers across 17 subfields of chemistry. Each QA pair is explicitly linked to its...

arXiv CS 2d ago

VTI-CoT: Visual-Textual Interleaved Chain of Thought for Video Reasoning

Announce Type: new Abstract: Video reasoning aims to understand complex temporal events and causal relationships within videos. Recently, Chain-of-Thought (CoT) has been introduced to this field to enhance reasoning accuracy. However, existing CoT-based video reasoning methods primarily rely on text-only information for logical deduction, overlooking critical visual information during the inference process.

arXiv CS 5d ago

SCOPE: Real-Time Natural Language Camera Agent at the Edge

Announce Type: new Abstract: Deploying language-driven agents in robotics requires evaluations that reflect real-world task demands: natural-language instructions with reproducible outcomes. Such agents must connect language models to callable perception and control tools, and be assessed using deployment-critical metrics including latency, accuracy, and error modes. SCOPE (Simulation and Camera Operations for Perception and Evaluation), a modular agent for natural-language, open-vocabulary...

arXiv CS 7d ago

AIDEN: Design and Pilot Study of an AI Assistant for the Visually Impaired

arXiv:2511.06080v4 Announce Type: replace Abstract: This paper presents AIDEN, an artificial intelligence-based assistant designed to enhance the autonomy and daily quality of life of visually impaired individuals, who often struggle with object identification, text reading, and navigation in unfamiliar environments. Existing solutions such as screen readers or audio-based assistants facilitate access to information but frequently lead to auditory overload and raise privacy concerns in open...

arXiv CS 2d ago