Home › Knowledge Base › Textual Assessment and Visual Assessment

Textual Assessment and Visual Assessment

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Community-Aware Assessment of Social Textual Engagement and Resonance: A Human-Centric Perspective on User-Generated Content Evaluation

arXiv:2606.01897v2 Announce Type: replace Abstract: Traditional Video Quality Assessment (VQA) focuses narrowly on aesthetic fidelity, overlooking the complex social dynamics that define quality in User-Generated Content (UGC). In this work, we propose a paradigm shift from signal-centric metrics to human-centric resonance assessment. We introduce CASTER (Community-Aware Assessment of Social Textual Engagement and Resonance), a new task that evaluates whether a UGC item achieves positive...

arXiv CS 7d ago

Community-Aware Assessment of Social Textual Engagement and Resonance: A Human-Centric Perspective on User-Generated Content Evaluation

arXiv:2606.01897v1 Announce Type: new Abstract: Traditional Video Quality Assessment (VQA) focuses narrowly on aesthetic fidelity, overlooking the complex social dynamics that define quality in User-Generated Content (UGC). In this work, we propose a paradigm shift from signal-centric metrics to human-centric resonance assessment. We introduce CASTER (Community-Aware Assessment of Social Textual Engagement and Resonance), a new task that evaluates whether a UGC item achieves positive...

arXiv CS 8d ago

Community-Aware Assessment of Social Textual Engagement and Resonance: A Human-Centric Perspective on User-Generated Content Evaluation

arXiv:2606.01897v3 Announce Type: replace Abstract: Traditional Video Quality Assessment (VQA) focuses narrowly on aesthetic fidelity, overlooking the complex social dynamics that define quality in User-Generated Content (UGC). In this work, we propose a paradigm shift from signal-centric metrics to human-centric resonance assessment. We introduce CASTER (Community-Aware Assessment of Social Textual Engagement and Resonance), a new task that evaluates whether a UGC item achieves positive...

arXiv CS 5d ago

Multimodal Approaches for Visually-Rich Document Type Classification: A Comparative Analysis

arXiv:2606.02162v1 Announce Type: new Abstract: Document type classification in visually rich documents remains challenging, as relevant information is distributed across textual, visual, and layout modalities. To capture this complexity, current approaches rely on diverse multimodal modeling strategies, resulting in heterogeneous architectures that complicate systematic comparison. This variability is also reflected in existing comparative studies, which often rely on heterogeneous...

arXiv CS 8d ago

Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation

arXiv:2605.29861v2 Announce Type: replace Abstract: Large Language Models (LLMs) have advanced autonomous agents from deep search, which retrieves concise factual answers, to deep research, which synthesizes scattered evidence into long-form reports. However, verifiable multimodal deep research remains challenging due to open-ended synthesis without deterministic ground truth and the need to interleave textual arguments with visual evidence. We propose Ptah, a multi-agent harness for...

arXiv CS 6d ago

FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

arXiv:2604.03893v2 Announce Type: replace Abstract: Current multimodal benchmarks for scientific reasoning primarily evaluate local information extraction -- models recognize symbols and values and then perform textual inference. They do not assess whether models can reason over the global structural properties of formal diagrams, such as topology, conservation constraints, and the consistent mapping between visual patterns and algebraic expressions. We introduce FeynmanBench, a benchmark of...

arXiv CS 8d ago

TVIR: Building Deep Research Agents Towards Text--Visual Interleaved Report Generation

arXiv:2606.02320v1 Announce Type: new Abstract: Deep Research Agents have shown strong capability in multi-step information retrieval, reasoning, and long-form report generation, but existing benchmarks and systems remain predominantly text-centric, with limited evaluation of whether visual elements are factually reliable and well aligned with the surrounding analysis. To address this gap, we introduce TVIR (Text--Visual Interleaved Report Generation), which includes TVIR-Bench, a benchmark...

arXiv CS 8d ago

Faithful, Enriched, and Precise: Benchmarking Natural-Science Illustration Generation by T2I models

Announce Type: replace Abstract: Scientific illustrations are essential tools for communicating research findings, especially in natural science, where they visualize complex concepts and processes. As Text-to-Image (T2I) models become increasingly capable, researchers have started to use them for scientific illustration generation. However, existing benchmarks often assess outputs at a holistic level, overlooking fine-grained elements, while scientific reasoning ability and output...

arXiv CS 2d ago

Faithful, Enriched, and Precise: Benchmarking Natural-Science Illustration Generation by T2I models

Announce Type: new Abstract: Scientific illustrations are essential tools for communicating research findings, especially in natural science, where they visualize complex concepts and processes. As Text-to-Image (T2I) models become increasingly capable, researchers have started to use them for scientific illustration generation. However, existing benchmarks often assess outputs at a holistic level, overlooking fine-grained elements, while scientific reasoning ability and output conciseness...

arXiv CS 5d ago

TGV-KV: Text-Grounded KV Eviction for Vision-Language Models

Announce Type: new Abstract: Vision-Language Models (VLMs) inherit the auto-regressive generation paradigm and cache the keys and values (KV) of all previous tokens to accelerate inference, resulting in memory consumption that scales linearly with context length. This issue is particularly pronounced in VLMs due to substantial redundancy in the visual modality. Although KV cache eviction approaches can effectively reduce inference memory, they often incur significant performance degradation...

arXiv CS 7d ago