Home › Knowledge Base › Metric Accuracy

Metric Accuracy

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Reasoning Structure of Large Language Models

arXiv:2606.03883v1 Announce Type: new Abstract: Large reasoning models (LRMs) are often evaluated using metrics such as final-answer accuracy or token count. However, identical scores on these metrics can hide fundamentally different reasoning structures.

arXiv CS 7d ago

Simulation of collision avoidance behavior in crowd movement by data-driven approach

Announce Type: new Abstract: Crowd movement simulation is essential for pedestrian safety management and facility layout optimization. Data-driven models enhance trajectory prediction accuracy under Euclidean metrics, yet they suffer from excessively high collision rates, especially in bidirectional and multidirectional flows. In this paper, we establish a novel data-driven crowd simulation model that incorporates the pedestrian collision mechanism into the loss function to reduce collisions.

arXiv CS 9d ago

Forgetting Has Neighbors: Localized Collateral Forgetting in Machine Unlearning

arXiv:2605.31317v1 Announce Type: new Abstract: Machine unlearning aims to remove the influence of selected training examples without full retraining. Standard evaluations often summarize unlearning quality with aggregate metrics, such as accuracy- and forgetting-based scores, which can hide localized failures. We study this failure mode at the example level by comparing the predictions of an unlearned model to those of the model retrained after deletion.

arXiv CS 9d ago

SCOPE: Real-Time Natural Language Camera Agent at the Edge

Announce Type: new Abstract: Deploying language-driven agents in robotics requires evaluations that reflect real-world task demands: natural-language instructions with reproducible outcomes. Such agents must connect language models to callable perception and control tools, and be assessed using deployment-critical metrics including latency, accuracy, and error modes. (Simulation and Camera Operations for Perception and Evaluation), a modular agent for natural-language, open-vocabulary...

arXiv CS 7d ago

CRANE: Knowledge Editing for Reasoning MLLMs

arXiv:2606.09033v1 Announce Type: new Abstract: The emergence of reasoning multimodal large language models (MLLMs), which generate explicit chain-of-thought (CoT) reasoning before producing answers, has introduced a new challenge for knowledge editing: methods that appear successful under traditional metrics (teacher-forcing accuracy up to 100%) can fail severely when the model's reasoning process is examined (Grounded Success as low as 0%). We identify three failure modes: (1) Structural...

arXiv CS 1d ago

HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions

arXiv:2503.14229v4 Announce Type: replace Abstract: Vision-and-Language Navigation (VLN) has been studied mainly in either discrete or continuous spaces, with little attention to dynamic, crowded environments. We present HA-VLN 2.0, a unified benchmark introducing explicit social-awareness constraints. Our contributions are: (i) a standardized task and metrics capturing both goal accuracy and personal-space adherence; (ii) HAPS 2.0 dataset and simulators modeling multi-human interactions,...

arXiv CS 1d ago

The Lipreading Gap: Do VSR Models Perceive Visual Speech Like Human Lipreaders?

arXiv:2606.07435v1 Announce Type: new Abstract: Visual speech recognition (VSR) models now surpass human lipreaders on benchmarks, but do such gains establish human-like visual speech perception? To explore this, we compare three VSR systems with human baselines on the MaFI word-level lipreading dataset using word, character, phoneme, and viseme-level metrics. Although models achieve higher overall accuracy, they succeed and fail on different words than humans.

arXiv CS 2d ago

The Lipreading Gap: Do VSR Models Perceive Visual Speech Like Human Lipreaders?

Announce Type: replace Abstract: Visual speech recognition (VSR) models now surpass human lipreaders on benchmarks, but do such gains establish human-like visual speech perception? To explore this, we compare three VSR systems with human baselines on the MaFI word-level lipreading dataset using word, character, phoneme, and viseme-level metrics. Although models achieve higher overall accuracy, they succeed and fail on different words than humans.

arXiv CS 1d ago

Surrogate Neural Architecture Codesign Package (SNAC-Pack)

arXiv:2605.16138v2 Announce Type: replace Abstract: Neural architecture search (NAS) is a powerful approach for automating model design, but existing methods often optimize for accuracy alone or rely on proxy metrics such as bit operations (BOPs) that correlate poorly with hardware cost. This gap is particularly large for FPGA deployment, where cost is dominated by a multi-dimensional budget of lookup tables, DSPs, flip-flops, BRAM, and latency. We present the Surrogate Neural Architecture...

arXiv CS 5d ago

What Your Posts Reveal: A Benchmark and Agentic Framework for User-Level Privacy Leakage on Social Media

arXiv:2606.06784v1 Announce Type: new Abstract: Public social media posts can reveal private information through weak cues scattered across text, images, or metadata. Such leakage is often cumulative and cross-post: cues that appear harmless in isolation may jointly expose a user's home, workplace, or routine. However, current research lacks a unified benchmark for user-level multimodal privacy leakage and an evaluation metric that captures exposure severity beyond binary accuracy.

arXiv CS 2d ago