Explainability Evaluation

Related Articles from SNS

Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

arXiv:2605.31563v1 Announce Type: new Abstract: Human disagreement is ubiquitous and well-known in labeling. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how to best evaluate human labels and rationales -- or even how to best aggregate rationales beyond majority vote -- in light of this variation.

arXiv CS 9d ago

Beyond Accuracy: Evaluating Efficiency, Robustness and Explainability in Deep Learning for Malaria Diagnosis

arXiv:2605.30734v1 Announce Type: new Abstract: Malaria remains a leading cause of mortality in sub-Saharan Africa, where scarce diagnostic infrastructure makes timely, accurate diagnosis particularly challenging. While deep learning offers a compelling path toward automated malaria screening, clinical adoption is hindered by computational cost and opacity in decision-making. This work benchmarks four deep learning models spanning a wide range of design architectures and model...

arXiv CS 9d ago

Explain Like I'm 5 or Whatever I Choose: Evaluating the Interactive Potential of Language Model Responses

arXiv:2606.06788v1 Announce Type: new Abstract: Evaluations of large language models (LLMs) in scientific information seeking tasks have become increasingly use-centric, such as conducting live or multi-turn evaluations with real users. These evaluations still assume a single, static chat interface, but as models are integrated into new interfaces, evaluations must shift to incorporate interface-specific criteria. We propose a new evaluation framework based on a formative study with $16$...

arXiv CS 2d ago

KODA: Contrastive Representation Comparison and Alignment for Vision-Language Foundation Models

Announce Type: new Abstract: Vision-language foundation models such as CLIP and SigLIP provide widely used representations for multimodal learning systems. While these models are typically compared through downstream performance, such evaluations often do not explain how their representations differ structurally. In this work, we study this problem through the task of Contrastive Embedding Clustering: identifying sample subsets that are weakly clustered under one representation but strongly clustered under...

arXiv CS 6d ago

Human-Centered Benchmarking of Driver Monitoring Models

arXiv:2606.08123v1 Announce Type: new Abstract: Vision-based driver monitoring systems are increasingly deployed in safety-critical intelligent transportation settings, yet they are almost always compared on classification accuracy alone. This paper argues that accuracy is insufficient to characterize a model's fitness for real-world deployment, and proposes the Human-Centered Benchmarking Framework (HCBF), which evaluates models across four dimensions: accuracy, explainability, efficiency,...

arXiv CS 1d ago

PROBE-Web: An Interactive System for Probing Evaluation Landscapes of Knowledge Graph Completion Models

Announce Type: new Abstract: Knowledge graph completion (KGC) models are commonly evaluated using rank-based metrics such as MRR and Hits@K, despite different users often requiring different evaluation perspectives. In this demo, we present PROBE-Web, an interactive system for probing diverse evaluation landscapes for KGC models. PROBE-Web enables users to flexibly evaluate KGC models by adjusting two critical perspectives: (P1) predictive sharpness and (P2) popularity-bias robustness.
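For readers unfamiliar with the rank-based metrics named in this abstract, the following is a minimal sketch of the standard textbook definitions of MRR and Hits@K. It is not PROBE-Web's implementation; the function names and the sample ranks are hypothetical, and it assumes each test query yields the 1-based rank of the correct entity.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct entity is ranked in the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test queries.
example_ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(example_ranks)
hits3 = hits_at_k(example_ranks, 3)
```

Both metrics reduce each query to a single rank, which is one reason a single score may not reflect the different evaluation perspectives the abstract refers to.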

arXiv CS 1d ago

Building Trust in Black-box Optimization: A Comprehensive Framework for Explainability

Announce Type: replace Abstract: Optimizing costly black-box functions within a constrained evaluation budget presents significant challenges in many real-world applications. Surrogate Optimization (SO) is a common resolution, yet its proprietary nature introduced by the complexity of surrogate models and the sampling core (e.g., acquisition functions) often leads to a lack of explainability and transparency. While existing literature has primarily concentrated on enhancing convergence to...

arXiv CS 7d ago

Diagnostic dilemma: Doctors couldn't explain why a boy was bleeding from his eyes, ears and nose

Diagnostic dilemma: Doctors couldn't explain why a boy was bleeding from his eyes, ears and nose A case of a boy who bled from his eyes eventually led doctors to a diagnosis that has been reported fewer than 50 times in the medical literature. The patient: An 11-year-old boy in India The symptoms: The boy's parents brought him to a hospital after he had several episodes of bleeding from his eyes, nose and ears. The episodes, which had occurred for about a month, seemed to start for no...

Live Science 7d ago

From Features to Actions: Explainability in Traditional and Agentic AI Systems

Announce Type: replace Abstract: Over the last decade, Explainable AI has primarily focused on interpreting individual model predictions, producing post-hoc explanations that relate inputs to outputs under a fixed decision structure. Recent advances in large language models (LLMs) have enabled agentic AI systems whose behaviour unfolds over multi-step trajectories. In these settings, success and failure are determined by sequences of decisions rather than a single output.

arXiv CS 8d ago

SODA-CitrON: Static Object Data Association by Clustering Multi-Modal Sensor Detections Online

arXiv:2602.22243v3 Announce Type: replace Abstract: The online fusion and tracking of static objects from heterogeneous sensor detections is a fundamental problem in robotics, autonomous systems, and environmental mapping. Although classical data association approaches such as JPDA are well suited for dynamic targets, they are less effective for static objects observed intermittently and with heterogeneous uncertainties, where motion models provide minimal discriminative power with respect...

arXiv CS 1d ago