Explainability Evaluation
Related Articles from SNS
Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection
arXiv:2605.31563v1 Announce Type: new Abstract: Human disagreement is ubiquitous and well-known in labeling. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how to best evaluate human labels and rationales -- or even how to best aggregate rationales beyond majority vote -- in light of this variation.
Beyond Accuracy: Evaluating Efficiency, Robustness and Explainability in Deep Learning for Malaria Diagnosis
arXiv:2605.30734v1 Announce Type: new Abstract: Malaria remains a leading cause of mortality in sub-Saharan Africa, where scarce diagnostic infrastructure makes timely, accurate diagnosis particularly challenging. While deep learning offers a compelling path toward automated malaria screening, clinical adoption is hindered by computational cost and opacity in decision-making. This work benchmarks four deep learning models spanning a wide range of design architectures and model...
Explain Like I'm 5 or Whatever I Choose: Evaluating the Interactive Potential of Language Model Responses
arXiv:2606.06788v1 Announce Type: new Abstract: Evaluations of large language models (LLMs) in scientific information seeking tasks have become increasingly use-centric, such as conducting live or multi-turn evaluations with real users. These evaluations still assume a single, static chat interface, but as models are integrated into new interfaces, evaluations must shift to incorporate interface-specific criteria. We propose a new evaluation framework based on a formative study with $16$...
KODA: Contrastive Representation Comparison and Alignment for Vision-Language Foundation Models
Announce Type: new Abstract: Vision-language foundation models such as CLIP and SigLIP provide widely used representations for multimodal learning systems. While these models are typically compared through downstream performance, such evaluations often do not explain how their representations differ structurally. In this work, we study this problem through the task of Contrastive Embedding Clustering: identifying sample subsets that are weakly clustered under one representation but strongly clustered under...
Human-Centered Benchmarking of Driver Monitoring Models
arXiv:2606.08123v1 Announce Type: new Abstract: Vision-based driver monitoring systems are increasingly deployed in safety-critical intelligent transportation settings, yet they are almost always compared on classification accuracy alone. This paper argues that accuracy is insufficient to characterize a model's fitness for real-world deployment, and proposes the Human-Centered Benchmarking Framework (HCBF), which evaluates models across four dimensions: accuracy, explainability, efficiency,...
PROBE-Web: An Interactive System for Probing Evaluation Landscapes of Knowledge Graph Completion Models
Announce Type: new Abstract: Knowledge graph completion (KGC) models are commonly evaluated using rank-based metrics such as MRR and Hits@K, despite different users often requiring different evaluation perspectives. In this demo, we present PROBE-Web, an interactive system for probing diverse evaluation landscapes for KGC models. PROBE-Web enables users to flexibly evaluate KGC models by adjusting two critical perspectives: (P1) predictive sharpness and (P2) popularity-bias robustness.
Building Trust in Black-box Optimization: A Comprehensive Framework for Explainability
Announce Type: replace Abstract: Optimizing costly black-box functions within a constrained evaluation budget presents significant challenges in many real-world applications. Surrogate Optimization (SO) is a common resolution, yet its proprietary nature introduced by the complexity of surrogate models and the sampling core (e.g., acquisition functions) often leads to a lack of explainability and transparency. While existing literature has primarily concentrated on enhancing convergence to...
Diagnostic dilemma: Doctors couldn't explain why a boy was bleeding from his eyes, ears and nose
A case of a boy who bled from his eyes eventually led doctors to a diagnosis that has been reported fewer than 50 times in the medical literature. The patient: An 11-year-old boy in India. The symptoms: The boy's parents brought him to a hospital after he had several episodes of bleeding from his eyes, nose and ears. The episodes, which had occurred for about a month, seemed to start for no...
From Features to Actions: Explainability in Traditional and Agentic AI Systems
Announce Type: replace Abstract: Over the last decade, Explainable AI has primarily focused on interpreting individual model predictions, producing post-hoc explanations that relate inputs to outputs under a fixed decision structure. Recent advances in large language models (LLMs) have enabled agentic AI systems whose behaviour unfolds over multi-step trajectories. In these settings, success and failure are determined by sequences of decisions rather than a single output.
SODA-CitrON: Static Object Data Association by Clustering Multi-Modal Sensor Detections Online
arXiv:2602.22243v3 Announce Type: replace Abstract: The online fusion and tracking of static objects from heterogeneous sensor detections is a fundamental problem in robotics, autonomous systems, and environmental mapping. Although classical data association approaches such as JPDA are well suited for dynamic targets, they are less effective for static objects observed intermittently and with heterogeneous uncertainties, where motion models provide minimal discriminative power with respect...