Robust Rank Aggregation

Related Articles from SNS

On the Robustness of Multilingual Text Embedding Rankings Across Learning Tasks, Languages, and Benchmark Datasets

Announce Type: new Abstract: Large-scale multilingual text embedding models play a crucial role in both research and industry, yet their behavior in language-specific, multi-task settings remains insufficiently understood. Although benchmarking platforms such as MTEB report results across more than 250 languages, conclusions about model superiority often depend on implicit choices of dataset compositions and performance aggregation methods. To address this gap, we present a meta-study of...

arXiv CS 9d ago

Position: State-of-the-Art Claims Require State-of-the-Art Evidence

arXiv:2605.17273v3 Announce Type: replace Abstract: State-of-the-Art (SOTA) claims pervade Artificial Intelligence (AI) and Machine Learning (ML) research. These claims rest on benchmark evaluations, where models are ranked by aggregate scores across tasks. Public benchmarks or leaderboards are the most visible instance, but the same structure appears in paper tables throughout the literature.

arXiv CS 6d ago
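
The dependence of rankings on the aggregation method, which the two abstracts above describe, can be illustrated with a minimal sketch. The model names and scores below are hypothetical values chosen for illustration and are not taken from either paper; the two aggregation rules (mean score and mean rank) are common choices, not necessarily the ones the authors study.

```python
from statistics import mean

# Hypothetical scores (higher is better) for two models on three tasks.
scores = {
    "model_a": [0.9, 0.5, 0.5],
    "model_b": [0.6, 0.6, 0.6],
}

def mean_score_winner(scores):
    """Pick the model with the highest average score across tasks."""
    return max(scores, key=lambda m: mean(scores[m]))

def mean_rank_winner(scores):
    """Pick the model with the lowest average per-task rank (1 = best).

    Ties within a task are broken by dictionary order; this sketch does
    not attempt proper tie handling.
    """
    models = list(scores)
    n_tasks = len(next(iter(scores.values())))
    ranks = {m: [] for m in models}
    for t in range(n_tasks):
        ordered = sorted(models, key=lambda m: scores[m][t], reverse=True)
        for position, m in enumerate(ordered, start=1):
            ranks[m].append(position)
    return min(ranks, key=lambda m: mean(ranks[m]))

print(mean_score_winner(scores))  # model_a wins on average score
print(mean_rank_winner(scores))   # model_b wins on average rank
```

Here model_a leads on mean score because of one large win, while model_b leads on mean rank because it wins two of the three tasks, so the "best" model changes with the aggregation rule.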

STABLEVAL: Disagreement-Aware and Stable Evaluation of AI Systems

arXiv:2605.02122v2 Announce Type: replace Abstract: Human evaluation remains the primary standard for assessing modern AI systems, yet annotator disagreement, bias, and variability make system rankings fragile under standard majority vote aggregation. Majority vote discards annotator reliability and item-level ambiguity, often yielding unstable comparisons across annotator subsets. We introduce STABLEVAL, a disagreement-aware evaluation framework that models latent item correctness and...

arXiv CS 8d ago

Amortizing Federated Adaptation: Hypernetwork Driven LoRA for Personalized Foundation Models

Announce Type: new Abstract: Federated fine-tuning of foundation models using Low-Rank Adaptation (LoRA) offers a communication-efficient solution for distributed learning. However, existing federated LoRA methods suffer from two fundamental limitations: (1) structural aggregation bias, where independently averaging low-rank factors fails to approximate the true combined update, and (2) client-side initialization lag, as clients repeatedly reinitialize LoRA parameters across communication rounds, slowing...

arXiv CS 5d ago
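
The structural aggregation bias mentioned in the abstract above can be seen in a toy case. The sketch below assumes the standard LoRA parameterization, in which each client's update is the product B·A of two low-rank factors; the two-client, rank-1 values are hypothetical and are not drawn from the paper.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def average(mats):
    """Element-wise mean of equally shaped matrices."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / n for j in range(cols)]
            for i in range(rows)]

# Hypothetical rank-1 LoRA factors for two clients (update = B @ A).
B1, A1 = [[1.0], [0.0]], [[1.0, 0.0]]
B2, A2 = [[0.0], [1.0]], [[0.0, 1.0]]

# Mean of the full per-client updates.
true_update = average([matmul(B1, A1), matmul(B2, A2)])

# Product of the independently averaged factors.
factor_avg_update = matmul(average([B1, B2]), average([A1, A2]))

print(true_update)        # [[0.5, 0.0], [0.0, 0.5]]
print(factor_avg_update)  # [[0.25, 0.25], [0.25, 0.25]]
```

The two results differ because the product of averages contains cross terms between clients (B1·A2 and B2·A1) that are absent from the average of products.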

Human-Centered Benchmarking of Driver Monitoring Models

arXiv:2606.08123v1 Announce Type: new Abstract: Vision-based driver monitoring systems are increasingly deployed in safety-critical intelligent transportation settings, yet they are almost always compared on classification accuracy alone. This paper argues that accuracy is insufficient to characterize a model's fitness for real-world deployment, and proposes the Human-Centered Benchmarking Framework (HCBF), which evaluates models across four dimensions: accuracy, explainability, efficiency,...

arXiv CS 1d ago

MarkerScout: A Disease-Agnostic Machine Learning Framework for Biomarker Prediction from Multi-Scale Mechanistic Models

Identifying robust biomarkers from high-dimensional biomedical data is a central challenge in translational research, but candidate rankings produced by any single feature-selection or classification method depend on algorithmic choices and rarely reproduce across pipelines. We present a disease-agnostic machine-learning framework that addresses this dependence by systematically benchmarking 25 (feature-selection × classifier) pipelines under five-fold stratified cross-validation,...

bioRxiv 5d ago

Assessing Region-Level EEG Contributions to Cognitive Workload Prediction

arXiv:2606.02598v1 Announce Type: new Abstract: Accurate and generalizable estimation of cognitive workload from electroencephalography (EEG) is critical for human-centered and safety-critical systems. Although EEG is widely used for workload assessment, the consistency of region-level EEG contributions across tasks, datasets, and subjects remains unclear. This paper presents a region-level evaluation framework for EEG-based workload prediction in which models are trained and evaluated using...

arXiv CS 7d ago

Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies

arXiv:2606.07492v1 Announce Type: new Abstract: The ranking of recommendation algorithms is a challenging problem since model performance is sensitive to dataset characteristics such as sparsity, sequential structure, and scale. This drives a demand for a proper methodology for fair comparison between algorithms.

arXiv CS 2d ago

Generalized Rank-based Evaluation for Knowledge Graph Completion: Perspectives, Framework, and Analyses

Announce Type: new Abstract: Knowledge graph completion (KGC) aims to predict missing facts from an observed knowledge graph (KG), playing a crucial role in a wide range of real-world applications such as drug discovery, recommender systems, and retrieval-augmented generation (RAG). Although numerous KGC models have been proposed, the evaluation of KGC remains underexplored, despite its critical role in reliably assessing model performance and selecting appropriate models for real-world...

arXiv CS 1d ago

Rank-Constrained Deep Matrix Completion for Group Recommendation

Announce Type: new Abstract: The growing popularity of group activities has increased the need for methods that provide recommendations to groups of users given their individual preferences. Many existing group recommender systems rely on aggregating individual user preferences, but they often struggle with high-dimensional and highly sparse rating data commonly found in real-world scenarios. We propose Group Rank-Constrained Deep Matrix Completion (Group RC-DMC), a novel framework that extends RC-DMC by...

arXiv CS 8d ago