
Margin-Adaptive Confidence Ranking for Reliable LLM Judgement

arXiv:2605.15416v2

Abstract: Jung et al. (2025) introduce a hypothesis testing framework for guaranteeing agreement between large language models (LLMs) and human judgments. The framework relies on the assumption that the model's estimated confidence is monotonic with respect to human-disagreement risk. In practice, however, this assumption may be violated, and the generalization behavior of the confidence estimator is not explicitly analyzed.
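To make the monotonicity assumption concrete, here is a minimal Python sketch of one way to probe it on labeled data: sort examples by confidence, split them into equal-count bins, and flag adjacent bins where the human-disagreement rate rises as confidence rises. This is an illustrative diagnostic only, not the method of Jung et al. (2025) or of this paper. The function names, the binning scheme, and the toy data are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def binned_disagreement(
    confidences: Sequence[float],
    disagrees: Sequence[bool],
    n_bins: int = 5,
) -> List[Tuple[float, float]]:
    """Return (mean confidence, disagreement rate) per equal-count bin.

    Bins are ordered from lowest to highest confidence. `disagrees[i]` is
    True when the LLM judgement differs from the human judgement on item i.
    (Hypothetical helper, for illustration only.)
    """
    if len(confidences) != len(disagrees):
        raise ValueError("confidences and disagrees must have equal length")
    pairs = sorted(zip(confidences, disagrees))
    n = len(pairs)
    if n == 0:
        return []
    n_bins = max(1, min(n_bins, n))  # never more bins than items
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        mean_conf = sum(c for c, _ in chunk) / len(chunk)
        rate = sum(1 for _, d in chunk if d) / len(chunk)
        bins.append((mean_conf, rate))
    return bins


def monotonicity_violations(
    bins: Sequence[Tuple[float, float]],
) -> List[Tuple[int, int]]:
    """Adjacent bin index pairs where disagreement rises with confidence."""
    return [
        (i, i + 1)
        for i in range(len(bins) - 1)
        if bins[i + 1][1] > bins[i][1]
    ]


# Toy data (made up): six items with increasing confidence.
conf = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
well_behaved = [True, True, False, True, False, False]
violating = [False, False, True, True, False, False]

ok_bins = binned_disagreement(conf, well_behaved, n_bins=3)
bad_bins = binned_disagreement(conf, violating, n_bins=3)
ok_violations = monotonicity_violations(ok_bins)
bad_violations = monotonicity_violations(bad_bins)
```

With finite samples, an apparent violation between two bins may be noise rather than a real failure of the assumption, so a check like this is best read as a prompt for closer inspection, not as a test with guarantees.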

arXiv CS 1d ago