Nonparametric LLM Evaluation
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Nonparametric LLM Evaluation from Preference Data
arXiv:2601.21816v2 Announce Type: replace Abstract: Evaluating the performance of large language models (LLMs) from human preference data is crucial for obtaining LLM leaderboards. However, many existing approaches either rely on restrictive parametric assumptions or lack valid uncertainty quantification when flexible machine learning methods are used.
LLMSynthor: Macro-Aligned Micro-Records Synthesis with Large Language Models
arXiv:2505.14752v3 Announce Type: replace Abstract: Macro-aligned micro-records are crucial for credible simulations in social science and urban studies. For example, epidemic models are only reliable when individual-level mobility and contacts mirror real behavior, while aggregates match real-world statistics like case counts or travel flows. However, collecting such fine-grained data at scale is impractical, leaving researchers with only macro-level data.
LLMSynthor: Macro-Aligned Micro-Records Synthesis with Large Language Models
arXiv:2505.14752v4 Announce Type: replace Abstract: Macro-aligned micro-records are crucial for credible simulations in social science and urban studies. For example, epidemic models are only reliable when individual-level mobility and contacts mirror real behavior, while aggregates match real-world statistics like case counts or travel flows. However, collecting such fine-grained data at scale is impractical, leaving researchers with only macro-level data.