Home Knowledge Base CoEval

CoEval

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

CoEval: Ranking Language Models for Custom Tasks Without Labeled Data or Trustworthy Benchmarks

arXiv:2606.03650v1 Announce Type: new Abstract: Choosing or ranking language models for a specific application is hardest when no task-specific labeled data exists, and standard public benchmarks cannot be trusted, their items having likely leaked into pretraining, so scores reflect memorization rather than fitness. We present CoEval, an open-source, reusable framework that closes this gap end to end: from only a description of a task or domain, teacher models synthesize a fresh,...

arXiv CS 7d ago

CoEval: Ranking Language Models for Custom Tasks Without Labeled Data or Trustworthy Benchmarks

Announce Type: replace Abstract: Selecting a pretrained language model, or evaluating a fine-tuned one, for a specific application is a high-value decision, yet the public benchmarks used to make it are poorly suited: a generic benchmark need not reflect a particular sub-domain or sub-task, and its scores are suspect when its items have leaked into pretraining and are recalled rather than solved. We present CoEval, an open framework that supplies a trustworthy, task-specific signal through...

arXiv CS 5d ago