Home Knowledge Base Dynamic Rubric-Based Benchmark

Dynamic Rubric-Based Benchmark

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

AnyAudio-Judge: A Dynamic Rubric-Based Benchmark and Evaluator for Audio Instruction Following

arXiv:2606.03116v1 Announce Type: cross Abstract: The rapid advancement of instruction-guided audio generation has highlighted the critical need for robust alignment evaluation. Current automated evaluation methods heavily rely on holistic scoring from general-purpose large language models, which struggle to decouple complex instructions, lack interpretability, and fail to capture fine-grained attribute mismatches. To address this, we introduce a novel dynamic rubric-based evaluation...

arXiv CS 7d ago

Generating and Refining Dynamic Evaluation Rubrics for LLM-as-a-Judge

Announce Type: new Abstract: LLM-as-a-Judge is a scalable alternative to human evaluation, yet existing rubric-based methods rely on human-annotated data such as reference answers or expert-crafted rubrics. We propose to automatically generate fine-grained evaluation rubrics without any human annotation. Our training-free method generates rubrics at dataset-specific and instance-specific granularities, achieving performance competitive with existing methods across four benchmarks.

arXiv CS 9d ago

From Holistic Evaluation to Structured Criteria: Rubrics Across the Evolving LLM Landscape

Announce Type: new Abstract: As Large Language Models (LLMs) advance toward open-ended autonomous agents, the mechanisms used to evaluate and guide their behavior must evolve accordingly. This work introduces the rubric as a unifying framework capturing this evolution, characterizing rubrics as a dynamic response to successive LLM paradigm shifts that recurs across otherwise independent efforts in evaluation, reinforcement learning, and safety alignment.

arXiv CS 1d ago

DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

arXiv:2606.07299v1 Announce Type: new Abstract: Deep Research (DR) has emerged as a new agentic paradigm to tackle complex, open-ended research tasks, demanding systems that can iteratively frame problems, acquire evidence, verify sources, and synthesize long-form reports. In practice, however, current DR systems are constrained by four interrelated limitations: long-horizon planning over an underspecified scope, the bottleneck of decomposing and scheduling such tasks within a single agent,...

arXiv CS 2d ago