LexRubric: A Rubric-Guided Diagnostic Benchmark for Open-Ended Legal Tasks

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Yifan Chen, Haitao Li, Yiran Hu, Kaisong Song, Jun Lin, Yueyue Wu, Qingyao Ai, Min Zhang, Yiqun Liu

Key Points

arXiv:2606.09389v1 (announce type: new)

Abstract: As large language models (LLMs) are increasingly applied to real-world legal tasks, evaluating the reliability of their open-ended legal responses has become essential. These tasks require context-sensitive answers and allow little room for error, motivating fine-grained, diagnostic evaluation that can identify the specific sources of failures in response quality. We introduce LexRubric, a rubric-based benchmark for evaluating open-ended Chinese legal tasks. LexRubric contains 649 instances drawn from legal consultation and judicial examination, which reflect both everyday legal needs and professional legal reasoning and cover 14 legal scenarios. It further includes 12,337 expert-written atomic scoring criteria organized under a unified six-dimensional framework, enabling accurate evaluation and diagnostic analysis across tasks and evaluation dimensions. To validate the reliability of the evaluation, we test multiple judge models and compare model-based judgments with human judgments. We further evaluate 18 recent general and legal-domain LLMs on LexRubric. Results show that different models exhibit distinct capability profiles, and that open-ended legal questions remain challenging for current LLMs. Data is available at: https://github.com/foggpoy/LexRubric.
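To make the idea of atomic criteria grouped by dimension concrete, here is a minimal sketch of how judged criteria could be rolled up into per-dimension and overall scores. It is not the authors' scoring method: the abstract does not name the six dimensions or say whether criteria are weighted, so the dimension labels (`dimension_a`, `dimension_b`), the example criterion texts, the `Criterion` type, and the unweighted averaging are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    dimension: str  # hypothetical label; the abstract does not name the dimensions
    text: str       # the atomic requirement a judge checks (made-up examples below)
    met: bool       # judge verdict for one model response


def dimension_scores(criteria):
    """Fraction of criteria met within each dimension (unweighted, by assumption)."""
    met = defaultdict(int)
    total = defaultdict(int)
    for c in criteria:
        total[c.dimension] += 1
        met[c.dimension] += int(c.met)
    return {d: met[d] / total[d] for d in total}


def overall_score(criteria):
    """Fraction of all criteria met across dimensions."""
    criteria = list(criteria)
    if not criteria:
        raise ValueError("no criteria to score")
    return sum(c.met for c in criteria) / len(criteria)


# Invented verdicts for a single response, purely for illustration.
example = [
    Criterion("dimension_a", "Cites the applicable statute", True),
    Criterion("dimension_a", "States the limitation period", False),
    Criterion("dimension_b", "Advises on the next procedural step", True),
]
scores = dimension_scores(example)
```

Reporting per-dimension fractions next to the overall score is one way to surface the distinct capability profiles the abstract mentions, since two models with the same overall score can fail on different dimensions.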


Originally published by arXiv CS.

