LiveCodeBench-Plus
Related Articles from SNS
BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution
arXiv:2606.01286v1
Abstract: The rapid progress of frontier large language models has led to widespread benchmark saturation, limiting the ability of existing datasets to differentiate model capabilities or provide useful training signal. For instance, on LiveCodeBench, frontier models achieve over 99% Pass@1 on easy splits and exceed 90% Pass@1 on average across difficulty levels.
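Pass@1 is the expected fraction of problems solved when a single sampled solution is checked against the tests. The sketch below assumes the unbiased pass@k estimator commonly used for code benchmarks since the HumanEval/Codex evaluation (Chen et al., 2021). The abstract does not say which estimator was used. The helper names `pass_at_k` and `mean_pass_at_k` and the per-problem (n, c) counts are hypothetical illustrations, not data from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples were drawn, c of them passed."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average the per-problem estimates over one split (e.g. one difficulty level)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (n, c) counts per problem, for illustration only.
easy_split = [(10, 10), (10, 9), (10, 10)]
hard_split = [(10, 4), (10, 0)]

print(f"easy Pass@1: {mean_pass_at_k(easy_split):.4f}")
print(f"hard Pass@1: {mean_pass_at_k(hard_split):.4f}")
```

For k = 1 the estimator reduces to c/n, so each problem's Pass@1 is simply its fraction of passing samples. A score "averaged across difficulty levels" can mean either the mean of the per-split scores or the mean over all problems pooled together. The two differ whenever the splits contain different numbers of problems.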