ReasonBench

Related Articles from SNS

ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning

Abstract: Benchmark scores for LLM reasoning systems are reported as single numbers, yet the same model, strategy, and task can produce meaningfully different answers and costs across repeated executions, even under greedy decoding (T = 0). This variance is not a statistical nuisance: the highest-performing strategy wins only 77% of head-to-head runs against its nearest competitor, meaning a single observed score can silently misrank systems. We introduce ReasonBench,...

arXiv CS 8d ago
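The abstract's 77% figure is a head-to-head win rate across repeated runs. This excerpt does not give the paper's exact protocol. The minimal sketch below assumes that every run of one strategy is compared against every run of the other, and that ties count as non-wins. The function name `head_to_head_win_rate` and all scores are hypothetical, chosen only to show how a strategy with the higher mean can still lose individual comparisons.

```python
from itertools import product
from statistics import mean, stdev


def head_to_head_win_rate(runs_a, runs_b):
    """Fraction of run pairs in which strategy A scores strictly higher than B.

    Assumption: every run of A is paired with every run of B (not necessarily
    ReasonBench's protocol). Ties count as non-wins for A.
    """
    pairs = list(product(runs_a, runs_b))
    wins = sum(1 for a, b in pairs if a > b)
    return wins / len(pairs)


# Hypothetical accuracies from five repeated runs per strategy
# (illustrative only, not results from ReasonBench).
strategy_a = [0.72, 0.70, 0.74, 0.69, 0.73]
strategy_b = [0.70, 0.71, 0.68, 0.72, 0.69]

if __name__ == "__main__":
    print(f"A: mean={mean(strategy_a):.3f} sd={stdev(strategy_a):.3f}")
    print(f"B: mean={mean(strategy_b):.3f} sd={stdev(strategy_b):.3f}")
    rate = head_to_head_win_rate(strategy_a, strategy_b)
    print(f"A beats B in {rate:.0%} of run pairs")
```

With these made-up numbers, A has the higher mean but wins only 17 of 25 pairings. Reporting a win rate or the spread of scores alongside the mean exposes the run-to-run instability that a single number hides.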

Biological Reasoning-Informed Regression for Interpretable Regulatory DNA Activity Prediction

arXiv:2606.08147v1. Abstract: DNA cis-regulatory elements (CREs) such as enhancers control gene expression levels. Accurately predicting regulatory activity from DNA sequences is valuable but challenging, as it requires understanding complex biological regulatory processes. Existing methods typically regress activity scores from sequences in a black-box manner, limiting both interpretability and regression performance.

arXiv CS 1d ago