Olympiad
Related Articles from SNS
3ViewSense: Spatial and Mental Perspective Reasoning from Orthographic Views in Vision-Language Models
arXiv:2603.07751v2 Announce Type: replace Abstract: Current Large Language Models have achieved Olympiad-level logic, yet Vision-Language Models paradoxically falter on elementary spatial tasks like block counting. This capability mismatch reveals a critical “spatial intelligence gap,” where models fail to construct coherent 3D mental representations from 2D observations. We uncover this gap via diagnostic analyses showing the bottleneck is a missing view-consistent spatial interface...
A golden age of maths is dawning and mathematicians are freaking out
I am attempting to solve a mathematical conundrum that has stumped many of humanity’s greatest thinkers. I have zero mathematical training, apart from a distant undergraduate physics degree, which should put my odds of success at slim to none. But I also have a trick up my sleeve – a kind of mathematical genie that can conjure arcane secrets seemingly out of thin air.
When AI Builds Itself: Our progress toward recursive self-improvement
For most of AI’s history, humans drove every step in its development cycle. But at Anthropic, we are delegating a growing share of AI development to AI systems themselves, which is speeding up our work. Taken far enough, and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor.
When the Scaffold Stays On: AI, Practice Style, and Screening in Elite Skill Formation
arXiv:2606.06253v1 Announce Type: cross Abstract: Generative AI raises short-term productivity by completing tasks that learners would otherwise practice on their own. Whether this substitution erodes frontier skill, the skill behind top-tail non-AI-aided performance, is an open question of rising stakes. The sharper question is whether selection mechanisms can screen apart two coexisting types: substitute-users, who use AI in place of deliberate practice, and complement-users, who use it to...
Sci-Rho: A Multilingual Visually-Grounded Symbolic Benchmark for STEM Problems
Announce Type: new Abstract: Symbolic benchmarks have emerged as a key approach to assess model robustness under minor modifications to STEM-related questions. However, existing symbolic benchmarks mostly remain limited to mathematical reasoning, lack visual grounding, and are predominantly in English. In this work, we introduce Sci-Rho (Science Rhobustness), a dynamic benchmark for visually-grounded STEM problems spanning five subjects and seven languages, comprising 4,242 problem templates...
INFUSER: Influence-Guided Self-Evolution Improves Reasoning
Announce Type: new Abstract: Self-evolution offers a scalable path to stronger reasoning: a pretrained language model improves itself with only minimal external supervision. Yet existing methods either depend on extensively curated or teacher-generated training data, or, when the generator runs unsupervised, reward it by a difficulty heuristic that need not improve the solver. We introduce INFUSER, an iterative co-training framework with two co-evolving roles: a Generator that drafts...