Competitive Programming
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Where Do Large Language Models Fail on Competitive Programming? A Taxonomy of Failures by Algorithm Type and Difficulty Rating
arXiv:2606.05228v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate increasing proficiency on competitive programming benchmarks, yet technical reports predominantly publish aggregate pass rates, obscuring domain-specific vulnerabilities. We present a systematic empirical study of LLM failure patterns using a balanced taxonomy of 315 Codeforces problems across seven algorithm categories and three difficulty tiers. We evaluate GPT-4o and Claude Sonnet 4.6 under strict...
CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions
arXiv:2602.20213v2 Announce Type: replace Abstract: The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing benchmarks often lack coverage for subtle corner cases, allowing incorrect solutions to pass. To bridge this gap, we propose CodeHacker, an automated agent framework dedicated to generating targeted adversarial test cases that expose latent vulnerabilities in program submissions.
CodeGolf Bench: A Multi-Language Benchmark for Evaluating Concise Code Generation Capabilities of Large Language Models
Announce Type: new Abstract: This paper introduces Code Bench, a benchmark capable of evaluating Large Language Models (LLMs) concise code generation abilities in 60 programming languages. Based on code golf, a recreational programming competition focused on minimal character or byte solutions, the benchmark provides a distinctive measure of LLMs ability to produce efficient, concise code. Unlike existing benchmarks limited by fixed problem sets and language coverage, CodeGolf Bench...
Reasoning Models Don't Just Think Longer, They Move Differently
Announce Type: replace Abstract: Reasoning-trained language models often spend more tokens on harder problems, but longer chains of thought do not show whether a model is merely computing for more steps or following a different internal trajectory. We study this distinction through hidden-state trajectories during chain-of-thought generation across competitive programming, mathematics, and Boolean satisfiability. Raw trajectory geometry is strongly shaped by generation length: longer...
Cast a Wider Net: Coordinated Pass@K Policy Optimization for Code Reasoning
arXiv:2605.27000v2 Announce Type: replace Abstract: Repeated sampling with a verifier is the standard way to allocate test-time compute for code generation, with pass@$K$ as the canonical metric. Yet the standard policy class draws $K$ independent samples from a single answer distribution, so attempts often collapse onto near-duplicate reasoning paths and waste the budget on redundant rollouts. This failure is costly in competitive programming, where many problems admit multiple distinct...
Adaptive Generate-Rank-Verify: Inference-Time Search with Costly Verification
Announce Type: replace Abstract: Many inference-time language-model pipelines combine a cheap reward signal with an expensive verifier, such as exact answer checking in mathematical reasoning or hidden-test execution in code generation. We formalize this setting using a learning-theoretic lens as generative active search: a cost-sensitive first-positive search problem in which a policy adaptively samples candidates from an unknown distribution, observes cheap scores, and pays for verifier...
EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
arXiv:2606.03108v1 Announce Type: new Abstract: Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and scalar rewards mask diverse failure modes. We introduce EvoTrainer, an autonomous training framework that co-evolves LLM policies and training-side harnesses through empirical feedback: it diagnoses rollout-level evidence, revises diagnostics, backtests interventions,...
Multi-Rollout On-Policy Distillation via Peer Successes and Failures
Announce Type: replace Abstract: Large language models are often post-trained with sparse verifier rewards, which indicate whether a sampled trajectory succeeds but provide limited guidance about where reasoning succeeds or fails. On-policy distillation (OPD) offers denser token-level supervision by training on student-generated trajectories, yet existing methods typically distill each rollout independently and ignore the other attempts sampled for the same prompt. We introduce Multi-Rollout...
In a first, scientists translated an entire viral genome so a quantum computer could read and analyze it
In a first, scientists translated an entire viral genome so a quantum computer could read and analyze it Scientists have uploaded a viral genome to a quantum computer, marking an important step for the future of quantum-enabled advancements in biology. Scientists say they have uploaded a real genome to a quantum computer for the first time, marking an important step in applying the emerging technology to biology. The researchers encoded the entire genome of the hepatitis D virus (HDV) onto a...
University of Toledo pole vaulter Eva Moran killed in three-vehicle crash in Ohio at age 19
A three-vehicle crash in Ohio left a Division I college athlete dead at the age of 19.University of Toledo pole vaulter Eva Moran, 19, was the only person killed in the Ohio crash in which another 19-year-old and a 23-year-old were involved. "The University of Toledo community is heartbroken by the loss of Eva Moran," athletic director Tom Moreland said in a release. "Eva was an outstanding student-athlete whose determination, character and positive spirit made an impact on everyone who had...