PutnamBench

Related Articles from SNS

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. First, Goedel-Architect generates a blueprint of formally stated definitions and lemmas, along with declared dependencies.

arXiv CS 5d ago

Compile to Compress: Boosting Formal Theorem Provers by Compiler Outputs

arXiv:2604.18587v2. Large language models (LLMs) have demonstrated significant potential in formal theorem proving, yet state-of-the-art performance often necessitates prohibitive test-time compute via massive roll-outs or extended context windows. In this work, we address this scalability bottleneck by exploiting an informative structure in formal verification: the observation that compilers map a vast space of diverse proof attempts to a compact set of...

arXiv CS 9d ago

Proof-Refactor: Refactoring Generated Formal Proofs into Modular Artifacts

arXiv:2606.03743v1. While Large Language Models (LLMs) have shown strong performance in generating formal proofs, their outputs often remain less readable, modular, maintainable, and reusable than proofs in mature formal mathematics libraries. We argue that this gap stems in part from the compile-first objective implicit in most proof-generation pipelines, which encourages monolithic or ad hoc proof scripts rather than library-quality artifacts. Existing...

arXiv CS 7d ago

Optimizing the Cost-Quality Tradeoff of Agentic Theorem Provers in Lean

arXiv:2606.04883v1. Large language models (LLMs) are increasingly used in workflows for generating formal proofs in Lean. These workflows often decompose problems into smaller lemmas, sample many proof attempts, and use compiler feedback to guide search. However, they can be prohibitively expensive, often spending substantial compute on attempts that ultimately fail.

arXiv CS 6d ago

Formally Solving Answer-Construction Problems in Lean

arXiv:2505.18492v5. Mathematical competition problems fall into two broad types: theorem proving, which asks for a proof of a given statement, and answer construction, which requires constructing a property-satisfying object with proofs. With recent advances in large language models (LLMs), formal theorem-proving techniques have made substantial progress on theorem-proving problems, yet formal answer construction remains less studied. This exposes a mismatch...

arXiv CS 8d ago