Autonomous Runnable

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Evidence Over Plans: Online Trajectory Verification for Skill Distillation

Abstract: Agent skills can remarkably improve task success rates by using human-written procedural documents, but their quality is difficult to assess without environment-grounded verification. Existing skill generation methods heavily rely on preference logs rather than direct environment interaction, often yielding negligible or even degraded gains. We identify this as a fundamental timing bottleneck: robust skills should be posterior-based, distilled from...

arXiv CS 5d ago

DeployBench: Benchmarking LLM Agents for Research Artifact Deployment

arXiv:2606.05238v1. Abstract: LLM agents have made rapid progress on software engineering and ML research tasks, but these advances often assume access to a working runnable environment. For research artifacts released alongside published papers, setting up such an environment from a fresh machine remains a major bottleneck. Existing environment setup benchmarks do not cover the full scope of research artifact deployment, which involves multi-language toolchains,...

arXiv CS 5d ago