Autonomous Coding Agents

Related Articles from SNS

NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents

arXiv:2601.21372v2. Abstract: We present NEMO, a system that translates Natural-language descriptions of decision problems into formal Executable Mathematical Optimization implementations using autonomous coding agents (ACAs). Existing approaches rely on specialized large language models (LLMs) or bespoke task-specific agents that are often brittle and frequently generate syntactically invalid or non-executable code. NEMO instead treats ACAs as a first-class abstraction...

arXiv CS 9d ago

The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

Abstract: Current AI benchmarks evaluate agents on task execution within human-designed workflows. These evaluations fundamentally fail to measure a critical next-level capability: whether models can autonomously develop agent systems. We introduce the Meta-Agent Challenge (MAC), an evaluation framework designed to test the capacity of frontier models for autonomous agent development.

arXiv CS 6d ago

SW-$A^2$-Bench: Benchmarking Autonomous Software Agent Generation for Agentic Web

Abstract: The Agentic Web is emerging as a paradigm in which autonomous software agents interact with online resources and with each other to accomplish user goals. However, the capacity of Agentic Web is still limited by insufficient autonomous software agent population, which has become a crucial challenge for scaling Agentic Web. In order to alleviate this, we study the task of automatically converting existing code repositories into autonomous software agents via...

arXiv CS 2d ago

What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants

arXiv:2605.30777v1. Abstract: Autonomous coding agents built on large language models (LLMs) are rapidly being integrated into development workflows, yet their operational safety properties remain poorly understood beyond evaluations of explicitly malicious inputs. In practice, high-impact failures arise during benign, goal-directed use through environment breakage, fabricated success reports, etc. that current benchmarks do not capture. What categories of operational...

arXiv CS 9d ago

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

arXiv:2603.03202v3. Abstract: As large language models (LLMs) advance their mathematical capabilities toward the IMO and research level, the scarcity of challenging, high-quality problems has become a significant bottleneck for training, evaluation and self-evolution of LLMs. Simultaneously, recent code agents have demonstrated sophisticated skills in agentic coding and reasoning, suggesting that code execution can serve as a scalable environment for mathematical...

arXiv CS 8d ago

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

arXiv:2606.02302v1. Abstract: Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also introduce security risks that are difficult to capture with existing evaluations. Current agent security benchmarks often rely on manually curated tasks, provide limited coverage of emerging threats, and focus primarily on final outcomes rather...

arXiv CS 8d ago

The Impact of Configuring Agentic AI Coding Tools on Build-vs-Buy Decisions: A Study Protocol

arXiv:2606.03907v1. Abstract: Agentic AI coding tools write code with increasing autonomy and in doing so decide when to import a library and when to implement functionality from scratch. These decisions, whether to build functionality from scratch or buy into an external library, hereafter build-versus-buy, carry direct consequences for software security, licensing compliance, performance, and long-term maintainability. Yet no controlled experimental study has examined...

arXiv CS 7d ago

From Segments to Scenes: Temporal Understanding for Agentic Autonomous Driving via Vision-Language Models

Abstract: Vision-Language Models (VLMs) are increasingly deployed as the perception and reasoning backbone of autonomous agents acting in the wild, with autonomous driving (AD) being one of the most safety-critical instances. Reliable temporal understanding is essential for such agents to anticipate events, attribute causes, and act safely in dynamic environments, yet this remains a significant challenge even for state-of-the-art (SoTA) VLMs. Prior video benchmarks...

arXiv CS 6d ago

Exploring Autonomous Agentic Data Engineering for Model Specialization

arXiv:2605.30407v2. Abstract: Large Language Models (LLMs) have demonstrated strong performance on general tasks, while often struggling to adapt to specialized domains without high-quality domain-specific data. Existing LLM-based data curation methods primarily rely on human-designed workflows, leaving it unexamined whether LLMs can autonomously execute an end-to-end data engineering pipeline for model specialization. We formalize Autonomous Agentic Data Engineering, a...

arXiv CS 1d ago

RTLScout: Joint Agentic Code and Synthesis Optimization for Efficient Digital Circuits

Abstract: We present RTLScout, an autonomous system that combines LLM-driven agentic design with circuit-level synthesis optimization and arithmetic architecture sweeps. An LLM agent iteratively writes, evaluates, and refines RTL designs using tool calls, guided by quantitative PPA (power, performance, area) feedback from Yosys and OpenROAD. We introduce a multi-run elite pool framework, where the best designs and lessons learned seed subsequent agent runs.

arXiv CS 2d ago