Technology
SW-$A^2$-Bench: Benchmarking Autonomous Software Agent Generation for Agentic Web
Key Points
Announce Type: replace Abstract: The Agentic Web is emerging as a paradigm in which autonomous software agents interact with online resources and with each other to accomplish user goals. However, the capacity of Agentic Web is still limited by insufficient autonomous software agent population, which has become a crucial challenge for scaling Agentic Web. In order to alleviate this, we study the task of automatically converting existing code repositories into autonomous software agents via...
arXiv:2604.04226v2 Announce Type: replace
Abstract: The Agentic Web is emerging as a paradigm in which autonomous software agents interact with online resources and with each other to accomplish user goals. However, the capacity of Agentic Web is still limited by insufficient autonomous software agent population, which has become a crucial challenge for scaling Agentic Web. In order to alleviate this, we study the task of automatically converting existing code repositories into autonomous software agents via coding agents, decompose the process into critical stages, and identify key technical hurdles. To systematically evaluate this capability, we propose SoftWare Agent generation for Agentic Web Bench (SW-$A^2$-Bench), the first benchmark designed for software agent generation. SW-$A^2$-Bench evaluates not only whether software agents can be generated, but also whether generated software agents are faithful to the source repositories and interoperable with other agents in multi-agent workflows. Our experiments demonstrate that our approach effectively activates the functional capabilities of code repositories and enables interoperable multi-agent collaboration in Agentic Web. We believe that this work will provide a standardized evaluation for software agent generation and will contribute to the future of scaling the capacity of Agentic Web.