SW-$A^2$-Bench: Benchmarking Autonomous Software Agent Generation for Agentic Web

arXiv CS, Monday 08 June 2026, 04:00 UTC. By Linyao Chen, Bo Huang, Qinlao Zhao, Shuai Shao, Zhi Han, Zicai Cui, Ziheng Zhang, Guangtao Zeng, Wenzheng Tang, Yikun Wang, Yuanjian Zhou, Zimian Peng, Yong Yu, Weiwen Liu, Hiroki Kobayashi, Weinan Zhang

arXiv:2604.04226v2 Announce Type: replace

Abstract: The Agentic Web is emerging as a paradigm in which autonomous software agents interact with online resources and with each other to accomplish user goals. However, the capacity of the Agentic Web is still limited by an insufficient population of autonomous software agents, which has become a crucial challenge for scaling the Agentic Web. To alleviate this, we study the task of automatically converting existing code repositories into autonomous software agents via coding agents, decompose the process into critical stages, and identify key technical hurdles. To systematically evaluate this capability, we propose SoftWare Agent generation for Agentic Web Bench (SW-$A^2$-Bench), the first benchmark designed for software agent generation. SW-$A^2$-Bench evaluates not only whether software agents can be generated, but also whether the generated agents are faithful to the source repositories and interoperable with other agents in multi-agent workflows. Our experiments demonstrate that our approach effectively activates the functional capabilities of code repositories and enables interoperable multi-agent collaboration in the Agentic Web. We believe that this work will provide a standardized evaluation for software agent generation and will contribute to scaling the capacity of the Agentic Web.
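The abstract names three things SW-$A^2$-Bench checks for each repository: whether a software agent is generated at all, whether it is faithful to the source repository, and whether it interoperates with other agents in a multi-agent workflow. The abstract does not give the benchmark's actual metrics or harness, so the following is only a hypothetical sketch of how such three-axis results might be tallied. The `TaskResult` record, the `summarize` function, and the rule that faithfulness and interoperability count only for agents that were actually generated are all assumptions for illustration, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-repository outcome along the three axes the abstract names."""
    repo: str
    generated: bool      # assumption: did the coding agent produce a working software agent?
    faithful: bool       # assumption: does the agent expose the source repository's functionality?
    interoperable: bool  # assumption: does the agent work inside a multi-agent workflow?


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Return the fraction of tasks passing each check.

    Assumed rule (not from the paper): faithfulness and interoperability
    only count when an agent was generated, so a failed generation
    fails all three checks.
    """
    n = len(results)
    if n == 0:
        return {"generated": 0.0, "faithful": 0.0, "interoperable": 0.0}
    gen = sum(r.generated for r in results)
    faith = sum(r.generated and r.faithful for r in results)
    inter = sum(r.generated and r.interoperable for r in results)
    return {
        "generated": gen / n,
        "faithful": faith / n,
        "interoperable": inter / n,
    }


# Usage with made-up repository names and outcomes:
results = [
    TaskResult("repo-a", generated=True, faithful=True, interoperable=False),
    TaskResult("repo-b", generated=True, faithful=False, interoperable=False),
    TaskResult("repo-c", generated=False, faithful=True, interoperable=True),
]
print(summarize(results))
```

Gating the later checks on successful generation keeps the three rates nested, so "faithful" and "interoperable" can never exceed "generated". A real benchmark may instead report them conditionally on generation; the abstract does not say which.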

Originally published by arXiv CS.
