Coding Agents
Related Articles from SNS
Agentic Very Much! Adoption of Coding Agent in New GitHub Projects
arXiv:2606.07448v1 Announce Type: new Abstract: In previous work, we investigated the adoption of coding agents in GitHub projects, finding that it was very significant. This study follows this line of work, but analyses new projects, that were created after the previous study. In this new sample, we find that the adoption of coding agents is more than twice as high.
What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants
arXiv:2605.30777v1 Announce Type: new Abstract: Autonomous coding agents built on large language models (LLMs) are rapidly being integrated into development workflows, yet their operational safety properties remain poorly understood beyond evaluations of explicitly malicious inputs. In practice, high-impact failures arise during benign, goal-directed use through environment breakage, fabricated success reports, etc. that current benchmarks do not capture. What categories of operational...
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?
arXiv:2603.03202v3 Announce Type: replace Abstract: As large language models (LLMs) advance their mathematical capabilities toward the IMO and research level, the scarcity of challenging, high-quality problems has become a significant bottleneck for training, evaluation and self-evolution of LLMs. Simultaneously, recent code agents have demonstrated sophisticated skills in agentic coding and reasoning, suggesting that code execution can serve as a scalable environment for mathematical...
SmellBench: Towards Fine-Grained Evaluation of Code Agents on Refactoring Tasks
Announce Type: new Abstract: Code Agents have achieved remarkable advances in recent years, exhibiting strong capabilities across a wide range of software engineering tasks. However, their misuse often produces bloated and disorganized code that impairs readability, extensibility, and robustness. Despite this risk, existing benchmarks largely evaluate functional correctness rather than long-term maintainability of code agents.
SWE-Explore: Benchmarking How Coding Agents Explore Repositories
arXiv:2606.07297v1 Announce Type: new Abstract: Repository-level coding benchmarks such as SWE-bench have driven a rapid surge in the capabilities of coding agents. Yet they usually treat coding tasks as a holistic, binary prediction problem (e.g., resolved or unresolved), neglecting fine-grained agent capabilities such as repository understanding, context retrieval, code localization, and bug diagnosis. In this paper, we introduce SWE-Explore, a benchmark that isolates the evaluation of...
Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills
arXiv:2606.07412v1 Announce Type: new Abstract: LLM-driven software engineering agents have become a central testbed for real-world language-model capability, yet their training remains limited by the availability of high-quality SWE tasks. Existing synthetic data methods typically create tasks through fixed mutation or bug-injection procedures, making the resulting distributions largely independent of the agent's own weaknesses and training progress. We introduce Socratic-SWE, a closed-loop...
Show HN: VT Code – open-source terminal coding agent in Rust
Article URL: https://github.com/vinhnx/VTCode
Comments URL: https://news.ycombinator.com/item?id=48332098
Points: 9
# Comments: 4
Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks
Announce Type: new Abstract: Coding-agent benchmarks evaluate whether a single uninterrupted agent can resolve a repository issue. Real software work is messier: tasks are interrupted, reassigned, reviewed, and resumed from partial states left by another agent or engineer. We study this missing dimension through handoff debt: the rediscovery cost imposed when a predecessor's work is opaque or incomplete.
Show HN: Komi-learn – continuous memory and self-improvement for coding agents
Continuous memory and self-improvement for coding agents. It learns how you work and recalls it automatically, with no commands. Works with Claude Code and Codex.
Paseo – Beautiful open-source coding agent interface (desktop, mobile, CLI)
One interface for Claude Code, Codex, Copilot, OpenCode, and Pi agents. Run agents in parallel on your own machines. Ship from your phone or your desk.
- Self-hosted: Agents run on your machine with your full dev environment.