
Coding Agents

Related Articles from SNS

Agentic Very Much! Adoption of Coding Agent in New GitHub Projects

arXiv:2606.07448v1. Abstract: In previous work, we investigated the adoption of coding agents in GitHub projects and found it to be very significant. This study continues that line of work but analyzes new projects created after the previous study. In this new sample, the adoption of coding agents is more than twice as high.

arXiv CS 2d ago

What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants

arXiv:2605.30777v1. Abstract: Autonomous coding agents built on large language models (LLMs) are rapidly being integrated into development workflows, yet their operational safety properties remain poorly understood beyond evaluations of explicitly malicious inputs. In practice, high-impact failures arise during benign, goal-directed use through environment breakage, fabricated success reports, etc. that current benchmarks do not capture. What categories of operational...

arXiv CS 9d ago

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

arXiv:2603.03202v3. Abstract: As large language models (LLMs) advance their mathematical capabilities toward the IMO and research level, the scarcity of challenging, high-quality problems has become a significant bottleneck for training, evaluation and self-evolution of LLMs. Simultaneously, recent code agents have demonstrated sophisticated skills in agentic coding and reasoning, suggesting that code execution can serve as a scalable environment for mathematical...

arXiv CS 8d ago

SmellBench: Towards Fine-Grained Evaluation of Code Agents on Refactoring Tasks

Abstract: Code agents have achieved remarkable advances in recent years, exhibiting strong capabilities across a wide range of software engineering tasks. However, their misuse often produces bloated and disorganized code that impairs readability, extensibility, and robustness. Despite this risk, existing benchmarks largely evaluate functional correctness rather than long-term maintainability of code agents.

arXiv CS 5d ago

SWE-Explore: Benchmarking How Coding Agents Explore Repositories

arXiv:2606.07297v1. Abstract: Repository-level coding benchmarks such as SWE-bench have driven a rapid surge in the capabilities of coding agents. Yet they usually treat coding tasks as a holistic, binary prediction problem (e.g., resolved or unresolved), neglecting fine-grained agent capabilities such as repository understanding, context retrieval, code localization, and bug diagnosis. In this paper, we introduce SWE-Explore, a benchmark that isolates the evaluation of...

arXiv CS 2d ago

Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills

arXiv:2606.07412v1. Abstract: LLM-driven software engineering agents have become a central testbed for real-world language-model capability, yet their training remains limited by the availability of high-quality SWE tasks. Existing synthetic data methods typically create tasks through fixed mutation or bug-injection procedures, making the resulting distributions largely independent of the agent's own weaknesses and training progress. We introduce Socratic-SWE, a closed-loop...

arXiv CS 2d ago

Show HN: VT Code – open-source terminal coding agent in Rust

Article URL: https://github.com/vinhnx/VTCode. Comments URL: https://news.ycombinator.com/item?id=48332098. Points: 9. Comments: 4.

Hacker News 11d ago

Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks

Abstract: Coding-agent benchmarks evaluate whether a single uninterrupted agent can resolve a repository issue. Real software work is messier: tasks are interrupted, reassigned, reviewed, and resumed from partial states left by another agent or engineer. We study this missing dimension through handoff debt: the rediscovery cost imposed when a predecessor's work is opaque or incomplete.

arXiv CS 7d ago

Show HN: Komi-learn – continuous memory and self-improvement for coding agents

Continuous memory and self-improvement for coding agents. It learns how you work and recalls it automatically, with no commands. Works with Claude Code and Codex.

Hacker News 10d ago

Paseo – Beautiful open-source coding agent interface (desktop, mobile, CLI)

One interface for Claude Code, Codex, Copilot, OpenCode, and Pi agents. Run agents in parallel on your own machines. Ship from your phone or your desk. Self-hosted: agents run on your machine with your full dev environment.

Hacker News 7d ago