Standalone Agents

Related Articles from SNS

What Makes Interaction Trajectories Effective for Training Terminal Agents?

arXiv:2606.03461v1 Announce Type: new Abstract: Stronger code agents are commonly assumed to be superior teachers for post-training, yet this assumption remains poorly disentangled from task difficulty, harness design, and student capacity. We investigate this pedagogical link using Terminal-Lego, a scalable pipeline that transforms multi-domain real-world issues into environment-verified agentic tasks. Surprisingly, standalone performance does not dictate teaching efficacy: while Claude...

arXiv CS 7d ago

Overcoming the Regulatory Bottleneck via Agent-to-Agent Protocols: A Nuclear Case Study

arXiv:2606.07866v1 Announce Type: new Abstract: Regulatory review of advanced nuclear reactor designs routinely spans more than three years and consumes hundreds of millions of dollars in combined regulator and applicant labor. We present the Regulatory Context Protocol (RCP), an Agent-to-Agent communication standard that replaces the formal human-to-human pipeline between regulators and applicants with a structured, auditable agentic channel, while preserving human oversight at...

arXiv CS 1d ago

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

arXiv:2605.30611v1 Announce Type: new Abstract: Scientific figures are among the most effective means of communicating complex research ideas, yet producing publication-quality illustrations remains one of the most labor-intensive parts of paper preparation. Existing automated systems each target a single figure type under text-only input, leaving the diversity of types and conditions researchers actually use unaddressed; their raster outputs further cannot be locally revised. Because...

arXiv CS 9d ago

SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

arXiv:2606.05761v1 Announce Type: new Abstract: Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term interactions. As these memories grow, they may reinforce one another, diverge across contexts, or directly conflict, making correct assistance depend on memory relations rather than isolated recall. Existing long-term memory benchmarks rarely probe how agents preserve and utilize such relations during downstream tasks.

arXiv CS 5d ago

Argus-Retriever: Vision-LLM Late-Interaction Retrieval with Region-Aware Query-Conditioned MoE for Visual Document Retrieval

arXiv:2606.04300v1 Announce Type: new Abstract: Late-interaction vision-language retrievers represent each document page as many visual token embeddings and score queries with MaxSim. In systems such as ColPali, ColQwen, ColNomic, and Nemotron ColEmbed, the document embeddings are produced without seeing the query, so the same page is represented identically for a table lookup, a chart question, and a layout-sensitive evidence request. We introduce Argus, a family of...

arXiv CS 6d ago

An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

Announce Type: new Abstract: Regulated cybersecurity workflows lack a runtime substrate that enforces organization-level scope across retrieval, tool calls, memory, findings, reports, and audit while remaining model-agnostic and locally deployable. Recent large language model (LLM) agent systems report strong results on isolated cybersecurity tasks, yet they do not by themselves define an auditable platform architecture for regulated security operations centre (SOC) and compliance workflows,...

arXiv CS 9d ago

ICE denies having a protester database. A letter to Congress sheds more light

Last January, when federal immigration agents started a crackdown in Portland, Maine, pediatric occupational therapist Xenia Pantos was driving to work in their spouse's car when they saw masked federal agents and vehicles with tinted windows parked in the road. Worried about immigrant community members, Pantos stopped for a few minutes to observe.

Hacker News 9h ago

Language Model Networks: Supervision-Efficient Learning through Dense Communication

arXiv:2505.12741v3 Announce Type: replace Abstract: Language models are increasingly used not only as standalone predictors but also as components in larger inference systems, from test-time scaling to multi-agent collaboration. We study language model networks, where pre-trained language models serve as reusable nodes and intelligence emerges from their topology, communication, and optimization. Existing systems mostly communicate through natural language: easy to deploy, but discrete,...

arXiv CS 8d ago

Agent-R1: A Unified and Modular Framework for Agentic Reinforcement Learning

arXiv:2511.14460v2 Announce Type: replace Abstract: Large language models (LLMs) have rapidly evolved from single-turn text generators into the foundation of increasingly capable agents. As these agents take on more complex reasoning, decision making, tool use, and long-horizon tasks, reinforcement learning (RL) is becoming increasingly important for shaping their behavior. This shift is especially visible in agentic RL, where models must interact with tools and environments across multiple...

arXiv CS 8d ago