Language Agents

Related Articles from SNS

Benchmarking Open-Ended Multi-Agent Coordination in Language Agents

arXiv:2606.08340v1 Announce Type: new Abstract: As language models are increasingly deployed as autonomous agents, they must coordinate with others over long horizons in open-ended interactive tasks. Yet existing evaluations rarely test these demands together, instead emphasising single-agent tasks, short interactions, or highly structured multi-agent settings. We introduce $alem$, a JAX-based benchmark for open-ended multi-agent coordination built on Craftax-like dynamics.

arXiv CS 1d ago

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Announce Type: new Abstract: Monitoring autonomous language model agents currently relies mostly on surface behavior. But what happens when agent populations invent new languages with the goal of avoiding human oversight? Here, we study the emergent languages on Moltbook.

arXiv CS 9d ago

SCOPE: Real-Time Natural Language Camera Agent at the Edge

Announce Type: new Abstract: Deploying language-driven agents in robotics requires evaluations that reflect real-world task demands: natural-language instructions with reproducible outcomes. Such agents must connect language models to callable perception and control tools, and be assessed using deployment-critical metrics including latency, accuracy, and error modes. SCOPE (Simulation and Camera Operations for Perception and Evaluation), a modular agent for natural-language, open-vocabulary...

arXiv CS 7d ago

AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

arXiv:2606.02461v1 Announce Type: new Abstract: Language agents spend substantial inference time solving individual tasks, yet the experience acquired in one episode is often underutilized in future episodes. Continual learning expects an agent to accumulate reusable experience across a stream of tasks, improve over time, and avoid interference from irrelevant experiences. Unfortunately, existing benchmarks struggle to evaluate continual learning in language agents rigorously.

arXiv CS 8d ago

AdaMEM: Test-Time Adaptive Memory for Language Agents

Announce Type: new Abstract: A central challenge for language agents is utilizing past experience to adapt to dynamic test-time conditions. While recent work demonstrates the promise of agentic memory mechanisms, most systems restrict retrieval to episode initiation. Consequently, agents are forced to rely on static guidance that becomes increasingly misaligned as long-horizon tasks unfold.

arXiv CS 5d ago

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

arXiv:2606.01838v1 Announce Type: new Abstract: Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite this heterogeneity, current inference systems apply identical compute to every step. We introduce LayerRoute, a lightweight adapter that learns to selectively skip transformer blocks on a per-input basis.

arXiv CS 8d ago

Causal state binding predicts action control in language agents

arXiv:2605.09692v3 Announce Type: replace Abstract: Autonomous language agents increasingly expose traces, memories, plans and constraints, but existing evaluations rarely test whether these state variables are bound to final actions. We introduce causal state binding, an intervention-coupled evaluation framework that measures whether actions change with the event-specific decisive state while remaining invariant to irrelevant cues. The primary readout is a hidden-target finite-action...

arXiv CS 8d ago

What Should a Skill Remember? Quality-Cost Trade-offs in Cost-Aware Skill Rewriting for Language Model Agents

Announce Type: new Abstract: Large language model agents increasingly rely on skills: reusable procedural documents encoding workflows, tool use, implementation patterns, validation checks, and domain rules. Skill rewriting is often treated as prompt compression, but shorter skills can make agents more expensive by removing sparse operational anchors that prevent exploration, debugging, and recovery. We study skill rewriting through this economic lens.

arXiv CS 1d ago

Policy and World Modeling Co-Training for Language Agents

Announce Type: new Abstract: Reinforcement learning (RL) improves large language model (LLM) agents by teaching them which actions lead to high rewards, but provides little supervision on what those actions do to the environment. World modeling (WM) can fill this gap, yet existing approaches often require separate simulators, extra training stages, or additional inference-time computation. We observe that on-policy RL rollouts already contain the needed signal: each transition pairs an...

arXiv CS 8d ago