
Agentic Harnesses


Related Articles from SNS

Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

Computer Science > Computation and Language. Submitted on 14 May 2026. Abstract: Recent advances in Large Language Model (LLM) agents have enabled complex agentic workflows where models autonomously retrieve information, call tools, and reason over large corpora to complete tasks on behalf of users.

Hacker News 1d ago

VeRO: A Harness for Agents to Optimize Agents

arXiv:2602.22480v4. Announce Type: replace. Abstract: An important emerging application of coding agents is agent harness optimization: the iterative improvement of a target agent by editing and evaluating its code. Despite its relevance, the community lacks a systematic understanding of coding agent performance on this task. Harness optimization differs from conventional software engineering: agent harnesses interleave deterministic code with stochastic LLM completions, requiring structured...

arXiv CS 7d ago

Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses

Announce Type: new. Abstract: LLM agents increasingly rely on external inference conditions: prompts, tools, memory, SOPs, skills, and harness feedback. These assets can improve task execution without changing model weights, but they are often revised by heuristic reflection or by reusing observed successes and failures as if counts alone were reliable belief. We introduce Bayesian-Agent, a native and cross-harness framework that treats reusable skills and SOPs as hypotheses about...

arXiv CS 1d ago

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Announce Type: new. Abstract: LLM agents are evolving from conversational chatbots to operational tools in real-world workspaces. In local agentic harnesses, an LLM can read and write files, call tools, and reuse workspace state across sessions. While such capabilities enhance utility, they also expose a new attack surface for attackers.

arXiv CS 9d ago

From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws

Announce Type: new. Abstract: LLM-based agents increasingly rely on harnesses that provide execution environments, tool interfaces, context, lifecycle orchestration, observability, verification, and governance. Existing self-improving agents and automatic harness evolution methods mainly improve agents through runtime supervision, prompt optimization, workflow search, or harness modification based on final outcomes. However, they often fail to diagnose where the responsible evidence lies in...

arXiv CS 5d ago

Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation

arXiv:2605.29861v2. Announce Type: replace. Abstract: Large Language Models (LLMs) have advanced autonomous agents from deep search, which retrieves concise factual answers, to deep research, which synthesizes scattered evidence into long-form reports. However, verifiable multimodal deep research remains challenging due to open-ended synthesis without deterministic ground truth and the need to interleave textual arguments with visual evidence. We propose Ptah, a multi-agent harness for...

arXiv CS 6d ago

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

arXiv:2606.02373v1. Announce Type: new. Abstract: Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routine state management inside the policy: reinforcement learning is forced to optimize both semantic search decisions and recoverable bookkeeping that...

arXiv CS 8d ago

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

arXiv:2605.30611v1. Announce Type: new. Abstract: Scientific figures are among the most effective means of communicating complex research ideas, yet producing publication-quality illustrations remains one of the most labor-intensive parts of paper preparation. Existing automated systems each target a single figure type under text-only input, leaving the diversity of types and conditions researchers actually use unaddressed; their raster outputs further cannot be locally revised. Because...

arXiv CS 9d ago

DAR: Deontic Reasoning with Agentic Harnesses

arXiv:2606.05009v1. Announce Type: new. Abstract: Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key technical challenge for LLM-based deontic reasoning is that the relevant ruleset can be long and cross-referenced, so models may still fail to locate the rules needed for a particular reasoning step. We introduce Deontic...

arXiv CS 6d ago

Act As a Real Researcher: A Suite of Benchmarks Evaluating Frontier LLMs and Agentic Harnesses in Research Lifecycle

Announce Type: new. Abstract: As foundation models advance and agent scaffolding becomes increasingly sophisticated, agents have demonstrated remarkable proficiency in complex, long-horizon coding tasks and even autonomous experiment execution. Despite their evolution from research assistants into autonomous research agents, these systems still exhibit significant limitations in field sensitivity, research ethics, and nuanced scientific judgment. Consequently, frontier agents remain unable to...

arXiv CS 2d ago