Home Knowledge Base Deep Research

Deep Research

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

MosaicLeaks:Privacy Risks in Querying-in-the-Open for Deep Research Agents

arXiv:2605.30727v1 Announce Type: new Abstract: Deep research agents increasingly combine private local documents with external tools like web retrieval, creating a privacy risk: an agent's external queries may leak sensitive information from its local context. This risk is amplified by the mosaic effect, where individual queries may appear harmless but become revealing in aggregate. We introduce MosaicLeaks, a benchmark of 1,001 multi-hop deep research tasks that chain private enterprise...

arXiv CS 9d ago

Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward

Announce Type: new Abstract: Deep research tasks require LLMs to plan what to investigate, retrieve evidence, and synthesize long-form answers across multiple branches of inquiry. Existing training paradigms either rely on short-form verifiable QA as a proxy or optimize monolithic long trajectories, which makes planning and execution difficult to disentangle and yields weak credit assignment for the planning process. We propose DecomposeR, a planner-centric deep research framework that...

arXiv CS 9d ago

Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation

arXiv:2605.29861v2 Announce Type: replace Abstract: Large Language Models (LLMs) have advanced autonomous agents from deep search, which retrieves concise factual answers, to deep research, which synthesizes scattered evidence into long-form reports. However, verifiable multimodal deep research remains challenging due to open-ended synthesis without deterministic ground truth and the need to interleave textual arguments with visual evidence. We propose Ptah, a multi-agent harness for...

arXiv CS 6d ago

Self-Evolving Deep Research via Joint Generation and Evaluation

arXiv:2606.04507v1 Announce Type: new Abstract: Large Language Models (LLMs) have become increasingly adopted in daily applications, with deep research standing out as a particularly important capability. Unlike traditional question-answering (QA) tasks, deep research report generation lacks definitive ground-truth, making reward design inherently unverifiable and limiting effective reinforcement learning. Existing approaches mitigate this challenge with LLM-as-a-judge and query-dependent...

arXiv CS 6d ago

Evaluating Deep Research Agents on Expert Consulting Work: A Benchmark with Verifiers, Rubrics, and Cognitive Traps

arXiv:2605.17554v2 Announce Type: replace Abstract: Frontier deep research agents (DRAs) plan a research task, synthesize across documents, and return a structured deliverable on demand. They are being deployed in enterprise workflows faster than they are being evaluated. Existing benchmarks measure factual recall, single-hop QA, or generic agentic skill, missing the multi-document, decision-grade work DRAs are deployed to produce.

arXiv CS 8d ago

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Announce Type: new Abstract: Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not which parts of the trajectory make the answer unreliable. We study span-level error localization for deep-research agents.

arXiv CS 8d ago

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

arXiv:2606.02060v2 Announce Type: replace Abstract: Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not which parts of the trajectory make the answer unreliable. We study span-level error localization for deep-research agents.

arXiv CS 7d ago

TVIR: Building Deep Research Agents Towards Text--Visual Interleaved Report Generation

arXiv:2606.02320v1 Announce Type: new Abstract: Deep Research Agents have shown strong capability in multi-step information retrieval, reasoning, and long-form report generation, but existing benchmarks and systems remain predominantly text-centric, with limited evaluation of whether visual elements are factually reliable and well aligned with the surrounding analysis. To address this gap, we introduce TVIR (Text--Visual Interleaved Report Generation), which includes TVIR-Bench, a benchmark...

arXiv CS 8d ago

AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents

arXiv:2605.11732v2 Announce Type: replace Abstract: In this paper, we present AgentDisCo, a novel Disentangled and Collaborative agentic architecture that formulates deep research as an adversarial optimization problem between information exploration and exploitation. Unlike existing approaches that conflate these two processes into a single module, AgentDisCo employs a critic agent to evaluate generated outlines and refine search queries, and a generator agent to retrieve updated results...

arXiv CS 5d ago

Search-Time Contamination in Deep Research Agents: Measuring Performance Inflation in Public Benchmark Evaluation

arXiv:2606.05241v1 Announce Type: new Abstract: Public benchmarks enable fair and reproducible evaluation of LLM reasoning, but they become fragile for deep research agents that actively search the web during inference. Such agents may retrieve public benchmark metadata, question context, or even ground-truth answers via web search.

arXiv CS 5d ago