TELBench

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Announce Type: new Abstract: Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not which parts of the trajectory make the answer unreliable. We study span-level error localization for deep-research agents.

arXiv CS 8d ago

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

arXiv:2606.02060v2 Announce Type: replace Abstract: Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not which parts of the trajectory make the answer unreliable. We study span-level error localization for deep-research agents.

arXiv CS 7d ago

Sovereign News Station

Self-hosted. No tracking. No ads. Independent news intelligence powered by sovereign infrastructure.

Daily briefing to your inbox:

Subscribed. Welcome aboard.

Home Live Analysis Trending Analytics Operations RSS Feed About

Sovereign News Station — Independent news intelligence · Self-hosted · No tracking