The AgentLens SDK

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

arXiv:2605.12925v3. Abstract: Evaluation of software engineering (SWE) agents is dominated by a binary signal: whether the final patch passes the tests. This outcome-only view treats a principled solution and a chaotic trial-and-error process as equivalent. We show that this equivalence is empirically false.

arXiv CS 7d ago