Home › Knowledge Base › GEPA

GEPA

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

DD-GEPA: Prompt Optimization for Dialogue Disentanglement Focusing on Task Instruction and Utterance Representation

Announce Type: new Abstract: Multi-party chat often contains interleaved dialogues because multiple participants can discuss different topics at the same time. Dialogue disentanglement addresses this problem by separating an entangled utterance sequence into coherent dialogues. While large language models (LLMs) are promising for this task, they still struggle with dialogue disentanglement and achieve low accuracy.

arXiv CS 1d ago

Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization

arXiv:2603.18388v2 Announce Type: replace Abstract: Automatic prompt optimization (APO) has emerged as a powerful paradigm for improving LLM performance without manual prompt engineering. Reflective APO methods such as GEPA iteratively refine prompts by diagnosing failure cases, but the optimization process remains black-box and label-free, leading to uninterpretable trajectories and systematic failure. We identify and empirically demonstrate four limitations: on GSM8K with a defective seed,...

arXiv CS 1d ago

Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams

arXiv:2606.01770v2 Announce Type: replace Abstract: Auto-harness systems such as A-Evolve, GEPA, and Meta-Harness improve LLM agents by optimizing prompts, skills, tools, memories, and supporting infrastructure from execution feedback, but they are typically evaluated on fixed offline benchmarks. Real deployments instead present open-ended task streams: histories grow without a fixed endpoint, heterogeneous tasks require different harnesses, and problem distributions shift over time. These...

arXiv CS 6d ago

Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams

arXiv:2606.01770v1 Announce Type: new Abstract: Auto-harness systems such as A-Evolve, GEPA, and Meta-Harness improve LLM agents by optimizing prompts, skills, tools, memories, and supporting infrastructure from execution feedback, but they are typically evaluated on fixed offline benchmarks. Real deployments instead present open-ended task streams: histories grow without a fixed endpoint, heterogeneous tasks require different harnesses, and problem distributions shift over time. These...

arXiv CS 8d ago

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Announce Type: replace Abstract: Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both parametric (e.g. RLVR) and non-parametric (e.g. prompt optimization) approaches to doing so typically require hundreds of training samples and thousands of model rollouts, making them expensive in the best case and intractable in the worst. To address this challenge, we introduce Contrastive Reflection (CORE), a non-parametric learning algorithm that...

arXiv CS 2d ago

Building Customer Support AI Agents at 100M-User Scale: An Evaluation-Driven Framework

arXiv:2606.08867v1 Announce Type: new Abstract: The rapid rise in LLM capabilities has made AI agents increasingly viable across a broad range of tasks. Among the most promising applications is building production-ready customer-facing agents, a challenge that demands coordinated excellence in evaluation methodology, context engineering, training, and online measurement.

arXiv CS 1d ago