When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

arXiv CS, Friday 05 June 2026, 04:00 UTC. By Parth Darshan, Abhishek Divekar.


arXiv:2605.26046v2. Announce Type: replace.

Abstract: Customizing an LLM judge to a specific problem or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion; however, they produce natural-language critiques, not numerical vectors. Thus, the conflict-resolution toolkit of multi-task learning (PCGrad, MGDA) does not apply to this multi-objective textual gradient setting. We extend TextGrad to the multi-objective setting and test four decomposition modes of textual gradient optimizers by varying how much cross-objective information the loss, gradient, and optimizer LLMs share. We find that the gradient's task-focus drops by 59% (from 9.0 to 3.7 out of 10) when the gradient LLM must provide feedback on multiple criteria jointly. Separately, we observe that naively combining single-objective optimized instructions into a single prompt degrades Spearman's rho from 0.305 to 0.220 (-0.085). These results identify two separable failure modes: optimization-time gradient dilution and inference-time instruction interference, which together constrain the design space for multi-objective judge optimization using textual feedback.
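
For contrast, the multi-task toolkit the abstract mentions operates on numerical gradient vectors. PCGrad, for example, checks each pair of per-task gradients for conflict (a negative dot product) and, when they conflict, projects one onto the normal plane of the other before summing. The minimal pure-Python sketch below illustrates that projection step under simplifying assumptions; it is not code from the paper or from the reference PCGrad implementation, and the helper names `dot` and `pcgrad` are hypothetical. A textual gradient, being a natural-language critique, has no dot product or norm, which is why this operation has no direct analogue in the setting the paper studies.

```python
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(grads: List[Vector], seed: int = 0) -> Vector:
    """Combine per-objective gradients with a PCGrad-style projection (illustrative sketch).

    For each task gradient g_i, visit the other (original, unprojected) gradients in
    random order. Whenever g_i conflicts with g_j (negative dot product), subtract from
    g_i its component along g_j. The projected gradients are then summed.
    """
    rng = random.Random(seed)
    projected: List[Vector] = []
    for i, g in enumerate(grads):
        g_i = list(g)
        others = [grads[j] for j in range(len(grads)) if j != i]
        rng.shuffle(others)
        for g_j in others:
            d = dot(g_i, g_j)
            norm_sq = dot(g_j, g_j)
            if d < 0 and norm_sq > 0:
                # Remove the conflicting component: g_i <- g_i - (g_i . g_j / ||g_j||^2) g_j
                scale = d / norm_sq
                g_i = [a - scale * b for a, b in zip(g_i, g_j)]
        projected.append(g_i)
    # Sum the projected per-task gradients component-wise.
    return [sum(components) for components in zip(*projected)]
```

With the conflicting gradients `[1.0, 0.0]` and `[-1.0, 1.0]`, each is projected to be orthogonal to the other (giving `[0.5, 0.5]` and `[0.0, 1.0]`), so `pcgrad` returns `[0.5, 1.5]`; non-conflicting gradients such as `[1.0, 0.0]` and `[0.0, 1.0]` are simply summed.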


Originally published by arXiv CS.

