Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization

arXiv CS, Friday 05 June 2026, 04:00 UTC. By Yilong Wang, Qianli Wang, Bohao Chu, Yihong Liu, Jing Yang, Simon Ostermann

arXiv:2605.11632v2

Abstract: Self-generated counterfactual explanations (SCEs) are minimally modified inputs (minimality) generated by large language models (LLMs) that flip their own predictions (validity), offering a causally grounded approach to unraveling black-box LLM behavior. Yet extending them beyond English remains challenging: existing methods struggle to produce valid SCEs in non-dominant languages, and a persistent trade-off between validity and minimality undermines explanation quality. We introduce Macro, a preference alignment framework that applies Direct Preference Optimization (DPO) to multilingual SCE generation, using a composite scoring function to construct preference pairs that effectively translate the trade-off into measurable preference signals. Experiments across four LLMs and seven typologically diverse languages show that Macro improves validity by 12.55% on average over the chain-of-thought baseline without degrading minimality, while avoiding the severe minimality violations of the translation-based baseline. Compared to supervised fine-tuning, Macro achieves superior performance on both metrics, confirming that explicit preference optimization is essential for balancing this trade-off. Further analyses reveal that Macro increases cross-lingual perturbation alignment and mitigates common generation errors. Our results highlight preference optimization as a promising direction for enhancing multilingual model explanations.
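The abstract does not spell out the composite scoring function, so the following is only a minimal sketch of how such a score could turn the validity/minimality trade-off into preference pairs. Everything in it is an assumption made for illustration: the weighted sum with `alpha`, token-level edit distance as the minimality measure, whitespace tokenization, and the hypothetical names `composite_score`, `build_preference_pair`, and `predict` (a stand-in for querying the model's own label).

```python
from typing import Callable, Dict, List, Optional


def token_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance computed over tokens instead of characters."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def composite_score(original: str, candidate: str, original_label: str,
                    predict: Callable[[str], str], alpha: float = 0.5) -> float:
    """Assumed score: alpha * validity + (1 - alpha) * minimality, in [0, 1]."""
    orig_toks, cand_toks = original.split(), candidate.split()
    # Validity: 1 if the candidate flips the predicted label, else 0.
    validity = 1.0 if predict(candidate) != original_label else 0.0
    # Minimality: 1 minus the normalized token edit distance, clipped at 0.
    dist = token_edit_distance(orig_toks, cand_toks)
    minimality = 1.0 - min(1.0, dist / max(1, len(orig_toks)))
    return alpha * validity + (1.0 - alpha) * minimality


def build_preference_pair(original: str, candidates: List[str],
                          original_label: str,
                          predict: Callable[[str], str],
                          alpha: float = 0.5,
                          margin: float = 0.0) -> Optional[Dict[str, str]]:
    """Pick the best- and worst-scoring candidates as (chosen, rejected).

    Returns None when there are fewer than two candidates or when the
    score gap does not exceed `margin`, since such a pair carries no
    usable preference signal.
    """
    if len(candidates) < 2:
        return None
    scored = sorted(
        candidates,
        key=lambda c: composite_score(original, c, original_label, predict, alpha),
        reverse=True,
    )
    best, worst = scored[0], scored[-1]
    gap = (composite_score(original, best, original_label, predict, alpha)
           - composite_score(original, worst, original_label, predict, alpha))
    if gap <= margin:
        return None
    return {"prompt": original, "chosen": best, "rejected": worst}
```

With a toy classifier that labels any text containing "bad" as negative, the candidate "the movie was bad" for the input "the movie was good" is both valid and one token away, so it would be selected as the chosen response, while a paraphrase that leaves the label unchanged would be rejected. The prompt/chosen/rejected layout is a common convention for DPO training data; whether Macro uses this exact format is not stated in the abstract.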

Originally published by arXiv CS.
