Home Science XCR-Bench: Benchmarking Cross-Cultural Reasoning in LLMs...
Science

XCR-Bench: Benchmarking Cross-Cultural Reasoning in LLMs via Culture-Specific Items and Hall's Triad

Key Points

Announce Type: replace Abstract: Cross-cultural competence in large language models (LLMs) requires understanding and adapting Culture-Specific Items (CSIs) across varying cultural contexts. However, progress in evaluating this capability remains limited by the lack of high-quality CSI-annotated corpora with parallel cross-cultural sentence pairs. We introduce XCR-Bench, a Cross(X)-Cultural Reasoning Benchmark containing 4.1k parallel sentences and 1,098 CSIs across three reasoning tasks.

arXiv:2601.14063v2 Announce Type: replace Abstract: Cross-cultural competence in large language models (LLMs) requires understanding and adapting Culture-Specific Items (CSIs) across varying cultural contexts. However, progress in evaluating this capability remains limited by the lack of high-quality CSI-annotated corpora with parallel cross-cultural sentence pairs. We introduce XCR-Bench, a Cross(X)-Cultural Reasoning Benchmark containing 4.1k parallel sentences and 1,098 CSIs across three reasoning tasks. XCR-Bench integrates Newmark's CSI framework with Hall's Triad of Culture, enabling evaluation across levels of cultural visibility -- from observable practices to implicit social norms and values. Experiments on eight multilingual LLMs show that state-of-the-art models exhibit consistent weaknesses in identifying and adapting specific categories of CSIs, revealing a gap between surface-level recall and explicit cultural reasoning. Performance declines significantly on culturally sensitive categories and deeper cultural levels (p<0.005, 8/8 models), and adaptation quality varies systematically across target cultures and Bengali regional variants, indicating encoded regional and ethno-religious biases even within a single linguistic setting. We publicly release the corpus and code to support future research on cross-cultural NLP.
XCR-Bench (ORG) Benchmarking Cross-Cultural Reasoning (ORG) Hall's Triad (ORG) CSI (ORG) CSIs (ORG) Newmark (ORG) Hall's Triad of Culture (ORG) Bengali (ORG) NLP (ORG)
Originally published by arXiv CS Read original →