TQA-Bench: Evaluating LLMs for Multi-Table Question Answering

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Zipeng Qiu, Chenyue Li, You Peng, Guangxin He, Binhang Yuan, Chen Wang

arXiv:2411.19504v2 (Announce Type: replace)

Abstract: Advances in large language models (LLMs) have unlocked great opportunities in complex multi-modal data management tasks, particularly question answering (QA) over complicated multi-table relational data. Despite significant progress, systematically evaluating LLMs on multi-table QA remains a critical challenge, owing to the inherent complexity of analyzing the modality of relational data structures and the potentially large scale of serialized tabular data. Existing benchmarks focus primarily on single-table QA and fail to capture the intricate connections across multiple relational tables that real-world domains such as finance, healthcare, and e-commerce require. We present TQA-Bench, a long-context analytical multi-table QA benchmark derived from real-world public datasets, with a flexible sampling mechanism that varies the context length (8K to 64K tokens) and symbolic extensions for assessing reasoning beyond retrieval and pattern matching. We systematically evaluate a set of LLMs spanning model scales from 2 billion to 671 billion parameters. Our extensive experiments reveal critical insights into the performance of LLMs in multi-table QA, highlighting both challenges and opportunities for advancing their application in complex, data-driven environments.
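The abstract mentions a sampling mechanism that varies the context length from 8K to 64K tokens, but it does not say how that mechanism works. The sketch below is a hypothetical illustration only, not the TQA-Bench implementation. It assumes a toy two-table schema (customers and orders, linked by the foreign key `orders.customer_id`), a CSV-style text serialization, and a rough four-characters-per-token estimate in place of a real tokenizer. All names (`approx_tokens`, `serialize`, `sample_to_budget`) are invented for this example. The sampler adds one parent row at a time together with all of its child rows, so questions that require a join stay answerable from whatever fits in the token budget.

```python
import random
from typing import Dict, List

Row = Dict[str, object]


def approx_tokens(text: str) -> int:
    """Crude token estimate (assumption): roughly 4 characters per token."""
    return (len(text) + 3) // 4


def serialize(tables: Dict[str, List[Row]]) -> str:
    """Serialize each table as a CSV-like block preceded by its name."""
    blocks = []
    for name, rows in tables.items():
        if not rows:
            blocks.append(f"## {name}\n(empty)")
            continue
        header = ",".join(rows[0].keys())
        body = "\n".join(",".join(str(v) for v in r.values()) for r in rows)
        blocks.append(f"## {name}\n{header}\n{body}")
    return "\n\n".join(blocks)


def sample_to_budget(customers: List[Row], orders: List[Row],
                     budget: int, seed: int = 0) -> Dict[str, List[Row]]:
    """Hypothetical sampler: add customers in a seeded random order, each with
    all of its orders, and stop before the serialized context exceeds `budget`
    tokens. Keeping every child row of a sampled parent preserves the
    foreign-key link between the two tables."""
    by_customer: Dict[object, List[Row]] = {}
    for o in orders:
        by_customer.setdefault(o["customer_id"], []).append(o)

    shuffled = list(customers)
    random.Random(seed).shuffle(shuffled)

    picked_c: List[Row] = []
    picked_o: List[Row] = []
    for c in shuffled:
        cand_c = picked_c + [c]
        cand_o = picked_o + by_customer.get(c["id"], [])
        text = serialize({"customers": cand_c, "orders": cand_o})
        if approx_tokens(text) > budget:
            break  # the next parent (with its children) would overflow the budget
        picked_c, picked_o = cand_c, cand_o
    return {"customers": picked_c, "orders": picked_o}


if __name__ == "__main__":
    # Synthetic toy data: 300 customers, each referenced by exactly 3 orders.
    customers = [{"id": i, "name": f"customer_{i}",
                  "city": ["Paris", "Lima", "Oslo"][i % 3]} for i in range(300)]
    orders = [{"order_id": j, "customer_id": j % 300, "amount": 10 + j % 7}
              for j in range(900)]
    for budget in (1_000, 3_000):
        ctx = sample_to_budget(customers, orders, budget)
        print(budget, len(ctx["customers"]), len(ctx["orders"]),
              approx_tokens(serialize(ctx)))
```

Because the shuffle is seeded, a larger budget yields a longer prefix of the same parent ordering, so the smaller context is always a subset of the larger one. The small budgets here only keep the toy example fast; the benchmark itself reports contexts between 8K and 64K tokens.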

Originally published by arXiv CS.