To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling

arXiv CS, Monday 08 June 2026, 04:00 UTC. By Qinyuan Wu, Soumi Das, Mahsa Amani, Arijit Nag, Seungeon Lee, Krishna P. Gummadi, Abhilasha Ravichander, Muhammad Bilal Zafar

Key Points

arXiv:2605.00737v2. Abstract: Agentic AI architectures augment LLMs with external tools, unlocking strong capabilities. However, tool use is not always beneficial; some calls may be redundant or even harmful. Effective tool use, therefore, hinges on a core LLM decision: whether or not to call a tool when performing a task. This decision is particularly challenging for web search tools, where the benefits of external information depend on the model's internal knowledge and its ability to integrate potentially noisy tool responses. We introduce a principled framework inspired by decision-making theory to evaluate web search tool-use decisions along three key factors: necessity, utility, and affordability. Our analysis combines two complementary lenses: a normative perspective that infers true need and utility from an optimal allocation of tool calls, and a descriptive perspective that infers each model's self-perceived need and utility from its observed behavior. We evaluate six open-source and one closed-source frontier models under two harnesses, one conditioning on only the current turn and its search results, the other on the full execution traces, across four web-search tools and three tasks. In every setting, we find that a model's perceived need and utility are frequently misaligned with the true need and utility. Building on this framework, we train lightweight estimators of need and utility from the models' hidden states. These estimators drive simple controllers that improve decision quality and yield stronger task performance than the self-perceived baseline for most of the open-source models.
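The abstract does not specify how the controllers combine need, utility, and affordability, so the following is only an illustrative sketch of one possible gating rule, not the authors' method. The names `CallEstimate` and `should_call` are hypothetical, and so are the assumptions that need and utility arrive as probabilities and that cost is measured in the same units as the value of a correct answer.

```python
from dataclasses import dataclass


@dataclass
class CallEstimate:
    # All three fields are illustrative assumptions, not definitions from the paper.
    need: float     # estimated probability the model answers wrongly without the tool
    utility: float  # estimated probability the tool response fixes such an answer
    cost: float     # price of one call, in units of "value of one correct answer"


def should_call(est: CallEstimate, budget_left: float) -> bool:
    """Return True only if the call looks needed, useful, and affordable."""
    # Affordability: never spend more than the remaining budget.
    if est.cost > budget_left:
        return False
    # Simplified expected gain: the call helps only when the model needs it
    # AND the tool response is usable. Possible harm from noisy results is ignored.
    expected_gain = est.need * est.utility
    return expected_gain > est.cost


if __name__ == "__main__":
    print(should_call(CallEstimate(need=0.9, utility=0.8, cost=0.1), budget_left=1.0))
    print(should_call(CallEstimate(need=0.1, utility=0.5, cost=0.1), budget_left=1.0))
```

Under these assumptions, a confident model (low need) skips the search even when the tool is cheap, while an unaffordable call is rejected regardless of its expected gain. In the setting the abstract describes, the need and utility inputs would come from estimators trained on hidden states rather than being supplied by hand.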


Originally published by arXiv CS.

