The Sim-to-Real Gap of Foundation Model Agents: A Unified MDP Perspective

arXiv CS, Monday 08 June 2026, 04:00 UTC. By Xiaoou Liu, Tiejin Chen, Weibo Li, Xiyang Hu, Hua Wei

Key Points

arXiv:2606.07017v1. Announce type: new.

Abstract: Foundation model agents are increasingly deployed for real-world decision-making, but they suffer from a sim-to-real gap. While robotics and classical control have mature frameworks for addressing this gap, the foundation model community treats agent robustness as an entirely novel phenomenon. We propose formalizing the evaluation and training gap of foundation model agents as a classical sim-to-real problem, structured around the four elements of a Markov Decision Process: Observation, Action, Transition, and Reward. We set out a research agenda that translates classical discrepancies into the foundation model domain and advocates adopting established solutions such as domain randomization. We provide concrete examples, such as a multilingual tool-calling case, to demonstrate how severe observation-space gaps lead to operationally invalid actions despite correct semantic intent. Ultimately, this agenda aims to drive a paradigm shift, yielding a unified vocabulary and standardized stress-test benchmarks to foster a new generation of trustworthy agents for reliable real-world applications.
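The observation-gap idea in the abstract can be illustrated with a small Python sketch. It is not the authors' benchmark or method. It assumes a hypothetical tool registry (`TOOLS`), hypothetical prompts (`PROMPTS`), and a hard-coded stand-in for a model (`TOY_POLICY`) that names the right tool in the language of the prompt. The evaluation loop samples the prompt language at random, in the spirit of domain randomization applied to a stress test, and counts how often the emitted action is one the executor would accept.

```python
import random
from dataclasses import dataclass

# Hypothetical tool registry: the executor accepts only these exact names.
TOOLS = {"get_weather"}

# Hypothetical renderings of the same user intent (the observation).
PROMPTS = {
    "en": "What is the weather in Paris?",
    "es": "¿Qué tiempo hace en París?",
    "de": "Wie ist das Wetter in Paris?",
}

# Stand-in for a model, not a real agent: it picks the semantically right
# tool but writes the tool name in the language of the prompt.
TOY_POLICY = {
    "en": "get_weather",
    "es": "obtener_clima",
    "de": "wetter_abrufen",
}


@dataclass
class Step:
    lang: str     # language of the sampled observation
    prompt: str   # the observation shown to the agent
    action: str   # tool name the agent emitted
    valid: bool   # whether the executor would accept that tool name


def toy_agent(lang: str) -> str:
    """Return the tool name the toy policy emits for a prompt language."""
    return TOY_POLICY[lang]


def evaluate(langs: list[str], episodes: int, seed: int = 0) -> list[Step]:
    """Sample an observation language per episode and record action validity."""
    rng = random.Random(seed)
    steps = []
    for _ in range(episodes):
        lang = rng.choice(langs)
        action = toy_agent(lang)
        steps.append(Step(lang, PROMPTS[lang], action, action in TOOLS))
    return steps


def valid_rate(steps: list[Step]) -> float:
    """Fraction of steps whose action is operationally valid."""
    return sum(s.valid for s in steps) / len(steps)


if __name__ == "__main__":
    print("English only:", valid_rate(evaluate(["en"], 20)))
    print("Randomized  :", valid_rate(evaluate(list(PROMPTS), 20)))
```

In this toy setup the English-only evaluation reports a valid-action rate of 1.0, while the randomized run exposes the failures that the narrow evaluation hides: every non-English episode yields a tool name the executor rejects, even though the intent behind it is correct.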


Originally published by arXiv CS.
