RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation

arXiv CS, Tuesday 02 June 2026, 04:00 UTC. By Huiqiong Li, Jiayu Wang, Zhiting Mei, Anirudha Majumdar, Jingjing Chen, Bin Zhu

Key Points

arXiv:2606.01600v1. Abstract: Video world models are increasingly used in robotic manipulation, yet existing benchmarks mostly evaluate them under valid, feasible, and safe instructions. We introduce RoboTrustBench, a benchmark for evaluating the trustworthiness of video world models under four scenarios: Normal, Constraint-Sensitive, Counterfactual, and Adversarial. Built from real-world DROID episodes, RoboTrustBench contains 1,207 expert-validated instruction-image pairs and a six-dimensional evaluation protocol with 13 fine-grained criteria. Evaluating seven representative video world models with human and MLLM assessment, we find that current models often generate visually coherent videos, but struggle with constraint reasoning, counterfactual grounding, physical interaction, and unsafe-instruction suppression. These results show that visual quality and surface-level instruction following are insufficient for trustworthy robotic video world modeling.

Originally published by arXiv CS.
