Physical Plausibility Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance

arXiv CS, Wednesday 03 June 2026, 04:00 UTC. By Zhiyuan Hu, Zheng Sun, Yi Wei, Long Yu

Key Points

arXiv:2511.10055v2 Announce Type: replace

Abstract: Image generation has improved significantly in recent years. Image screening, however, has received little study, and Multimodal Large Language Models (MLLMs) perform poorly at it because training data are scarce and their physical plausibility reasoning is weak. In this work, we propose a complete solution to these problems, covering both data and methodology. For data, we collect a comprehensive image screening dataset of over 128k samples comprising about 640k images, where each sample consists of an original image and four generated images. The dataset evaluates physical plausibility reasoning under four aspects: appearance deformation, physical shadow, placement layout, and extension rationality. For annotation, we investigate multiple approaches, including purely manual, fully automated, and answer-driven annotation, to acquire high-quality chain-of-thought (CoT) data as cost-effectively as possible. Methodologically, we introduce a Hard Cases Mining (HCM) strategy with a Dynamic Proportional Accuracy (DPA) reward into the Group Relative Policy Optimization (GRPO) framework, and call the result HCM-GRPO. This enhanced method shows stronger physical plausibility reasoning than the original GRPO. Our experiments reveal that even state-of-the-art closed-source MLLMs, such as GPT5.2 and Gemini3-Pro, perform unsatisfactorily at physical plausibility reasoning. In contrast, with HCM-GRPO a much smaller model surpasses the scores of both large-scale open-source and leading closed-source models.
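The abstract does not spell out how HCM or the DPA reward are computed, so the following is only a minimal sketch of the general ideas under stated assumptions. The group-relative advantage follows the standard GRPO normalization (each sampled response's reward minus the group mean, divided by the group standard deviation). Everything else is hypothetical: the `ScreeningSample` record, the per-image `labels`, the static `proportional_accuracy` reward (a stand-in for DPA, whose "dynamic" component is not described in the abstract), and the `is_hard_case` threshold rule (a stand-in for the paper's HCM strategy).

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class ScreeningSample:
    """Hypothetical record: one original image plus four generated images."""
    original: str           # hypothetical image path
    generated: List[str]    # the abstract states four generated images per sample
    labels: List[bool]      # hypothetical: True if the generated image is plausible

    def __post_init__(self) -> None:
        if len(self.generated) != 4 or len(self.labels) != 4:
            raise ValueError("expected exactly four generated images and labels")


def proportional_accuracy(pred: List[bool], labels: List[bool]) -> float:
    """Hypothetical static reward: fraction of generated images judged correctly.

    A stand-in only; the paper's DPA reward is dynamic and not defined here.
    """
    if len(pred) != len(labels) or not labels:
        return 0.0  # malformed answer earns no reward
    return sum(p == l for p, l in zip(pred, labels)) / len(labels)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantage: normalize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_hard_case(rewards: List[float], threshold: float = 0.5) -> bool:
    """Hypothetical hard-case rule: low mean reward, but not a constant group.

    A group with identical rewards yields all-zero advantages, so it carries
    no learning signal under GRPO and is not kept as a hard case.
    """
    return mean(rewards) < threshold and pstdev(rewards) > 0
```

For example, if a policy marks all four generated images as plausible when only three are, `proportional_accuracy` returns 0.75, rewarding partial correctness instead of giving all-or-nothing credit. The hard-case filter keeps groups the model mostly gets wrong while discarding groups with uniform rewards, which under GRPO would contribute zero advantage.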

Originally published by arXiv CS.