Commonsense
Related Articles from SNS
Visual Commonsense Driven Knowledge Refinements for Scene Graph Generation
Abstract: Learning-driven Scene Graph Generation (SGG) models excel on frequent relation types but degrade sharply under annotation sparsity, failing to capture reliable visual commonsense knowledge. We propose a model-agnostic, semantically-guided knowledge refinement framework that systematically mines commonsense-grounded constraints from training data - capturing spatial, functional, and qualitative relational regularities - and uses general declarative commonsense...
Global PIQA: Evaluating Commonsense Reasoning Across 100+ Languages and Cultures
arXiv:2510.24081v2. Abstract: To date, there exist almost no culturally-specific evaluation benchmarks for large language models (LLMs) that cover a large number of languages and cultures. In this paper, we present Global PIQA, a participatory commonsense reasoning benchmark for over 100 languages, constructed by hand by over 350 researchers from over 65 countries around the world. The 141 language varieties in Global PIQA cover five continents, 19 language families,...
Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts
arXiv:2606.01441v1. Abstract: Large language models (LLMs) excel in reasoning and knowledge-intensive tasks but remain vulnerable to prompt-level adversarial attacks that preserve intent while triggering commonsense hallucinations. This vulnerability is urgent, as LLMs are rapidly integrated into safety-critical domains where factual reliability is non-negotiable. Existing attack methods either lack efficiency or fail to capture the adaptive strategies of real-world...
Do Joint Audio-Video Generation Models Understand Physics?
arXiv:2605.07061v2. Abstract: Joint audio-video generation models are rapidly approaching professional production quality, raising a central question: do they understand audio-visual physics, or merely generate plausible sounds and frames that violate real-world consistency? We introduce AV-Phys Bench, a benchmark for evaluating physical commonsense in joint audio-video generation. AV-Phys Bench tests models across three scene categories: Steady State, Event Transition,...
Less is MoE: Trimming Experts in Domain-Specialist Language Models
arXiv:2606.05538v1. Abstract: Mixture-of-Experts (MoE) models achieve strong performance through conditional computation, but their large parameter footprint poses deployment challenges. Prior MoE compression approaches catastrophically fail when evaluated on general-purpose benchmarks beyond commonsense reasoning.
nuReasoning: A Reasoning-Centric Dataset and Benchmark for Long-Tail Autonomous Driving
Abstract: Reasoning is essential for autonomous driving (AD) in long-tail scenarios, where vehicles must apply commonsense knowledge, understand spatial relations, infer agent interactions, and make safe decisions. However, existing AD datasets and benchmarks mainly target perception, prediction, or planning, and provide limited supervision for reasoning over realistic long-tail driving scenes.
Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems
arXiv:2605.12213v2. Abstract: LLM-based conversational AI agents struggle to maintain coherent behavior over long horizons due to limited context. While RAG-based approaches are increasingly adopted to overcome this limitation by storing interactions in external memory modules and performing retrieval from them, their effectiveness in answering challenging questions (e.g., multi-hop, commonsense) ultimately depends on the agent's ability to reason over the retrieved...
Legal THC driving limit set for NSW a win for medicinal cannabis users
NSW government to introduce commonsense medicinal cannabis driving reforms. Thu 4 Jun 2026 at 5:22am. In short: Drivers who are prescribed medicinal cannabis and test positive for THC below the maximum threshold will not face charges. Patients are required to register their prescription with Transport for NSW and complete a driver training course. The reform comes more than a year after the 2024 Drug Summit recommended a medical defence for drivers using prescribed cannabis.
GSAM: A Generalizable and Safe Robotic Framework for Articulated Object Manipulation
Abstract: Articulated object manipulation is a unique challenge for service robots. Existing methods employ end-to-end policy learning, visionmotion planning, and large-language/visual-language model (LLM/VLM), but often overlook the diversity of articulated objects and the complexity of interactions between end-effector and handle, leading to limited generalization and destructive collisions.