Scene Graph Generation
Related Articles from SNS
Visual Commonsense Driven Knowledge Refinements for Scene Graph Generation
Announce Type: new Abstract: Learning-driven Scene Graph Generation (SGG) models excel on frequent relation types but degrade sharply under annotation sparsity, failing to capture reliable visual commonsense knowledge. We propose a model-agnostic, semantically-guided knowledge refinement framework that systematically mines commonsense-grounded constraints from training data - capturing spatial, functional, and qualitative relational regularities - and uses general declarative commonsense...
QPredSGG: Hybrid Quantum Predicate Learning for Long-Tailed Scene Graph Generation
arXiv:2606.04689v1 Announce Type: cross Abstract: Scene Graph Generation (SGG) requires relational reasoning over objects and their interactions, but performance is often limited by severe long-tail predicate imbalance. Classical SGG models frequently rely on dataset statistics, leading to biased predictions toward frequent relations rather than fine-grained semantic predicates. Although existing debiasing strategies improve mean recall, predicate classification in current frameworks still...
RelWitness: Open-Vocabulary 3D Scene Graph Generation with Visual-Geometric Relation Witnesses
arXiv:2605.20823v3 Announce Type: replace Abstract: Open-vocabulary 3D scene graph generation seeks to describe object instances and their relations with flexible natural-language predicates. The central difficulty is not only vocabulary expansion, but supervision reliability: relation annotations in 3D scene graph datasets are selective, and many valid object-pair relations are unannotated. We propose RelWitness, a framework for open-vocabulary 3D scene graph generation from posed RGB-D...
AccioScene: Compositional 3D Scene Generation via Graph Diffusion and Interaction-driven Critics
arXiv:2502.06819v2 Announce Type: replace Abstract: This paper presents a framework for generating 3D indoor scenes from text prompts. Existing methods often formulate scene synthesis as an object layout prediction problem conditioned on a single input modality, such as a text description, room shape, or scene graph. This design can lead to object collisions and limited functional plausibility, reducing its practical applicability.
Decoding the Surgical Scene: A Scoping Review of Scene Graphs in Surgery
arXiv:2509.20941v2 Announce Type: replace Abstract: As surgical AI transitions from pixel-level detection to complex reasoning, Scene Graphs (SGs) offer the structured, relational representations necessary to decode dynamic surgical environments. This PRISMA-ScR-guided scoping review systematically maps the evolving landscape of SG research in surgery, analyzing 52 primary studies to chart applications and methodological shifts. Our analysis reveals rapid growth, yet uncovers a critical...
HDSL: A Hierarchical Domain-Specific Language for Structured 3D Indoor Scene Generation and Localized Editing with LLM Agents
arXiv:2606.09738v1 Announce Type: new Abstract: Text-driven indoor scene generation and editing require an intermediate representation that language models can both produce and revise. Existing LLM-based systems often rely on scene graphs or global constraint lists, which are compact but underspecify local geometry and make instruction-based edits difficult to localize. We frame this problem as structured program generation and local program repair, and propose Hierarchical Descriptive Scene...
Seeing Fast and Slow: Bimodal 3D Scene Graphs for Open-set Tasks
arXiv:2605.31067v2 Announce Type: replace Abstract: Open-set task execution can significantly benefit from seamlessly switching between coarse and fine scene representations depending on the context and the evolving information as the robot explores the environment. For example, it is often sufficient to start with a coarse scene representation initially and only employ a finer, more granular scene representation when the robot encounters regions which are likely to contain the task relevant...
HyperVis: Continuous Latent Visual Relational Graphs on the Lorentz Hyperboloid for Compositional Reasoning
arXiv:2606.06100v1 Announce Type: new Abstract: Vision-Language Models (VLMs) struggle with compositional reasoning that requires understanding inter-object relationships. A natural remedy is to inject explicit scene graph triplets $\langle s, p, o \rangle$ from an off-the-shelf scene graph generator (SGG), but we show this backfires: discrete text labels collide with the continuous visual modality, degrading GQA accuracy from 60.38% to 58.86%. We propose HyperVis, which bypasses...
PARSE: Part-Aware Relational Spatial Modeling
Announce Type: replace Abstract: Inter-object relations underpin spatial intelligence, yet existing representations -- linguistic prepositions or object-level scene graphs -- are too coarse to specify which regions actually support, contain, or contact one another, leading to ambiguous and physically inconsistent layouts. To address these ambiguities, a part-level formulation is needed; therefore, we introduce PARSE, a framework that explicitly models how object parts interact to determine...