Scene Graph Generative Models

Beyond individual object perception, scene-level understanding and modelling are also important for capturing the holistic information that guides downstream tasks. Among scene representations, scene graphs offer an effective interface for industrial applications such as VR/AR by compactly abstracting scene context.

Controllable scene synthesis is one of the downstream tasks enabled by scene graphs. In this context, we introduce CommonScenes, the first scene graph generative model for 3D scene generation. CommonScenes couples a VAE-based layout branch with a diffusion-based shape branch, where the generated shapes populate the generated layouts to synthesize complete 3D scenes.

Figure: Methodology comparison between CommonScenes and others.
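To make the dual-branch design concrete, below is a minimal PyTorch sketch of how such a model could be wired, assuming shared graph node embeddings condition both branches. The module names (LayoutVAEHead, ShapeDiffusionHead), the 7-DoF box parameterization, and all dimensions are illustrative assumptions, not CommonScenes' actual implementation.

```python
import torch
import torch.nn as nn

class LayoutVAEHead(nn.Module):
    """Predicts a 7-DoF box (location, size, yaw) for each graph node."""
    def __init__(self, d_node=128, d_latent=32):
        super().__init__()
        self.to_stats = nn.Linear(d_node, 2 * d_latent)           # -> (mu, logvar)
        self.decode = nn.Sequential(
            nn.Linear(d_latent + d_node, 128), nn.ReLU(), nn.Linear(128, 7))

    def forward(self, node_emb):
        mu, logvar = self.to_stats(node_emb).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterization
        return self.decode(torch.cat([z, node_emb], dim=-1)), mu, logvar

class ShapeDiffusionHead(nn.Module):
    """Predicts the noise in a latent shape code at timestep t."""
    def __init__(self, d_shape=64, d_node=128):
        super().__init__()
        self.eps_net = nn.Sequential(
            nn.Linear(d_shape + d_node + 1, 256), nn.SiLU(), nn.Linear(256, d_shape))

    def forward(self, x_t, node_emb, t):
        t_feat = t.float().unsqueeze(-1) / 1000.0                 # toy timestep feature
        return self.eps_net(torch.cat([x_t, node_emb, t_feat], dim=-1))

# Toy usage: five objects in one scene; shared node embeddings condition both branches.
nodes = torch.randn(5, 128)
boxes, mu, logvar = LayoutVAEHead()(nodes)                        # (5, 7) layout boxes
eps = ShapeDiffusionHead()(torch.randn(5, 64), nodes, torch.full((5,), 500))
print(boxes.shape, eps.shape)
```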
A follow-up to CommonScenes is EchoScene, a dual-branch diffusion model for scene generation. EchoScene employs information echoes to condition the denoising process in each branch, ensuring that the generation process remains aware of global graph constraints and descriptions.
Figure: EchoScene, a dual-branch diffusion model conditioned on information echoes.
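The echo mechanism can be sketched in a few lines. The mean-pooled echo and the linear stand-in denoisers below are our own simplifying assumptions rather than EchoScene's actual message-passing scheme, but they show the key step: at every denoising iteration, each branch conditions on an aggregate of the other branch's per-object state.

```python
import torch
import torch.nn as nn

d_layout, d_shape, n_obj = 7, 64, 5
layout_net = nn.Linear(d_layout + d_shape, d_layout)   # stand-in denoiser
shape_net = nn.Linear(d_shape + d_layout, d_shape)     # stand-in denoiser

def echo(features):
    """Broadcast the mean of per-object features back to every object."""
    return features.mean(dim=0, keepdim=True).expand(features.size(0), -1)

layout_x = torch.randn(n_obj, d_layout)                # noisy layout state
shape_x = torch.randn(n_obj, d_shape)                  # noisy shape state
for step in range(3):                                  # a few toy denoising steps
    layout_eps = layout_net(torch.cat([layout_x, echo(shape_x)], dim=-1))
    shape_eps = shape_net(torch.cat([shape_x, echo(layout_x)], dim=-1))
    layout_x = layout_x - 0.1 * layout_eps             # toy update rule
    shape_x = shape_x - 0.1 * shape_eps
print(layout_x.shape, shape_x.shape)
```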
In addition to VR/AR applications, scene generation can be extended to robotic manipulation. Accordingly, we designed SG-Bot, which uses scene graphs to represent goal states. Based on these goal graphs, SG-Bot generates scenes that serve as fine-grained goal states to guide robotic manipulation.
Figure: SG-Bot uses the scene graph and the generated scene as goal states.
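A hypothetical sketch of this goal-driven loop, with generate_goal_scene standing in for SG-Bot's generative stage and a toy 2D pose comparison replacing real grasp planning:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str

@dataclass
class Edge:
    subj: str; rel: str; obj: str        # e.g. ("cup", "left of", "plate")

def generate_goal_scene(nodes, edges):
    """Placeholder for the generative model: returns a goal (x, y) per object."""
    return {n.name: (i * 0.2, 0.0) for i, n in enumerate(nodes)}  # toy layout; a real model would respect `edges`

def plan_actions(current, goal, tol=1e-3):
    """Emit pick-and-place actions for objects not yet at their goal pose."""
    return [(name, current[name], pose)
            for name, pose in goal.items()
            if abs(current[name][0] - pose[0]) + abs(current[name][1] - pose[1]) > tol]

nodes = [Node("plate"), Node("cup"), Node("fork")]
edges = [Edge("cup", "left of", "plate"), Edge("fork", "right of", "plate")]
goal = generate_goal_scene(nodes, edges)               # the "imagined" goal scene
current = {"plate": (0.0, 0.0), "cup": (0.5, 0.3), "fork": (0.4, 0.0)}
for name, src, dst in plan_actions(current, goal):
    print(f"pick {name} at {src}, place at {dst}")
```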
While scene generation based solely on scene graphs ensures semantic compliance, scene graphs alone cannot capture geometric information. To address this limitation, we incorporate readily available image information into graph nodes to create a mixed-modality graph (MMG) that encapsulates multimodal data. An MMG can represent five distinct input types, and we further introduce MMGDreamer to process MMGs, achieving geometry-controllable scene generation.
Figure: MMGDreamer converts multiple types of user input into an MMG and then generates scenes.
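A hedged sketch of what a mixed-modality node encoder could look like: each node may carry a category label, an image feature, both, or neither, and is embedded through the pathway matching its available evidence. The encoders, the CLIP-like 512-dimensional image feature, and the mean fusion are assumptions for illustration, not MMGDreamer's actual modules.

```python
import torch
import torch.nn as nn

class MMGNodeEncoder(nn.Module):
    def __init__(self, n_classes=100, d=128):
        super().__init__()
        self.text_emb = nn.Embedding(n_classes, d)       # label (semantics) pathway
        self.img_proj = nn.Linear(512, d)                # image (geometry cue) pathway
        self.mask_tok = nn.Parameter(torch.zeros(d))     # token for nodes with no evidence

    def forward(self, label=None, img_feat=None):
        parts = []
        if label is not None:
            parts.append(self.text_emb(label))
        if img_feat is not None:
            parts.append(self.img_proj(img_feat))
        if not parts:                                    # unknown node
            return self.mask_tok
        return torch.stack(parts).mean(dim=0)            # simple modality fusion

enc = MMGNodeEncoder()
text_node = enc(label=torch.tensor(3))                   # semantics only
image_node = enc(img_feat=torch.randn(512))              # geometry cue only
hybrid_node = enc(label=torch.tensor(3), img_feat=torch.randn(512))
print(text_node.shape, image_node.shape, hybrid_node.shape)
```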
Beyond scene generation, scene graphs can also be edited to manipulate the generated scenes. However, such modifications may introduce conflicts that degrade the performance of generative models, and these models require robust edge reasoning when new nodes are added to the graph. To address these challenges, we introduce SG-Tailor, a plug-and-play module that seamlessly integrates with existing generative models to enhance scene manipulation.
Figure: SG-Tailor infers reasonable edges for added nodes and resolves conflicts when edges are changed.
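The two jobs such a module performs can be illustrated with a toy 1-D ordering relation ("left of" / "right of"); the function names and the rule-based conflict check below are our own assumptions, whereas SG-Tailor learns these predictions: (1) infer relations between a newly added node and existing nodes, and (2) flag edges that contradict a user edit.

```python
OPPOSITE = {"left of": "right of", "right of": "left of"}

def infer_edges_for_new_node(new_name, new_x, scene):
    """Predict a relation from the new node to every existing node."""
    return [(new_name, "left of" if new_x < x else "right of", name)
            for name, x in scene.items()]

def find_conflicts(edges):
    """Flag edges that assert a contradictory relation between two nodes."""
    seen, conflicts = {}, []
    for s, r, o in edges:
        if seen.get((o, s)) == r or seen.get((s, o)) == OPPOSITE.get(r):
            conflicts.append((s, r, o))
        seen[(s, o)] = r
    return conflicts

scene = {"sofa": 0.0, "table": 1.0}                    # existing objects (1-D positions)
edges = infer_edges_for_new_node("lamp", 0.5, scene)   # edge reasoning for a new node
edges.append(("sofa", "right of", "lamp"))             # user edit contradicting an inferred edge
print(edges)
print("conflicts:", find_conflicts(edges))
```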