Scene Graph Generative Models

Beyond individual object perception, scene-level understanding and modelling are also important for capturing the holistic information that guides downstream tasks. Among scene representations, scene graphs offer an effective interface for industrial applications such as VR/AR by compactly abstracting scene context.

Controllable scene synthesis is one of the downstream tasks enabled by scene graphs. In this context, we introduce CommonScenes, the first scene graph generative model for 3D scene generation. CommonScenes couples a VAE-based layout branch with a diffusion-based shape branch, where the generated shapes populate the generated layouts to synthesize complete 3D scenes.

Figure: Methodology comparison between CommonScenes and others.
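To make the dual-branch design concrete, below is a minimal PyTorch sketch of how such a model could be wired, assuming shared graph node embeddings condition both branches. The module names (LayoutVAEHead, ShapeDiffusionHead), the 7-DoF box parameterization, and all dimensions are illustrative assumptions, not CommonScenes' actual implementation.

```python
import torch
import torch.nn as nn

class LayoutVAEHead(nn.Module):
    """Predicts a 7-DoF box (location, size, yaw) for each graph node."""
    def __init__(self, d_node=128, d_latent=32):
        super().__init__()
        self.to_stats = nn.Linear(d_node, 2 * d_latent)           # -> (mu, logvar)
        self.decode = nn.Sequential(
            nn.Linear(d_latent + d_node, 128), nn.ReLU(), nn.Linear(128, 7))

    def forward(self, node_emb):
        mu, logvar = self.to_stats(node_emb).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterization
        return self.decode(torch.cat([z, node_emb], dim=-1)), mu, logvar

class ShapeDiffusionHead(nn.Module):
    """Predicts the noise in a latent shape code at timestep t."""
    def __init__(self, d_shape=64, d_node=128):
        super().__init__()
        self.eps_net = nn.Sequential(
            nn.Linear(d_shape + d_node + 1, 256), nn.SiLU(), nn.Linear(256, d_shape))

    def forward(self, x_t, node_emb, t):
        t_feat = t.float().unsqueeze(-1) / 1000.0                 # toy timestep feature
        return self.eps_net(torch.cat([x_t, node_emb, t_feat], dim=-1))

# Toy usage: five objects in one scene; shared node embeddings condition both branches.
nodes = torch.randn(5, 128)
boxes, mu, logvar = LayoutVAEHead()(nodes)                        # (5, 7) layout boxes
eps = ShapeDiffusionHead()(torch.randn(5, 64), nodes, torch.full((5,), 500))
print(boxes.shape, eps.shape)
```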
A follow-up to CommonScenes is EchoScene, a dual-branch diffusion model for scene generation. EchoScene employs information echoes to condition the denoising process in each branch, ensuring that the generation process remains aware of global graph constraints and descriptions.
Figure: EchoScene, a dual-branch diffusion model conditioned on information echoes.
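The echo mechanism can be sketched in a few lines. The mean-pooled echo and the linear stand-in denoisers below are our own simplifying assumptions rather than EchoScene's actual message-passing scheme, but they show the key step: at every denoising iteration, each branch conditions on an aggregate of the other branch's per-object state.

```python
import torch
import torch.nn as nn

d_layout, d_shape, n_obj = 7, 64, 5
layout_net = nn.Linear(d_layout + d_shape, d_layout)   # stand-in denoiser
shape_net = nn.Linear(d_shape + d_layout, d_shape)     # stand-in denoiser

def echo(features):
    """Broadcast the mean of per-object features back to every object."""
    return features.mean(dim=0, keepdim=True).expand(features.size(0), -1)

layout_x = torch.randn(n_obj, d_layout)                # noisy layout state
shape_x = torch.randn(n_obj, d_shape)                  # noisy shape state
for step in range(3):                                  # a few toy denoising steps
    layout_eps = layout_net(torch.cat([layout_x, echo(shape_x)], dim=-1))
    shape_eps = shape_net(torch.cat([shape_x, echo(layout_x)], dim=-1))
    layout_x = layout_x - 0.1 * layout_eps             # toy update rule
    shape_x = shape_x - 0.1 * shape_eps
print(layout_x.shape, shape_x.shape)
```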
In addition to VR/AR applications, scene generation can be extended to robotic manipulation. Accordingly, we designed SG-Bot, which uses scene graphs to represent goal states. Based on these goal graphs, SG-Bot generates scenes that serve as fine-grained goal states to guide robotic manipulation.
Figure: SG-Bot uses the scene graph and the generated scene as goal states.
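A hypothetical sketch of this goal-driven loop, with generate_goal_scene standing in for SG-Bot's generative stage and a toy 2D pose comparison replacing real grasp planning:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str

@dataclass
class Edge:
    subj: str; rel: str; obj: str        # e.g. ("cup", "left of", "plate")

def generate_goal_scene(nodes, edges):
    """Placeholder for the generative model: returns a goal (x, y) per object."""
    return {n.name: (i * 0.2, 0.0) for i, n in enumerate(nodes)}  # toy layout; a real model would respect `edges`

def plan_actions(current, goal, tol=1e-3):
    """Emit pick-and-place actions for objects not yet at their goal pose."""
    return [(name, current[name], pose)
            for name, pose in goal.items()
            if abs(current[name][0] - pose[0]) + abs(current[name][1] - pose[1]) > tol]

nodes = [Node("plate"), Node("cup"), Node("fork")]
edges = [Edge("cup", "left of", "plate"), Edge("fork", "right of", "plate")]
goal = generate_goal_scene(nodes, edges)               # the "imagined" goal scene
current = {"plate": (0.0, 0.0), "cup": (0.5, 0.3), "fork": (0.4, 0.0)}
for name, src, dst in plan_actions(current, goal):
    print(f"pick {name} at {src}, place at {dst}")
```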
While scene generation based solely on scene graphs ensures semantic compliance, scene graphs alone cannot capture geometric information. To address this limitation, we incorporate readily available image information into graph nodes to create a mixed-modality graph (MMG) that encapsulates multimodal data. An MMG can represent five distinct input types, and we further introduce MMGDreamer to process MMGs, achieving geometry-controllable scene generation.
Figure: MMGDreamer converts multiple types of user input into an MMG and then generates scenes.
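A hedged sketch of what a mixed-modality node encoder could look like: each node may carry a category label, an image feature, both, or neither, and is embedded through the pathway matching its available evidence. The encoders, the CLIP-like 512-dimensional image feature, and the mean fusion are assumptions for illustration, not MMGDreamer's actual modules.

```python
import torch
import torch.nn as nn

class MMGNodeEncoder(nn.Module):
    def __init__(self, n_classes=100, d=128):
        super().__init__()
        self.text_emb = nn.Embedding(n_classes, d)       # label (semantics) pathway
        self.img_proj = nn.Linear(512, d)                # image (geometry cue) pathway
        self.mask_tok = nn.Parameter(torch.zeros(d))     # token for nodes with no evidence

    def forward(self, label=None, img_feat=None):
        parts = []
        if label is not None:
            parts.append(self.text_emb(label))
        if img_feat is not None:
            parts.append(self.img_proj(img_feat))
        if not parts:                                    # unknown node
            return self.mask_tok
        return torch.stack(parts).mean(dim=0)            # simple modality fusion

enc = MMGNodeEncoder()
text_node = enc(label=torch.tensor(3))                   # semantics only
image_node = enc(img_feat=torch.randn(512))              # geometry cue only
hybrid_node = enc(label=torch.tensor(3), img_feat=torch.randn(512))
print(text_node.shape, image_node.shape, hybrid_node.shape)
```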
Beyond scene generation, scene graphs can also be edited to manipulate the generated scenes. However, such modifications may introduce conflicts that degrade the performance of generative models, and these models require robust edge reasoning when new nodes are added to the graph. To address these challenges, we introduce SG-Tailor, a plug-and-play module that seamlessly integrates with existing generative models to enhance scene manipulation.
Figure: SG-Tailor infers reasonable edges for added nodes and resolves conflicts when edges are changed.
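The two jobs such a module performs can be illustrated with a toy 1-D ordering relation ("left of" / "right of"); the function names and the rule-based conflict check below are our own assumptions, whereas SG-Tailor learns these predictions: (1) infer relations between a newly added node and existing nodes, and (2) flag edges that contradict a user edit.

```python
OPPOSITE = {"left of": "right of", "right of": "left of"}

def infer_edges_for_new_node(new_name, new_x, scene):
    """Predict a relation from the new node to every existing node."""
    return [(new_name, "left of" if new_x < x else "right of", name)
            for name, x in scene.items()]

def find_conflicts(edges):
    """Flag edges that assert a contradictory relation between two nodes."""
    seen, conflicts = {}, []
    for s, r, o in edges:
        if seen.get((o, s)) == r or seen.get((s, o)) == OPPOSITE.get(r):
            conflicts.append((s, r, o))
        seen[(s, o)] = r
    return conflicts

scene = {"sofa": 0.0, "table": 1.0}                    # existing objects (1-D positions)
edges = infer_edges_for_new_node("lamp", 0.5, scene)   # edge reasoning for a new node
edges.append(("sofa", "right of", "lamp"))             # user edit contradicting an inferred edge
print(edges)
print("conflicts:", find_conflicts(edges))
```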