Depth Guided Stable Diffusion enriches the image generation process by incorporating depth information, providing a unique way to control the spatial composition of generated images. This approach allows for more nuanced and layered creations, making it especially useful for scenes requiring a sense of three-dimensionality.
Logging and visualizing with Rerun
The visualizations in this example were created with the Rerun SDK, demonstrating the integration of depth information in the Stable Diffusion image generation process. Here is the code for generating the visualization in Rerun.
Prompt
Visualizing the prompt and negative prompt
rr.log("prompt/text", rr.TextLog(prompt))
rr.log("prompt/text_negative", rr.TextLog(negative_prompt))
Text
Visualizing the text input ids, the text attention mask and the unconditional input ids
rr.log("prompt/text_input/ids", rr.BarChart(text_input_ids))
rr.log("prompt/text_input/attention_mask", rr.BarChart(text_inputs.attention_mask))
rr.log("prompt/uncond_input/ids", rr.Tensor(uncond_input.input_ids))
Text embeddings
Visualizing the text embeddings. The text embeddings are generated from the specific prompts used, while the unconditional text embeddings represent a neutral baseline with no prompt conditioning.
rr.log("prompt/text_embeddings", rr.Tensor(text_embeddings))
rr.log("prompt/uncond_embeddings", rr.Tensor(uncond_embeddings))
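To see why both sets of embeddings are worth inspecting, it helps to recall how their results are usually combined. Below is a minimal sketch of classifier-free guidance, assuming the pipeline follows the usual Stable Diffusion recipe. `apply_guidance` and the plain-list inputs are illustrative names, not part of the example's source.

```python
from typing import List


def apply_guidance(
    noise_uncond: List[float],
    noise_text: List[float],
    guidance_scale: float,
) -> List[float]:
    """Blend unconditional and prompt-conditioned noise predictions.

    Hypothetical helper: real pipelines do this on tensors, not lists.
    """
    # Start at the baseline prediction and move towards the
    # prompt-conditioned one, scaled by guidance_scale.
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


guided = apply_guidance([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0)
print(guided)
```

With a scale of 1.0 the result equals the prompt-conditioned prediction; larger values push further away from the unconditional baseline.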
Depth map
Visualizing the pixel values of the depth estimation, estimated depth image, interpolated depth image and normalized depth image
rr.log("depth/input_preprocessed", rr.Tensor(pixel_values))
rr.log("depth/estimated", rr.DepthImage(depth_map))
rr.log("depth/interpolated", rr.DepthImage(depth_map))
rr.log("depth/normalized", rr.DepthImage(depth_map))
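The last of these steps can be sketched as follows. This assumes the normalization is a min-max rescale to the range [-1, 1], which is common in depth-to-image pipelines; the example's source may differ. `normalize_depth` is a hypothetical helper that works on nested lists so it runs without extra dependencies.

```python
from typing import List


def normalize_depth(depth: List[List[float]]) -> List[List[float]]:
    """Rescale a 2D depth map linearly so its values span [-1, 1]."""
    flat = [value for row in depth for value in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == d_min:
        # A constant map has no range to stretch; map it to zeros.
        return [[0.0 for _ in row] for row in depth]
    scale = 2.0 / (d_max - d_min)
    return [[(value - d_min) * scale - 1.0 for value in row] for row in depth]


normalized = normalize_depth([[0.0, 2.0], [4.0, 2.0]])
print(normalized)
```

The nearest and farthest values land on -1 and 1, so the relative ordering of depths is preserved regardless of the estimator's original units.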
Latents
Log the latents, the representation of the images in the format used by the diffusion model.
rr.log("diffusion/latents", rr.Tensor(latents, dim_names=["b", "c", "h", "w"]))
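The dimension names stand for batch, channel, height and width. As a rough sketch of the sizes involved, the snippet below assumes the usual Stable Diffusion setup of 4 latent channels and an 8x spatial downscale. Both numbers are assumptions here, not values taken from the example's source, and `latent_shape` is a hypothetical helper.

```python
from typing import Tuple

LATENT_CHANNELS = 4   # assumed: typical for Stable Diffusion latents
VAE_SCALE_FACTOR = 8  # assumed: each latent cell covers 8x8 pixels


def latent_shape(batch_size: int, height: int, width: int) -> Tuple[int, int, int, int]:
    """Return the (b, c, h, w) shape of the latents for a given image size."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of the scale factor")
    return (
        batch_size,
        LATENT_CHANNELS,
        height // VAE_SCALE_FACTOR,
        width // VAE_SCALE_FACTOR,
    )


shape = latent_shape(1, 512, 512)
print(shape)
```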
Denoising loop
For each step in the denoising loop, we set a time sequence with step and timestep and log the latent model input, noise predictions, latents and image. This makes it possible to see all denoising steps in the Rerun viewer.
rr.set_time_sequence("step", i)
rr.set_time_sequence("timestep", t)
rr.log("diffusion/latent_model_input", rr.Tensor(latent_model_input))
rr.log("diffusion/noise_pred", rr.Tensor(noise_pred, dim_names=["b", "c", "h", "w"]))
rr.log("diffusion/latents", rr.Tensor(latents, dim_names=["b", "c", "h", "w"]))
rr.log("image/diffused", rr.Image(image))
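To show how these calls fit into a loop, here is a minimal sketch with the time and logging functions passed in as parameters. `run_denoising_loop`, `denoise_step` and the toy halving update are hypothetical stand-ins, not the example's actual scheduler or model code.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def run_denoising_loop(
    timesteps: Sequence[int],
    denoise_step: Callable[[T, int], T],
    latents: T,
    set_time: Callable[[str, int], None],
    log: Callable[[str, T], None],
) -> T:
    """Run one update per timestep, tagging each log with both timelines."""
    for i, t in enumerate(timesteps):
        set_time("step", i)      # loop index, counts up from 0
        set_time("timestep", t)  # scheduler timestep for this iteration
        latents = denoise_step(latents, t)
        log("diffusion/latents", latents)
    return latents


# Stand-ins that record calls instead of talking to a viewer.
times: List[Tuple[str, int]] = []
logged: List[Tuple[str, float]] = []
final = run_denoising_loop(
    timesteps=[900, 600, 300],
    denoise_step=lambda x, t: x / 2,  # toy update, not a real scheduler
    latents=8.0,
    set_time=lambda timeline, value: times.append((timeline, value)),
    log=lambda path, value: logged.append((path, value)),
)
print(final)
```

With Rerun, `set_time` could be `rr.set_time_sequence` and `log` a small wrapper such as `lambda path, value: rr.log(path, rr.Tensor(value))`. Because every log call is tagged with both timelines, the viewer can scrub through the results by either loop index or scheduler timestep.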
Diffused image
Finally, we log the diffused image generated by the model.
rr.log("image/diffused", rr.Image(image_8))
Join us on GitHub
rerun-io / rerun
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Build time-aware visualizations of multimodal data
Use the Rerun SDK (available for C++, Python and Rust) to log data like images, tensors, point clouds, and text. Logs are streamed to the Rerun Viewer for live visualization or to file for later use.
A short taste
import rerun as rr # pip install rerun-sdk
rr.init("rerun_example_app")
rr.connect() # Connect to a remote viewer
# rr.spawn() # Spawn a child process with a viewer and connect
# rr.save("recording.rrd") # Stream all logs to disk
# Associate subsequent data with 42 on the “frame” timeline
rr.set_time_sequence("frame", 42)
# Log colored 3D points to the entity at `path/to/points`
rr.log("path/to/points", rr.Points3D(positions, colors=colors
…