DEV Community

Cover image for Use Hugging Face's ControlNet
Rerun
Rerun

Posted on

Use Hugging Face's ControlNet

Source Code Explore Other Examples

Hugging Face's ControlNet allows to condition Stable Diffusion on various modalities. In this example we condition on edges detected by the Canny edge detector to keep our shape intact while generating an image with Stable Diffusion. This generates a new image based on the text prompt, but that still retains the structure of the control image. We visualize the whole generation process with Rerun.

Logging and visualizing with Rerun

The visualizations in this example were created with the following Rerun code.


Images

rr.log("input/raw", rr.Image(image), timeless=True)
rr.log("input/canny", rr.Image(canny_image), timeless=True)
Enter fullscreen mode Exit fullscreen mode

The input image and control canny_image are marked as timeless and logged in rerun.

Timeless entities belong to all timelines (existing ones, and ones not yet created) and are shown leftmost in the time panel in the viewer. This is useful for entities that aren't part of normal data capture, but set the scene for how they are shown.

This designation ensures their constant availability across all timelines in Rerun, aiding in consistent comparison and documentation.


Prompts

rr.log("positive_prompt", rr.TextDocument(prompt), timeless=True)
rr.log("negative_prompt", rr.TextDocument(negative_prompt), timeless=True)
Enter fullscreen mode Exit fullscreen mode

The positive and negative prompt used for generation is logged to Rerun.


Custom diffusion step callback

We use a custom callback function for ControlNet that logs the output and the latent values at each timestep, which makes it possible for us to view all timesteps of the generation in Rerun.

def controlnet_callback(
    iteration: int, timestep: float, latents: torch.Tensor, pipeline: StableDiffusionXLControlNetPipeline
) -> None:
    rr.set_time_sequence("iteration", iteration)
    rr.set_time_seconds("timestep", timestep)
    rr.log("output", rr.Image(image))
    rr.log("latent", rr.Tensor(latents.squeeze(), dim_names=["channel", "height", "width"]))
Enter fullscreen mode Exit fullscreen mode

Output image

Finally, we log the output image generated by ControlNet.

rr.log("output", rr.Image(images))
Enter fullscreen mode Exit fullscreen mode

Join us on Github

GitHub logo rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.

Build time aware visualizations of multimodal data

Use the Rerun SDK (available for C++, Python and Rust) to log data like images, tensors, point clouds, and text. Logs are streamed to the Rerun Viewer for live visualization or to file for later use.

A short taste

import rerun as rr  # pip install rerun-sdk
rr.init("rerun_example_app")

rr.connect()  # Connect to a remote viewer
# rr.spawn()  # Spawn a child process with a viewer and connect
# rr.save("recording.rrd")  # Stream all logs to disk

# Associate subsequent data with 42 on the “frame” timeline
rr.set_time_sequence("frame", 42)

# Log colored 3D points to the entity at `path/to/points`
rr.log("path/to/points", rr.Points3D(positions, colors=colors
Enter fullscreen mode Exit fullscreen mode

Top comments (0)