How to generate images with enhanced depth perception using Depth Guided Stable Diffusion and Rerun
This tutorial is a guide focused on visualization. It provides the complete code for visualizing the image generation process of Depth Guided Stable Diffusion with the open-source visualization tool Rerun.
Depth Guided Stable Diffusion enriches the image generation process by incorporating depth information, providing a unique way to control the spatial composition of generated images. This approach allows for more nuanced and layered creations, making it especially useful for scenes that require a sense of three-dimensionality.
The visualizations in this example were created with the Rerun SDK and demonstrate how depth information is integrated into the Stable Diffusion image generation process. Here is the code for generating the visualization in Rerun.
Visualizing the prompt and negative prompt
rr.log("prompt/text", rr.TextLog(prompt))
rr.log("prompt/text_negative", rr.TextLog(negative_prompt))
Visualizing the text input ids, the text attention mask and the unconditional input ids
rr.log("prompt/text_input/ids", rr.BarChart(text_input_ids))
rr.log("prompt/text_input/attention_mask", rr.BarChart(text_inputs.attention_mask))
rr.log("prompt/uncond_input/ids", rr.Tensor(uncond_input.input_ids))
Visualizing the text embeddings. The text embeddings are generated from the specific prompts used, while the unconditional text embeddings represent a neutral or baseline state without specific input conditions.
rr.log("prompt/text_embeddings", rr.Tensor(text_embeddings))
rr.log("prompt/uncond_embeddings", rr.Tensor(uncond_embeddings))
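To see why both embeddings are worth inspecting: Stable Diffusion pipelines typically combine the model's predictions for the two embeddings through classifier-free guidance, pushing the unconditional prediction toward the prompt-conditioned one. The following is a minimal sketch with plain Python lists standing in for tensors. The function name `apply_guidance` and the `guidance_scale` value are illustrative assumptions and are not taken from the example code.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional and prompt-conditioned noise predictions.

    Plain lists stand in for tensors; all names here are illustrative.
    """
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


# Toy values: a scale of 1.0 reproduces the prompt-conditioned prediction,
# larger scales move further away from the unconditional baseline.
noise_uncond = [0.0, 0.5, -1.0]
noise_text = [1.0, 0.5, -2.0]
guided = apply_guidance(noise_uncond, noise_text, guidance_scale=7.5)
```

Where the two predictions agree (the middle element), the guidance scale has no effect. It only amplifies the difference introduced by the prompt.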
Visualizing the preprocessed pixel values used for depth estimation, the estimated depth image, the interpolated depth image and the normalized depth image
rr.log("depth/input_preprocessed", rr.Tensor(pixel_values))
rr.log("depth/estimated", rr.DepthImage(depth_map))
rr.log("depth/interpolated", rr.DepthImage(depth_map))
rr.log("depth/normalized", rr.DepthImage(depth_map))
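The three depth logs correspond to successive processing stages, so `depth_map` is presumably reassigned between the calls. As a rough sketch of the last stage, assume the normalization rescales the map linearly to the range [-1, 1]. That range is an assumption based on common depth-to-image pipelines, and the helper name `normalize_depth` is hypothetical. Nested lists stand in for a tensor.

```python
def normalize_depth(depth_map):
    """Linearly rescale a 2D depth map (list of rows) to [-1, 1]."""
    flat = [v for row in depth_map for v in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == d_min:
        # A constant map has no depth variation; map it to 0 everywhere.
        return [[0.0 for _ in row] for row in depth_map]
    scale = 2.0 / (d_max - d_min)
    return [[(v - d_min) * scale - 1.0 for v in row] for row in depth_map]


depth = [[0.0, 5.0], [10.0, 2.5]]
normalized = normalize_depth(depth)
```

After this step the smallest depth value maps to -1 and the largest to 1, regardless of the units the depth estimator produced.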
Logging the latents, the representation of the images in the format used by the diffusion model.
rr.log("diffusion/latents", rr.Tensor(latents, dim_names=["b", "c", "h", "w"]))
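The `dim_names` label the four axes of the latent tensor as batch, channel, height and width. As a small sketch of what sizes to expect, assume the usual Stable Diffusion layout of 4 latent channels and an 8x spatial downscale. Both defaults and the helper `latent_shape` are assumptions for illustration, not values read from the example.

```python
def latent_shape(batch_size, height, width, channels=4, scale_factor=8):
    """Shape of the latent tensor in (b, c, h, w) order."""
    return (batch_size, channels, height // scale_factor, width // scale_factor)


shape = latent_shape(batch_size=1, height=512, width=512)

# Pair each axis size with the dimension name used in the log call.
named = dict(zip(["b", "c", "h", "w"], shape))
```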
For each step in the denoising loop we set a time sequence with step and timestep and log the latent model input, noise predictions, latents and image. This makes it possible to see all denoising steps in the Rerun viewer.
rr.set_time_sequence("step", i)
rr.set_time_sequence("timestep", t)
rr.log("diffusion/latent_model_input", rr.Tensor(latent_model_input))
rr.log("diffusion/noise_pred", rr.Tensor(noise_pred, dim_names=["b", "c", "h", "w"]))
rr.log("diffusion/latents", rr.Tensor(latents, dim_names=["b", "c", "h", "w"]))
rr.log("image/diffused", rr.Image(image))
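The two timelines index the same loop in different ways: `step` counts iterations upward, while `timestep` is the scheduler's noise level, which typically runs downward. Here is a small sketch of that relationship, assuming 1000 training timesteps and evenly spaced inference steps. Both numbers and the helper `make_timesteps` are illustrative assumptions, not values from the example.

```python
def make_timesteps(num_train_timesteps, num_inference_steps):
    """Evenly spaced timesteps, ordered from most noisy to least noisy."""
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in range(num_inference_steps)][::-1]


timesteps = make_timesteps(num_train_timesteps=1000, num_inference_steps=5)

# (step, timestep) pairs as they would be set on the two timelines.
timeline = [(i, t) for i, t in enumerate(timesteps)]
```

Setting both sequences before each group of log calls lets the same data be browsed either by loop iteration or by scheduler timestep.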
Finally we log the diffused image generated by the model.
rr.log("image/diffused", rr.Image(image_8))
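The name `image_8` suggests an 8-bit version of the decoded image. Below is a minimal sketch of such a conversion, assuming the decoded pixel values lie roughly in [-1, 1]. That range is an assumption about the decoder output, `to_uint8` is a hypothetical helper, and a flat list stands in for the image tensor.

```python
def to_uint8(pixels):
    """Map float pixels from [-1, 1] to integers in [0, 255]."""
    out = []
    for p in pixels:
        unit = min(max(p / 2 + 0.5, 0.0), 1.0)  # shift to [0, 1] and clamp
        out.append(int(round(unit * 255)))
    return out


# The last value lies outside [-1, 1] and is clamped to the maximum.
image_8 = to_uint8([-1.0, -0.5, 1.0, 1.7])
```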