DEV Community

Furkan Gözükara


Speed comparison between Torch + CUDA + xFormers versions and TensorRT vs xFormers for Stable Diffusion XL (SDXL)

The full TensorRT tutorial (42 minutes, 32 chapters) is here: Double Your Stable Diffusion Inference Speed with RTX Acceleration TensorRT: A Comprehensive Guide


I used the Automatic1111 SD Web UI for the comparison.

Image 1: xFormers 0.0.23 + Torch 2.1.1 + CUDA 12.1
Image 2: xFormers 0.0.22 + Torch 2.0.1 + CUDA 11.8
Image 3: Torch 2.1.1 + CUDA 12.1 + TensorRT
Image 4: Torch 2.0.1 + CUDA 11.8 + TensorRT
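Because the comparison hinges on exact version combinations, it helps to confirm which ones a given Web UI install is actually running. The sketch below is an illustration and not part of the original setup. It reads installed package versions with Python's standard `importlib.metadata` and, if PyTorch can be imported, also reports the CUDA version PyTorch was built against (`torch.version.cuda`). The helper names are hypothetical. The sketch assumes you run it with the Python interpreter from the Web UI's own virtual environment.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def torch_cuda_version() -> Optional[str]:
    """Return the CUDA version PyTorch was built against (None if torch is missing)."""
    try:
        import torch  # imported lazily so the report still works without torch
    except ImportError:
        return None
    return torch.version.cuda  # e.g. "12.1"; None for CPU-only builds


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the versions relevant to this comparison (hypothetical helper)."""
    return {
        "torch": installed_version("torch"),
        "xformers": installed_version("xformers"),
        "cuda (torch build)": torch_cuda_version(),
    }


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not installed / not available'}")
```

PyTorch wheels from the official package index usually carry the CUDA build in a local version suffix. For example, `2.1.1+cu121` corresponds to the Torch 2.1.1 + CUDA 12.1 combination above.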

Resolution: 1152x896 pixels, upscaled 1.5x with hires fix to 1728x1344
Steps: 30 for the first pass, 30 for the hires fix pass
OS: Windows 10
GPU: RTX 3090 Ti
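The hires fix numbers above follow from simple arithmetic. The sketch below assumes that the target size is the base size multiplied by the scale factor and truncated to an integer. It also relies on the SDXL VAE downsampling each spatial dimension by 8. Under those assumptions, the second pass works on 216x168 latents instead of 144x112, which is 2.25x as many pixels per image. The helper names are hypothetical.

```python
from typing import Tuple


def hires_target(width: int, height: int, scale: float) -> Tuple[int, int]:
    """Final size after a hires-fix upscale by `scale` (assumed: truncated to int)."""
    return int(width * scale), int(height * scale)


def latent_size(width: int, height: int, factor: int = 8) -> Tuple[int, int]:
    """Latent grid size, given the VAE's spatial downsampling factor (8 for SDXL)."""
    return width // factor, height // factor


first_pass = (1152, 896)
second_pass = hires_target(*first_pass, 1.5)
pixel_ratio = (second_pass[0] * second_pass[1]) / (first_pass[0] * first_pass[1])

print("first pass:", first_pass, "latent:", latent_size(*first_pass))
print("hires pass:", second_pass, "latent:", latent_size(*second_pass))
print(f"pixels per image grow by {pixel_ratio:.2f}x")
```

Because the scale is 1.5 in each dimension, the pixel ratio is always 1.5², whatever the base size.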

When TensorRT is used, xFormers is disabled.
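Anyone repeating a comparison like this outside the UI should discard one or more warm-up runs before timing. The first generation can include one-time costs, such as loading weights or a TensorRT engine. After the warm-up, a common convention is to report the median of several timed runs. The harness below is a generic, hypothetical sketch that uses only the standard library. `run` stands in for whatever callable produces one image.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run: Callable[[], object], warmup: int = 1, repeats: int = 5) -> Dict[str, float]:
    """Time `run()` several times, discarding the first `warmup` calls.

    Warm-up calls absorb one-time costs so they do not skew steady-state timings.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline (>1 means faster)."""
    return baseline_s / candidate_s
```

CUDA work is launched asynchronously, so `run` should return only after the result has actually finished. You can ensure this by calling `torch.cuda.synchronize()`, or by returning only after the image has been copied back to the CPU. Otherwise the measured times can come out too low.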

Full prompt:

Fantasy Forest with Mythical Creatures: A mystical forest filled with mythical creatures, magical trees, sharp focus, intricate, cinematic, full color, and radiant magical lights.
Steps: 30, Sampler: DPM++ 2M SDE Karras, CFG scale: 8, Seed: 1093049346, Size: 1152x896, Model hash: 0724518c6b, Model: juggernautXL_v7Rundiffusion, Denoising strength: 0.7, Hypertile U-Net: True, Hypertile U-Net max depth: 1, Hypertile VAE: True, Hires upscale: 1.5, Hires upscaler: Latent, Version: v1.7.0-RC-5-gf92d6149
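The parameter line above uses the Web UI's `Key: value, Key: value` format. To reuse these settings in a script, a simplified parser can split the line into a dictionary. The function below is a hypothetical helper, not the Web UI's own code. It assumes that no value contains `, ` or quotes, which holds for this particular line.

```python
from typing import Dict

PARAMS = (
    "Steps: 30, Sampler: DPM++ 2M SDE Karras, CFG scale: 8, Seed: 1093049346, "
    "Size: 1152x896, Model hash: 0724518c6b, Model: juggernautXL_v7Rundiffusion, "
    "Denoising strength: 0.7, Hypertile U-Net: True, Hypertile U-Net max depth: 1, "
    "Hypertile VAE: True, Hires upscale: 1.5, Hires upscaler: Latent, "
    "Version: v1.7.0-RC-5-gf92d6149"
)


def parse_infotext_params(line: str) -> Dict[str, str]:
    """Split a 'Key: value, Key: value' line into a dict (values kept as strings)."""
    params: Dict[str, str] = {}
    for item in line.split(", "):
        key, sep, value = item.partition(": ")
        if sep:  # skip fragments without a 'Key: ' prefix
            params[key.strip()] = value.strip()
    return params


params = parse_infotext_params(PARAMS)
width, height = (int(v) for v in params["Size"].split("x"))
scale = float(params["Hires upscale"])
print(params["Sampler"], params["Seed"])
print("hires target:", int(width * scale), int(height * scale))
```

The last line reproduces the 1728x1344 hires fix size stated in the setup, which serves as a quick consistency check between the parameter line and the settings listed earlier.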

