Hello everyone. I recently trained a model of myself using the latest version of Kohya GUI, with the official SDXL 1.0 base model.
The full DreamBooth training was done with the config below. It trains the Text Encoder as well. No captions were used, only the rare token ohwx and the class token man.
The training config is here : https://www.patreon.com/posts/very-best-for-of-89213064
A quick tutorial on how to train is here : https://www.youtube.com/watch?v=EEV8RPohsbw
I used my very best regularization images dataset, made of real, manually collected photos. 5200 images are available for both man and woman, pre-prepared in the resolutions that you might need.
You can find dataset here : https://www.patreon.com/posts/massive-4k-woman-87700469
I trained with 15 images of myself. You can see my training dataset below. It is deliberately medium quality at best, so that you can easily gather a similar dataset.
Training used 150 repeats and 1 epoch, for 4500 steps in total. Because of how Kohya handles repeats, this is effectively 150 epochs over the training images. Watching the video below will help you understand the logic.
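The step count above can be checked with a little arithmetic. This is only a rough sketch: it assumes batch size 1 and assumes that using regularization images doubles the step count (15 x 150 = 2250, doubled to 4500). The function and parameter names are hypothetical and not part of Kohya.

```python
import math

def kohya_total_steps(num_images, repeats, epochs=1, batch_size=1, with_reg_images=True):
    """Estimate total training steps (hypothetical helper, not a Kohya API)."""
    # Each training image is shown `repeats` times per epoch.
    steps_per_epoch = num_images * repeats
    # Assumption: regularization images double the number of steps.
    if with_reg_images:
        steps_per_epoch *= 2
    return math.ceil(steps_per_epoch * epochs / batch_size)

# 15 images, 150 repeats, 1 epoch, as in this post
print(kohya_total_steps(15, 150))
```

With 1 epoch and 150 repeats, each training image is seen 150 times, which is why I describe it as 150 epochs.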
For LoRA training I used the same config shared publicly in this amazing tutorial > https://youtu.be/sBFGitIvD2A
The LoRA training hyperparameters can be tuned further with more research, so there is still room for improvement.
For LoRA extraction, one of the images compares the effect of FP16, FP32 and BF16 extraction.
For extraction I used the Kohya SS GUI tool > LoRA extraction
I extracted the LoRA from the DreamBooth trained model with rank 128 and alpha 128. The rank can be researched further, and a better rank and alpha can very likely be found.
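To give a feeling for why the save precision might matter, here is a small sketch of how FP16 and BF16 round a single number. It only illustrates the number formats and is not how the Kohya extraction tool works internally. The helper names and the sample values are made up for illustration.

```python
import struct

def to_fp16(x):
    # Round-trip through IEEE half precision (10 mantissa bits).
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16(x):
    # bfloat16 keeps the top 16 bits of a float32 (7 mantissa bits).
    # Round-to-nearest-even is applied here; NaN/inf are not handled.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

for w in (0.001234, 1e-8):  # made-up sample weight values
    print(w, "fp16:", to_fp16(w), "bf16:", to_bf16(w))
```

FP16 keeps more mantissa bits, so values in its normal range are stored more accurately, but very small values underflow to zero. BF16 is coarser but covers the same exponent range as FP32.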
The full DreamBooth fine tuning with Text Encoder uses 17 GB VRAM on Windows 10. 4500 steps take roughly 2 hours on an RTX 3090 GPU.
You can do the same training on RunPod, which would cost around 0.6 USD since renting an RTX 3090 for 1 hour costs 0.29 USD.
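The cost estimate is simple arithmetic, sketched below with the numbers from this post. It assumes the cloud run takes the same 2 hours as my local RTX 3090 and ignores setup and download time. The function name is hypothetical.

```python
def rental_cost_usd(hours, usd_per_hour):
    """Rental cost for a GPU billed by the hour (hypothetical helper)."""
    return hours * usd_per_hour

training_hours = 2.0   # roughly 2 hours for 4500 steps
price_per_hour = 0.29  # RTX 3090 hourly price quoted above
total_steps = 4500

cost = rental_cost_usd(training_hours, price_per_hour)
seconds_per_step = training_hours * 3600 / total_steps

print(f"{cost:.2f} USD, {seconds_per_step:.1f} s/step")
```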
Alternatively, you can do SDXL DreamBooth training on a free Kaggle account. However, the quality on Kaggle is lower.
Kaggle tutorial with notebook link > https://youtu.be/16-b1AjvyBE
Notebook Link > https://www.patreon.com/posts/kohya-sdxl-lora-88397937
So now it is time to compare results.
Each image is labeled with what it is. I am also writing the full prompt info under them. The same seed was used within each comparison.
1:
closeshot photo of ohwx man wearing an expensive red suit in a debate studio, hd, hdr, 2k, 4k, 8k
Steps: 40, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 2453046211, Size: 1024x1024, Model hash: eef545047f, Model: me15img_150repeat, ADetailer model: face_yolov8n.pt, ADetailer prompt: photo of ohwx man, ADetailer confidence: 0.3, ADetailer dilate erode: 4, ADetailer mask blur: 4, ADetailer denoising strength: 0.5, ADetailer inpaint only masked: True, ADetailer inpaint padding: 32, ADetailer version: 23.11.1, Version: v1.7.0-RC-75-gaeaf1c51
2:
closeshot photo of ohwx man wearing a fancy golden chainmail armor in a coliseum , hd, hdr, 2k, 4k, 8k
Steps: 40, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 3261301792, Size: 1024x1024, Model hash: eef545047f, Model: me15img_150repeat, ADetailer model: face_yolov8n.pt, ADetailer prompt: photo of ohwx man , ADetailer confidence: 0.3, ADetailer dilate erode: 4, ADetailer mask blur: 4, ADetailer denoising strength: 0.5, ADetailer inpaint only masked: True, ADetailer inpaint padding: 32, ADetailer version: 23.11.1, Version: v1.7.0-RC-75-gaeaf1c51
3:
closeshot photo of ohwx man wearing a police uniform in a magnificent garden , hd, hdr, 2k, 4k, 8k
Negative prompt: cartoon, 3d, anime, lineart, drawing, painting, sketch, blurry, grainy
Steps: 40, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 3562376795, Size: 1024x1024, Model hash: eef545047f, Model: me15img_150repeat, ADetailer model: face_yolov8n.pt, ADetailer prompt: photo of ohwx man , ADetailer confidence: 0.3, ADetailer mask only top k largest: 1, ADetailer dilate erode: 4, ADetailer mask blur: 4, ADetailer denoising strength: 0.5, ADetailer inpaint only masked: True, ADetailer inpaint padding: 32, ADetailer version: 23.11.1, Version: v1.7.0-RC-75-gaeaf1c51
4:
closeshot photo of ohwx man wearing a general uniform on a battlefield , hd, hdr, 2k, 4k, 8k
Negative prompt: cartoon, 3d, anime, lineart, drawing, painting, sketch, blurry, grainy
Steps: 40, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 2899824500, Size: 1024x1024, Model hash: eef545047f, Model: me15img_150repeat, ADetailer model: face_yolov8n.pt, ADetailer prompt: photo of ohwx man, ADetailer confidence: 0.3, ADetailer mask only top k largest: 1, ADetailer dilate erode: 4, ADetailer mask blur: 4, ADetailer denoising strength: 0.5, ADetailer inpaint only masked: True, ADetailer inpaint padding: 32, ADetailer version: 23.11.1, Version: v1.7.0-RC-75-gaeaf1c51
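If you want to compare the generation settings above between images, the parameter lines can be split into key/value pairs. This is a minimal sketch that assumes no value contains ", " (which holds for the lines in this post, but not for every prompt). The function name is hypothetical and this is not the web UI's own parser.

```python
def parse_generation_params(line):
    """Split a 'Key: value, Key: value' parameter line into a dict."""
    params = {}
    for part in line.split(", "):
        key, sep, value = part.partition(": ")
        if sep:  # skip fragments that are not 'Key: value'
            params[key.strip()] = value.strip()
    return params

# Shortened copy of the first parameter line in this post
line = "Steps: 40, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 2453046211, Size: 1024x1024"
params = parse_generation_params(line)
print(params["Sampler"], params["Seed"])
```

Parsing two lines this way and comparing the dicts makes it easy to confirm that only the seed and prompt change between images.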
Comparison of FP32, FP16 and BF16 LoRA extraction from the full DreamBooth fine-tuned model.