Akriti Upadhyay

Posted on Feb 7, 2024

Multi-Person Image Generation Using Stable Diffusion Models on Astria.ai

#ai #coding #api #tutorial

Introduction

One of the most exciting developments in AI is the ability to generate images from text prompts. What if you could generate novel images not just of a single person, but of several people together in the same frame? That does sound interesting!

With Astria.ai, this is a simple and quick process. Let’s take a closer look at Astria and how it can help us generate multi-person images.

Astria.ai: Personalized AI for Life-Like Headshots

Astria.ai is a leading AI-powered platform that specializes in image generation and tailored AI solutions. It is designed to simplify and speed up the creation of unique images. Astria offers a Dreambooth API for creating distinctive visuals. The API streamlines fine-tuning by removing the need to manage GPUs, write Python scripts, or adjust hyperparameters.

Astria can also animate concepts, bringing narratives to life without the need for pre-existing footage, which adds to its storytelling potential and improves the user experience. Because images are generated from text prompts, users can refine their creations simply by revising the prompt. Thanks to the platform’s performance and stability, users and app developers can get started within minutes.

Astria offers a comprehensive platform with intuitive tools for fine-tuning Stable Diffusion models. Its pre-configured features and accessible APIs, such as AI Photoshoot, Product Shots, InPainting, and Masking, along with a user-friendly tuning guide, help streamline the AI image generation process.

A standout aspect of Astria.ai is its extensive API, which lets developers automate complex workflows on the platform efficiently and cost-effectively. App developers can quickly integrate Astria APIs into their applications and take advantage of the advanced capabilities of Stable Diffusion models.

For developers of social applications, particularly in the photo editing category, this represents a significant opportunity: they can embed Astria APIs in their mobile apps so that users can create unique images and share them with their friends.

Astria supports two Stable Diffusion model architectures: SD15 (Stable Diffusion 1.5) and SDXL.

Astria allows you to import any open-source model, such as models from CivitAI. Here are several popular base tune models available on Astria:

Realistic Vision 2.0: An improved version of Stable Diffusion 1.5 that excels at generating ultra-realistic images.
runwayml/stable-diffusion-v1-5: A latent text-to-image model capable of producing photorealistic images from text inputs. It was initialized with the Stable Diffusion-v1-2 checkpoint weights and fine-tuned on ‘laion-aesthetics v2.5+’ for 595k steps at a resolution of 512x512.
Realistic Vision V5.1 V5.1 (VAE): A Stable Diffusion model with an integrated Variational Autoencoder (VAE), which enhances image quality.
Deliberate: A model that pairs well with LoRA weights from CivitAI to produce stylized, artistic images.
AnyLoRA: A diffusers model, developed by Lykon on CivitAI, that is compatible with CivitAI’s LoRA weights.
DreamShaper 8: A model fine-tuned from runwayml/stable-diffusion-v1-5 and known for producing high-quality images.

The Prompting Technique for Stable Diffusion Models

Prompting is very important in image generation. When you prompt a fine-tuned model, Astria provides a ‘Detailed Description’ field and a ‘Negative Prompt’ field. The Detailed Description holds the positive prompt, which describes what you want in the image; the Negative Prompt lists the things you don’t want the image to contain.

Here are some prompting tips to keep in mind:

Use parentheses when you want the model to emphasize specific text. Parentheses increase the weight of the enclosed tokens, and nesting them increases it further; square brackets do the opposite and decrease the weight.
Choose your keywords carefully. The right mix of structured, powerful keywords and clear details will get you much closer to the image you want.
Start with the subject and its attributes, then describe the visual characteristics: camera angle, lighting, art style, color scheme, and the surrounding environment.
Describe the image quality you want in the output.
Use the negative prompt for concepts that are the opposite of what the positive prompt asks for.
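
To make the parentheses and brackets rule concrete, here is a minimal Python sketch that computes the effective weight of each phrase from how deeply it is nested. It assumes the AUTOMATIC1111-style convention, where each level of parentheses multiplies a phrase’s weight by 1.1 and each level of square brackets divides it by 1.1; whether Astria applies exactly these factors is an assumption, not something stated here. The function name `emphasis_weights` is hypothetical.

```python
# Sketch: effective weight of each phrase under nested ( ) / [ ] emphasis.
# ASSUMPTION: AUTOMATIC1111-style factors (x1.1 per "(" level, /1.1 per "["
# level); Astria's exact multiplier is not confirmed by this article.
EMPHASIS = 1.1


def emphasis_weights(prompt: str) -> list[tuple[str, float]]:
    """Return (phrase, weight) pairs; commas and brackets delimit phrases."""
    result: list[tuple[str, float]] = []
    paren = bracket = 0
    buffer = ""

    def flush() -> None:
        # Record the text collected so far at the current nesting depth.
        nonlocal buffer
        phrase = buffer.strip()
        if phrase:
            weight = EMPHASIS ** paren / EMPHASIS ** bracket
            result.append((phrase, round(weight, 3)))
        buffer = ""

    for ch in prompt:
        if ch in "()[],":
            flush()
            if ch == "(":
                paren += 1
            elif ch == ")":
                paren = max(0, paren - 1)
            elif ch == "[":
                bracket += 1
            elif ch == "]":
                bracket = max(0, bracket - 1)
        else:
            buffer += ch
    flush()
    return result


print(emphasis_weights("(wide shot) of ((ohwx man)), [blurry]"))
# [('wide shot', 1.1), ('of', 1.0), ('ohwx man', 1.21), ('blurry', 0.909)]
```

Under this convention, a triple-wrapped phrase such as `(((grey shirt, white coat and pant)))` would carry a weight of about 1.331 for each of its comma-separated parts.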

Multi-Person Image Generation through Astria.ai

As Valentine’s Day 2024 is just around the corner, let’s create an image of a happy couple visiting Paris with Astria.ai’s image-generation tools. To get started with multi-person image generation, go to the ‘Tunes’ tab and click on ‘New Finetune’.

We will first fine-tune a LoRA on the man’s images, then do the same for the woman’s images. Finally, we’ll use ControlNet to generate the final image.

Let’s get started!

Finetuning Man’s Image with LoRA

After clicking on ‘New Finetune’, you’ll move to a new page. Add the desired title and select ‘man’ as the Class name. I downloaded a photo of a male model from Pexels and uploaded it. For the best fine-tuning results, upload more than 4 images of the subject, including full-body, close-up, and medium shots.

Click on ‘Advanced’ and select the Base model to fine-tune from and the Model type. I selected Realistic Vision V5.1 V5.1 (VAE) as the base and LoRA as the model type. To create multi-person images, you can use either the LoRA or the Checkpoint model type, but, as mentioned in the documentation, the base must be an SD15 model.

Then click ‘Create’. You’ll be redirected to another page that shows a LoRA ID. Save it; we’ll use it for the final image.
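
Since Astria also exposes fine-tuning through its API, the same step can in principle be scripted. The sketch below only builds the HTTP request with Python’s standard library and does not send it. The endpoint `https://api.astria.ai/tunes`, the `tune[...]` form-field names, and the Bearer-token header are assumptions that should be verified against Astria’s API reference; `build_tune_request` is a hypothetical helper, and the `base_tune_id` value in the usage line is a placeholder for the ID of whichever base model you pick.

```python
# Sketch only: builds, but does not send, a request to create a LoRA fine-tune.
# ASSUMPTIONS: the endpoint, the "tune[...]" field names, and Bearer auth are
# not confirmed by this article; verify them in Astria's API reference.
import urllib.parse
import urllib.request
import warnings

API_BASE = "https://api.astria.ai"  # assumed base URL


def build_tune_request(api_key: str, title: str, class_name: str,
                       image_urls: list[str], base_tune_id: int,
                       model_type: str = "lora") -> urllib.request.Request:
    if len(image_urls) <= 4:
        # The article recommends more than 4 photos (full body, close-up, medium shot).
        warnings.warn("more than 4 subject images are recommended")
    fields = [
        ("tune[title]", title),
        ("tune[name]", class_name),                 # class name: "man" or "woman"
        ("tune[base_tune_id]", str(base_tune_id)),  # ID of the chosen base model
        ("tune[model_type]", model_type),           # LoRA, as chosen in the article
    ]
    fields += [("tune[image_urls][]", url) for url in image_urls]
    return urllib.request.Request(
        f"{API_BASE}/tunes",
        data=urllib.parse.urlencode(fields).encode(),
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_tune_request("YOUR_API_KEY", "The Man", "man",
                         [f"https://example.com/man{i}.jpg" for i in range(5)],
                         base_tune_id=0)  # placeholder ID, not a real base tune
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; calling the function again with `class_name="woman"` covers the second fine-tune.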

Finetuning Woman’s Image with LoRA

Similarly, for the woman’s image, click on ‘New Finetune’. Give it the desired title and select ‘woman’ as the Class name. I downloaded a photo of a female model from Pexels and uploaded it. As before, for the best fine-tuning results, upload more than 4 images of the subject, including full-body, close-up, and medium shots.

Click on ‘Advanced’ and select the Base model and the Model type. Again, I selected Realistic Vision V5.1 V5.1 (VAE) as the base and LoRA as the model type.

Then click ‘Create’. You’ll again be taken to a page with a LoRA ID. Save this one too; we’ll need both IDs for the final image.

Now, let’s fire up our imagination!

Prompting with ControlNet for the Final Images on Astria.ai

Once both models are fine-tuned, you’ll see them on the ‘Tunes’ page.

Now we want to generate an image of the two of them celebrating Valentine’s Day in Paris.

Click on either model. I went with ‘The Woman’ fine-tuned model. In the Detailed Description field, I entered the following prompt:

(wide shot) of ((ohwx man)) and ((ohwx woman)) standing together in (Paris) 
BREAK ((ohwx man)) wearing a (((grey shirt, white coat and pant))), ((ohwx woman)) wearing a (((black dress and yellow overcoat))), (Paris background), analog style, detailed limbs, detailed face, Amazing Details, Best Quality, Masterpiece, dramatic lighting, highly detailed, analog photo, overglaze, 80mm Sigma f/1.4 or any ZEISS lens, tiled upscale, Cinematic light, ((photorealistic++)), 8k high definition, RAW photo  
BREAK (ohwx man) <lora:982752:1.0> BREAK (ohwx woman) <lora:982753:1.0>

As you can see, the prompt defines the subject first, followed by its attributes, such as the clothes each person should wear and the surroundings that serve as the backdrop. It then specifies the camera lens, lighting, art style, and the desired image quality.
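
The ordering described above can be captured in a small helper. This is a hypothetical Python sketch (`build_prompt` is not an Astria function) that joins the parts in the article’s order and separates the sections with `BREAK`, mirroring the three-part layout of the Paris prompt: subject, then attributes, backdrop, and visual and quality descriptors, then one LoRA tag per person. The descriptor lists in the usage call are a shortened selection from the Paris prompt.

```python
# Hypothetical helper (not an Astria API) mirroring the article's layout:
# line 1 = subject; line 2 = attributes/backdrop, then camera, lighting,
# style, and quality; line 3 = one "(token) <lora:ID:weight>" pair per person.


def build_prompt(subject: str, attributes: list[str], visuals: list[str],
                 quality: list[str], loras: dict[str, int],
                 lora_weight: float = 1.0) -> str:
    details = ", ".join(attributes + visuals + quality)
    lora_part = " BREAK ".join(
        f"({token}) <lora:{lora_id}:{lora_weight}>"
        for token, lora_id in loras.items()
    )
    return f"{subject}\nBREAK {details}\nBREAK {lora_part}"


prompt = build_prompt(
    subject="(wide shot) of ((ohwx man)) and ((ohwx woman)) standing together in (Paris)",
    attributes=["((ohwx man)) wearing a (((grey shirt, white coat and pant)))",
                "((ohwx woman)) wearing a (((black dress and yellow overcoat)))",
                "(Paris background)"],
    visuals=["80mm Sigma f/1.4", "dramatic lighting", "analog style"],
    quality=["highly detailed", "8k high definition", "RAW photo"],
    loras={"ohwx man": 982752, "ohwx woman": 982753},
)
print(prompt)
```

Keeping the parts as separate lists makes it easy to swap the location and outfits for a new scene while the LoRA line stays the same.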

After that, the LoRA IDs of both people are listed next to the ‘ohwx man’ and ‘ohwx woman’ tokens, so that the model knows which LoRA belongs to which person.
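
Because each `<lora:ID:weight>` tag has to sit next to the right token, it can help to check that pairing before submitting a prompt. The following Python sketch uses a regular expression written for the `(token) <lora:ID:weight>` pattern used in this article’s prompts and maps each token to its LoRA ID and weight. `lora_map` is a hypothetical helper, not part of Astria.

```python
# Hypothetical check (not an Astria feature): map each "(token) <lora:ID:weight>"
# pair in a prompt to its LoRA ID and weight, using this article's pattern.
import re

LORA_PAIR = re.compile(r"\(([^()]+)\)\s*<lora:(\d+):([\d.]+)>")


def lora_map(prompt: str) -> dict[str, tuple[int, float]]:
    return {
        token.strip(): (int(lora_id), float(weight))
        for token, lora_id, weight in LORA_PAIR.findall(prompt)
    }


trailer = "BREAK (ohwx man) <lora:982752:1.0> BREAK (ohwx woman) <lora:982753:1.0>"
print(lora_map(trailer))
# {'ohwx man': (982752, 1.0), 'ohwx woman': (982753, 1.0)}
```

If the two tokens come back with the same ID, or one token is missing, the LoRA line of the prompt needs fixing before you click Create.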

‘ohwx’ is a token commonly used in Stable Diffusion prompts as an instance token. During training, it serves as the unique name for the subject, and because it is a rare token with little prior meaning, it helps the model identify and distinguish the particular subject or style it was trained with.

Next comes the negative prompt, which steers the model away from unwanted content.

For this image, I used the following negative prompt:

old, wrinkles, mole, blemish,(oversmoothed, 3d render) scar, sad, severe, 2d, sketch, painting, digital art, drawing, disfigured, elongated body (deformed iris, deformed pupils, semi-realistic, cgi, sketch, cartoon, drawing, anime), text, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, (extra fingers, mutated hands, poorly drawn hands, poorly drawn face), mutation, deformed, (blurry), dehydrated, bad anatomy, bad proportions, (extra limbs), cloned face, disfigured, gross proportions, (malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, NSFW), nude, underwear, muscular, elongated body, high contrast, airbrushed, blurry, disfigured, cartoon, blurry, dark lighting, low quality, low resolution, cropped, text, caption, signature, clay, kitsch, oversaturated

The terms in this negative prompt are conceptually the opposite of what we want in the output image.
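
The negative prompt above repeats several terms: ‘blurry’ and ‘disfigured’ each appear three times, and ‘cartoon’ twice. A hypothetical helper like the Python sketch below can list duplicates so you can trim a long negative prompt. It treats commas and parentheses as separators, which is a simplification: `(blurry)` and `blurry` count as the same term, and the extra weight of the parentheses is ignored.

```python
# Hypothetical helper: find terms that appear more than once in a prompt.
# Simplification: commas and parentheses are both treated as separators,
# so emphasis is ignored when comparing terms.
import re
from collections import Counter


def duplicate_terms(negative_prompt: str) -> dict[str, int]:
    terms = [t.strip().lower() for t in re.split(r"[(),]+", negative_prompt)]
    counts = Counter(t for t in terms if t)
    return {term: n for term, n in counts.items() if n > 1}


sample = "drawing, disfigured, (blurry), cartoon, blurry, disfigured, cartoon, blurry"
print(duplicate_terms(sample))
# {'disfigured': 2, 'blurry': 3, 'cartoon': 2}
```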

Now click on ‘Advanced’ and ‘ControlNet/Img2Img’.

In the Image URL field, provide a reference image of a couple of your choice, and select ‘Pose’ as the ControlNet Hint so that the generated couple follows the poses in the reference image. Then select the desired number of images and the aspect ratio.

Finally, click ‘Create’. The request is placed in a queue; once it is processed, the final image is generated.
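
For developers, this generation step can also be expressed as an API call. The Python sketch below only builds the request with the standard library and does not send it. The endpoint pattern `https://api.astria.ai/tunes/{tune_id}/prompts` and the `prompt[...]` field names, including `prompt[controlnet]` set to `pose` and `prompt[input_image_url]` for the reference image, are assumptions to verify against Astria’s API reference; `build_prompt_request` is hypothetical, and the aspect-ratio setting is left out because its API parameter is not covered in this article.

```python
# Sketch only: builds, but does not send, a generation request that uses a
# pose reference image via ControlNet.
# ASSUMPTIONS: URL pattern and "prompt[...]" field names are unverified.
import urllib.parse
import urllib.request

API_BASE = "https://api.astria.ai"  # assumed base URL


def build_prompt_request(api_key: str, tune_id: int, text: str,
                         negative_prompt: str, reference_image_url: str,
                         num_images: int = 4) -> urllib.request.Request:
    fields = {
        "prompt[text]": text,                          # the Detailed Description
        "prompt[negative_prompt]": negative_prompt,    # the Negative Prompt
        "prompt[num_images]": str(num_images),
        "prompt[controlnet]": "pose",                  # ControlNet Hint: Pose
        "prompt[input_image_url]": reference_image_url,
    }
    return urllib.request.Request(
        f"{API_BASE}/tunes/{tune_id}/prompts",
        data=urllib.parse.urlencode(fields).encode(),
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_prompt_request("YOUR_API_KEY", tune_id=0,  # placeholder tune ID
                           text="(wide shot) of ((ohwx man)) and ((ohwx woman)) ...",
                           negative_prompt="old, wrinkles, blurry",
                           reference_image_url="https://example.com/couple.jpg")
print(req.full_url)
```

Here `tune_id` stands for the fine-tuned model you prompt from (the woman’s tune in this walkthrough).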

The final image, in our case, looks like this:

Now let’s try another prompt, this time with the couple holidaying in Italy.

wide shot of ohwx man and ohwx woman standing besides lake in Italy
BREAK (ohwx man) wearing (Blue shirt and sunglasses) and (ohwx woman) wearing (pink top, jeans, and sunglasses), (Italy lake background), symmetrical eyes, analog style, detailed face, amazing details, hands in the pocket, masterpiece, intricate details, photorealistic++, high contrast, detailed background, detailed face, ZEISS lens, insanely detailed hair, dramatic lighting, detailed glow, overglaze, best quality, high contrast, tiled upscale, 8k high definition
BREAK (ohwx man) <lora:982752:1.0> BREAK (ohwx woman) <lora:982753:1.0>

Keep the same negative prompt and the same ‘Advanced’ and ControlNet settings, but in the ControlNet Image URL field, use another reference image with an Italian backdrop. Then click ‘Create’.

The final image will look like this:

Let’s try one more prompt where the pair visits Switzerland.

full body shot of ohwx man and ohwx woman sitting together in Switzerland
BREAK (ohwx man) wearing (Woollen cap, grey turtleneck, blue jeans, boots) and (ohwx woman) wearing (grey turtleneck, blue jeans, woolen cap, boots), (Switzerland snow background), (snowing background), symmetrical eyes, analog style, detailed face, amazing details, masterpiece, intricate details, photorealistic++, ultra realistic, high contrast, detailed background, ZEISS lens, insanely detailed hair, dramatic lighting, detailed glow, overglaze, best quality, 8k high definition
BREAK (ohwx man) <lora:982752:1.0> BREAK (ohwx woman) <lora:982753:1.0>

Keep the same negative prompt and the same ‘Advanced’ and ControlNet settings, but in the ControlNet Image URL field, use another reference image with Switzerland in the background. Then click ‘Create’.

The final image will look like this:

For a final prompt, let’s have the couple visit a beach in the Maldives.

full body shot of ohwx man and ohwx woman standing together near Maldives beach.
BREAK (ohwx man) wearing (white shirt and pant) and (ohwx woman) wearing (white dress), (Maldives beach background), (Palm tree and evening time), ZEISS lens,  symmetrical eyes, analog style, detailed face, amazing details, masterpiece, intricate details, photorealistic++, ultra realistic, high contrast, detailed background, ZEISS lens, insanely detailed hair, dramatic lighting, detailed glow, overglaze, best quality, 8k high definition
BREAK (ohwx man) <lora:982752:1.0> BREAK (ohwx woman) <lora:982753:1.0>

Keep the same negative prompt and the same ‘Advanced’ and ControlNet settings, but in the ControlNet Image URL field, use another reference image with a Maldives beach in the background. Then click ‘Create’.

The final image will look like this:

Conclusion

With Astria, it is super simple to generate creative multi-person images in multiple locations. All you need is creativity and the right prompts.

To reiterate, the right Stable Diffusion prompts are essential for generating the images you want. There’s no single right or wrong prompt; it all depends on your specific requirements. So go ahead and explore Astria.ai on your own!

This article was originally published here.
