Hi 😄
Companies like Google have been experimenting with AI-generated 2D and 3D images for quite a while. But over the last few years the field has exploded, and that's incredible for many use cases: art, design, games, and more.
My own interest is in text-to-image generation (there are many related technologies that look similar but serve different purposes). It is pretty explicit: you write a text describing a situation, a place, or anything else, with lots of detail if you want something very specific, or very little if you don't. Then an image appears matching what you wrote.
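To make the idea concrete, here is a minimal sketch of what a text-to-image call looks like in code, using the open-source Hugging Face `diffusers` library. The model id, prompt, and parameter values are just example choices, and you need a CUDA GPU plus the `diffusers`, `transformers`, and `torch` packages installed:

```python
# Minimal text-to-image sketch with Hugging Face diffusers.
# Assumes: `pip install diffusers transformers torch` and a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (example model id).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# The prompt: the more detail you give, the more specific the result.
prompt = "a small lighthouse on a cliff at sunset, oil painting, detailed"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("lighthouse.png")
```

Every platform below does some variation of this behind the scenes: text in, image out.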
Pretty simple, right? This tool unlocks new ways for artists to work: for example, producing sketches of what they want to paint or draw. They have an idea and generate an image to get inspiration. Some artists even specialize in this tool alone and sell the results as-is.
It can be used for pretty much anything that requires imagination; I think the most efficient way to use it is as a helper for creating what you want.
This kind of generation is very controversial in the digital-artist community. Some find it "too easy" because you only need to type text into an input field (you will see that it is more complicated than that).
Everybody has their own definition of art, and it will remain a controversial topic forever.
Many companies are starting to open their API, Discord server, or website to the public so you can generate images on their platform.
A few examples:
- MidJourney
- DALL·E 2
- Stable Diffusion
But in every case, you will have to pay for a premium plan or buy credits to run your generations.
So, how do you make your own?
I'm going to show you what I personally use.
There are more and more steps at each level (there are three), but in exchange you get more and more customization.
Hugging Face Spaces and other plug-and-play platforms:
- VQGAN: under maintenance for now
- Latent Diffusion: fast / not really good
- DALL·E mini: around 1 or 2 minutes / very nice / realistic
- Stable Diffusion: around 3 minutes, depending on demand / the most impressive one / between artistic and realistic
Google Colab notebooks / easy to use but a few steps:
These are trickier to set up: you will need to sign in with your Google account to get GPU and CPU time from Google. You can pay for a premium plan to get more of both and better availability.
They all follow the same pattern: "run" each prerequisite cell by pressing its play button, wait for it to finish, and move on to the next one until you reach a "Prompt" section where you enter your text input. Then continue through the remaining steps.
If you want to go further with customization, you can expand every step in the Colab page and try to understand what's going on. There you will be able to tweak some parameters.
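The parameters those notebooks expose follow a common pattern. Here is a hypothetical example of the kind of settings you will run into when you expand the cells (the names and default values vary from Colab to Colab; these are illustrative, not taken from any specific notebook):

```python
# Typical knobs exposed by text-to-image Colabs.
# Names and values are illustrative; each notebook differs slightly.
settings = {
    "prompt": "a castle in the clouds, matte painting",
    "steps": 50,            # more steps = slower, but usually cleaner output
    "guidance_scale": 7.5,  # how strongly the image must follow the prompt
    "seed": 42,             # fix it to reproduce the exact same image
    "width": 512,
    "height": 512,
}
print(settings["steps"])  # → 50
```

Changing the seed while keeping everything else fixed is the easiest way to get variations of an image you like.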
Very good ones, between artistic and realistic:
- Disco Diffusion
- Prompt Parrot
- Stable Diffusion Lite
- VQGAN
- VQGAN+CLIP
Pixel art:
- Pixel Art Diffusion
Very easy to use but not really advanced:
- Simple Stable
If you want to choose from a large panel of diffusion systems (including paid plans and/or limited use):
- List of Stable Diffusion systems
Visions of Chaos software / hard to set up and understand (but easy once done):
OK, now we enter the dark part, where you will actually generate text-to-image prompts on YOUR own PC.
Before going further, you need to understand that this will demand serious resources from your PC: at least an NVIDIA 20xx GPU with 8 GB of VRAM if you want decent results.
Once you have downloaded the software and its prerequisites, you will probably need about 120 GB of storage (yes, that's a lot, but there are many, many models and you will have plenty to play with).
The tool we are using is actually pretty cool and can do a LOT of stuff; once you have installed it, feel free to explore it.
So, we are going to use the text-to-image part of Visions of Chaos. It uses your own hardware to generate the image locally on your PC, which means unlimited free prompts for you (energy costs excluded, of course).
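Since electricity is the only running cost, you can roughly estimate what one image costs you. The wattage, generation time, and price per kWh below are hypothetical example numbers, not measurements; plug in your own:

```python
# Rough electricity cost per locally generated image.
# All numbers used below are hypothetical examples, not measurements.
def cost_per_image(gpu_watts, seconds_per_image, price_per_kwh):
    """Cost of one image: energy used (kWh) times the price per kWh."""
    kwh = gpu_watts / 1000 * seconds_per_image / 3600
    return kwh * price_per_kwh

# Example: a 220 W GPU taking 60 s per image at 0.20 $/kWh
print(round(cost_per_image(220, 60, 0.20), 5))  # → 0.00073
```

In other words, a fraction of a cent per image under these example numbers, which is why local generation feels effectively free.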
Steps:
- Download the software => https://softology.pro/voc.htm#identifier
- Follow the fully detailed tutorial to install all the prerequisites => https://softology.pro/tutorials/tensorflow/tensorflow.htm
- Follow this walkthrough to see how to use the tool => https://www.youtube.com/watch?v=4_LgrAL7EWg
Once everything is done, test whatever you want: play with the parameters and try every model you have downloaded.
If you want to reduce storage usage, simply remove the models you don't use or like from the Visions of Chaos models folder.
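If you are not sure which model files are eating the space, a tiny script like this can list them by size before you delete anything. The folder path below is a placeholder; point it at your actual Visions of Chaos models directory:

```python
# List files in a models folder by size (largest first),
# so you can decide which models to delete.
# The path at the bottom is a placeholder, not the real install location.
from pathlib import Path

def files_by_size(folder):
    """Return (path, size_in_bytes) pairs, largest first."""
    files = [(p, p.stat().st_size) for p in Path(folder).rglob("*") if p.is_file()]
    return sorted(files, key=lambda pair: pair[1], reverse=True)

# Show the ten biggest files in the (placeholder) models folder.
for path, size in files_by_size("C:/VisionsOfChaos/models")[:10]:
    print(f"{size / 1e9:6.2f} GB  {path}")
```

Delete from the bottom of your preference list, not the top of the size list: a big model you use often is worth its gigabytes.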
Interesting use cases
These come from Twitter accounts that have been applying the technology to applications like Figma.
Figma + stable diffusion
Gimp + stable diffusion
And that's it for now. That was a very short preview of how you can make your own generations.
Hope you liked it!
Bye :)