If you use social media, you have probably seen images generated by machine learning models recently.
DALL·E 2 is free to try, but you may need to wait a month or more for access.
Recently, another model was released: Stable Diffusion. It is quite similar to DALL·E 2: you give it a text prompt and some parameters, and it generates a pretty nice image. You can use Stable Diffusion without waiting a month, which is super nice, right? However, it requires a GPU. If you don't have a GPU or can't get access to one, you are probably 😭 (What am I supposed to do?)
The Stable Diffusion README describes the model like this: "Stable Diffusion is a latent text-to-image diffusion model. Thanks to a generous compute donation from Stability AI and support from LAION, we were able to train a Latent Diffusion Model on 512x512 images from a subset of the LAION-5B database. Similar to Google's Imagen, this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See this section below and the model card."
Then you can try stable_diffusion.openvino. You don't need a GPU to run this!
Implementation of Text-To-Image generation using Stable Diffusion on Intel CPU.
- Linux, Windows, MacOS
- Python 3.8.+
- CPU compatible with OpenVINO.
pip install -r requirements.txt
Generate image from text description
usage: demo.py [-h] [--model MODEL] [--seed SEED] [--beta-start BETA_START]
               [--beta-end BETA_END] [--beta-schedule BETA_SCHEDULE]
               [--num-inference-steps NUM_INFERENCE_STEPS]
               [--guidance-scale GUIDANCE_SCALE] [--eta ETA]
               [--tokenizer TOKENIZER] [--prompt PROMPT]
               [--init-image INIT_IMAGE] [--strength STRENGTH]
               [--mask MASK] [--output OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         model name
  --seed SEED           random seed for generating consistent images per prompt
  --beta-start BETA_START
                        LMSDiscreteScheduler::beta_start
  --beta-end BETA_END   LMSDiscreteScheduler::beta_end
  --beta-schedule BETA_SCHEDULE
                        LMSDiscreteScheduler::beta_schedule
  --num-inference-steps NUM_INFERENCE_STEPS
                        num inference steps
  --guidance-scale GUIDANCE_SCALE
                        guidance scale
  --eta ETA             eta
  --tokenizer TOKENIZER
                        tokenizer
  --prompt PROMPT       prompt
  --init-image INIT_IMAGE
                        path to initial image
  --strength STRENGTH   how strong the initial image should be noised [0.0, 1.0]
  --mask MASK           mask of the region to inpaint on the initial image
  --output OUTPUT       output image name
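If you want to launch demo.py from another script, here is a minimal sketch of how the command line could be assembled in Python. The flag names come from the usage text above; the helper name `build_demo_command`, its defaults, and the example values (seed 42, `nyc.png`) are my own assumptions for illustration, not part of the repository.

```python
import shlex
import sys
from typing import List, Optional


def build_demo_command(
    prompt: str,
    seed: Optional[int] = None,
    num_inference_steps: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build an argument list for demo.py (hypothetical helper).

    Only flags listed in the usage text are used; any option left as None
    is omitted so that demo.py falls back to its own default.
    """
    cmd = [sys.executable, "demo.py", "--prompt", prompt]
    optional_flags = {
        "--seed": seed,
        "--num-inference-steps": num_inference_steps,
        "--guidance-scale": guidance_scale,
        "--output": output,
    }
    for flag, value in optional_flags.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Example values below are illustrative assumptions.
example = build_demo_command("cyberpunk New York City", seed=42, output="nyc.png")
print(shlex.join(example))
```

To actually generate an image, you would pass the list to `subprocess.run(example, check=True)` from inside the cloned repository directory.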
The README is straightforward, so you probably won't have any issues running demo.py and trying a Python script for
However, you might run into conflicts if you already manage Python with a version manager, Anaconda, or similar tools.
In that case, you can use Poetry to keep your Python dev environment clean.
There are 2 ways to install poetry.
- using pip
- using curl
$ poetry new poetry-stable-diffusion
$ poetry add package_name@package_version
However, you don't need to do this. You can use the following pyproject.toml, which I have already tested.
In this case, I used Python 3.8.12.
If you don't have Python 3.8, I highly recommend installing it with
[tool.poetry]
name = "stablediffusion"
version = "0.1.0"
description = "test Stable Diffusion"
authors = ["koji"]

[tool.poetry.dependencies]
python = "^3.8"
numpy = "1.19.5"
transformers = "4.16.2"
diffusers = "0.2.4"
tqdm = "4.64.0"
openvino = "2022.1.0"
huggingface-hub = "0.9.0"
streamlit = "1.12.0"
watchdog = "2.1.9"
opencv-python = "18.104.22.168"
scipy = "1.6.1"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
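A note on the `python = "^3.8"` line: in Poetry, a caret constraint with a non-zero major version allows any release from that version up to, but not including, the next major version, so `^3.8` means `>=3.8, <4.0`. The other entries, written without an operator, pin exact versions. Here is a small sketch that checks whether your interpreter falls in that range. The function name `satisfies_caret` is hypothetical, and it only models the non-zero-major case rather than Poetry's full constraint grammar.

```python
import sys
from typing import Tuple


def satisfies_caret(version: Tuple[int, int], base: Tuple[int, int]) -> bool:
    """Return True if (major, minor) satisfies the caret constraint ^base.

    Simplified sketch: for a non-zero major version X, ^X.Y is treated as
    >=X.Y and <(X+1).0. Zero-major bases follow different rules and are
    rejected here instead of being guessed at.
    """
    major, minor = base
    if major == 0:
        raise ValueError("this sketch only handles a non-zero major version")
    return (major, minor) <= version < (major + 1, 0)


current = sys.version_info[:2]
print(
    f"Python {current[0]}.{current[1]} satisfies ^3.8:",
    satisfies_caret(current, (3, 8)),
)
```

Keep in mind that the constraint on Python itself is only part of the picture: the pinned packages also need builds that support your interpreter version, which is one reason to stay on Python 3.8 here.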
All you need to do to set up the environment is run one command!
$ poetry install
$ git clone https://github.com/bes-dev/stable_diffusion.openvino.git
$ cd stable_diffusion.openvino
$ poetry run python demo.py --prompt "cyberpunk New York City"
Generation takes a few minutes (around 3 minutes in my case).
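If you want to measure the generation time on your own machine, here is a minimal sketch that times a command with Python's standard library. The helper name `time_command` is my own, and the block times a trivial placeholder command so that it runs anywhere; the real demo.py invocation is shown in a comment.

```python
import subprocess
import sys
import time
from typing import List


def time_command(cmd: List[str]) -> float:
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return time.perf_counter() - start


# Placeholder command so the sketch is runnable without the repository.
# For the real measurement, run this from the repository directory with:
#   ["poetry", "run", "python", "demo.py", "--prompt", "cyberpunk New York City"]
placeholder = [sys.executable, "-c", "print('hello')"]
elapsed = time_command(placeholder)
print(f"Elapsed: {elapsed:.2f} s")
```

Wall-clock time includes model loading as well as the inference steps, so expect the number to vary between runs and machines.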
My Mac's specs:
$ system_profiler SPHardwareDataType
Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro16,1
      Processor Name: 8-Core Intel Core i9
      Processor Speed: 2.3 GHz
      Number of Processors: 1
      Total Number of Cores: 8
      L2 Cache (per Core): 256 KB
      L3 Cache: 16 MB
      Hyper-Threading Technology: Enabled
      Memory: 16 GB
      System Firmware Version: 1922.214.171.124.0 (iBridge: 20.16.365.5.4,0)
      OS Loader Version: 5126.96.36.199.1~4
      Serial Number (system): C02CP2ESMD6Q
      Hardware UUID: FFCE331E-4543-5DBE-8F98-E329E0A69F91
      Provisioning UDID: FFCE331E-4543-5DBE-8F98-E329E0A69F91
      Activation Lock Status: Disabled