
DALL-E 2: Revolutionizing Image Creation with AI

“Any sufficiently advanced technology is indistinguishable from magic.” – Arthur C. Clarke

In this rapidly evolving world of technology, where software like ChatGPT is the current talk of the town, AI, or Artificial Intelligence, has emerged as the most prominent and fast-moving field. I was intrigued to learn about AI and its subfield, Machine Learning, after ChatGPT shook the tech world with its immense capabilities. Setting out to explore more, I stumbled across an AI model called DALL-E, which I found pretty interesting. Here’s why -

DALL-E 2 is an AI model from OpenAI that can take a simple text description like “a koala dunking a basketball” and turn it into photorealistic images and art that have never existed. In addition, DALL-E 2 can edit and retouch photos. Based on a simple description, it can fill in or replace part of an image with AI-generated imagery that blends with the original, a process referred to as “inpainting.” It can also extend an image beyond its borders while keeping the theme, colors, and context of the picture, a capability commonly called “outpainting.” It can even take an image as input and create variations with different perspectives, styles, and mediums.
DALL-E generated image
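
If you want to try these capabilities yourself, they are exposed through OpenAI’s Images API. Below is a minimal sketch using the official openai Python package (the v1.x client interface; older versions of the library use different method names). The file names here are hypothetical placeholders for your own images.

```python
# Minimal sketch of DALL-E 2's three image operations via OpenAI's Images API.
# Assumes the openai Python package (v1.x) and an OPENAI_API_KEY in the environment.
# File names like "koala.png" are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Text-to-image generation
gen = client.images.generate(
    model="dall-e-2",
    prompt="a koala dunking a basketball",
    n=1,
    size="1024x1024",
)
print("Generated image URL:", gen.data[0].url)

# 2. Inpainting: transparent regions of the mask are filled to match the prompt
edit = client.images.edit(
    model="dall-e-2",
    image=open("koala.png", "rb"),
    mask=open("koala_mask.png", "rb"),
    prompt="a koala dunking a basketball under stadium lights",
    n=1,
    size="1024x1024",
)
print("Edited image URL:", edit.data[0].url)

# 3. Variations: new takes on an existing image, no prompt required
var = client.images.create_variation(
    model="dall-e-2",
    image=open("koala.png", "rb"),
    n=2,
    size="1024x1024",
)
print("Variation URLs:", [d.url for d in var.data])
```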

In January 2021, OpenAI introduced this technology as DALL-E. Its successor, DALL-E 2, expanded its functionality with higher resolution, greater comprehension, and new capabilities. The textual input is a “prompt” that cues the model to generate something. DALL-E was created by training a neural network on images and their text descriptions scraped from millions of web pages, using a version of GPT-3 modified specifically to create images. Through deep learning, it understands individual objects, like penguins and ice creams, and learns the relationships between them. So, when you ask for a “realistic image of a penguin eating ice cream under the sun,” it knows how to connect one object to another object or action. This is what DALL-E-generated images look like, depicting the same subject in different styles while keeping the prompt in focus.

[Image: DALL-E renderings of the same subject in different styles]
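
To give a rough feel for what “learning from images and their text descriptions” means, here is a toy sketch of a contrastive training step in the style of CLIP, the text-image embedding model that DALL-E 2 builds on. This is illustrative PyTorch only, not OpenAI’s actual training code; the encoders are replaced with random tensors.

```python
# Toy illustration of learning relationships between images and their captions.
# NOT OpenAI's training code; real systems use large encoders and huge datasets.
import torch
import torch.nn.functional as F

batch_size, embed_dim = 8, 64

# Stand-ins for encoder outputs: one row per caption/image pair in the batch,
# e.g. the caption "a penguin eating ice cream" paired with a matching photo.
text_features = F.normalize(torch.randn(batch_size, embed_dim), dim=-1)
image_features = F.normalize(torch.randn(batch_size, embed_dim), dim=-1)

# Compare every caption against every image; a higher score means a better match.
logits = text_features @ image_features.T / 0.07  # temperature-scaled similarity

# The matching image for caption i is image i, so training pulls matched
# caption-image pairs together and pushes mismatched pairs apart.
targets = torch.arange(batch_size)
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
print(f"contrastive loss: {loss.item():.3f}")
```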

To use this tool effectively, one must learn to write prompts that yield the desired output, a process called “prompt engineering.” Prompts should be descriptive, with the context of each item clearly specified. You can also mention styles like “pixel art” or “digital art,” reference famous artists such as Van Gogh or Picasso, or specify the medium of the artwork, such as oil pastel or watercolor. For example, when I wanted a new wallpaper for my phone, I asked DALL-E for “a synth-wave style image of a motorcyclist riding on the highway towards the city, digital art.” And the results were exactly what I had pictured in my head.
[Image: the synth-wave motorcyclist wallpaper DALL-E generated from that prompt]

Pretty cool, huh?
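
If you want to experiment with prompt phrasing programmatically, here is a small sketch (reusing the same openai Python client setup as above, with a hypothetical helper function) that assembles a prompt from a subject plus style and medium modifiers, the way the wallpaper prompt above does:

```python
# Toy prompt-engineering helper: combine a subject with style/medium modifiers.
# The modifiers below are only examples; swap in any style, artist, or medium.
from openai import OpenAI

client = OpenAI()

def build_prompt(subject: str, *modifiers: str) -> str:
    """Join a base description with comma-separated style/medium hints."""
    return ", ".join([subject, *modifiers])

prompt = build_prompt(
    "a motorcyclist riding on the highway towards the city",
    "synth-wave style",
    "digital art",
)

result = client.images.generate(model="dall-e-2", prompt=prompt, n=1, size="1024x1024")
print(prompt)
print(result.data[0].url)
```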

The research behind DALL-E’s technology has three primary outcomes:
First, it can help people express themselves visually in ways they may not have been able to before. It eases the often time-consuming process of creating visual ideas, designs, and logos. Second, an AI-generated image can tell us a great deal about whether the system genuinely understands us or merely repeats what it has been taught, and whether it keeps learning and improving through prompts and feedback. Third, DALL-E helps humans understand how advanced AI systems see and understand our world. OpenAI aims to give language models a better grasp of the everyday concepts humans use to make sense of things. This is a critical part of developing AI that’s useful and safe.

Though similar image-generating AI software is available, Midjourney is the best alternative to DALL-E. The difference, however, is that Midjourney leans toward painterly art rather than photorealistic images. It also lets you change the image's aspect ratio but lacks features such as inpainting. Both are powerful AI systems that generate images from text, but having worked with both, I use DALL-E more often because of its added functionality.

A primary ethical concern about DALL-E 2 and similar image-generation models is that they could be used to generate and propagate deepfakes and similar misinformation. To mitigate this, the software rejects prompts involving public figures and uploads containing human faces. Prompts containing explicit content are blocked, and uploaded images are analyzed to detect offensive material, though this is not entirely foolproof. Another concern is that such AI could cause technological unemployment for artists, photographers, and graphic designers because of its accuracy and popularity. However, I believe such tools only aid one’s creative endeavors. Art can never be replaced by AI-generated images. Still, the successful artists of the future will be those who incorporate such technology into their creations, as it lets them do more than they were capable of before. It can help art companies and businesses increase profits while reducing costs. The rapid progress in AI technology shows no signs of slowing down. With major companies investing millions in AI software, the future economy will be shaped by such systems and the people who employ them.

Researchers and tech companies are already racing toward the next stage of generative AI art, which will soon hit the market. Meta, for example, has released examples of its text-to-video AI currently in development. Google, on the other hand, has unveiled DreamFusion, a text-to-3D AI.

DALL-E 2 may be powerful, but it does have certain limitations. If it’s taught with incorrectly labelled objects, like a truck labelled “car,” and a user tries to generate a car, DALL-E may create…a truck. It’s almost like talking to a person who learned the wrong word for something. Gaps in its training can also limit DALL-E. For example, it may understand what an ocean or a lake is but misinterpret a “water body” if it hasn’t learned what that is. Beyond these image-generation limitations, DALL-E 2 also has inherent biases due to the skewed nature of data collected from the internet: it shows gender-biased occupation representations and generates predominantly Western features for many prompts. But what's exciting about the approach used to train DALL-E is that it can remember what it learned from various other labelled images and then apply it to a new image. DALL-E exemplifies how imaginative humans and clever systems can work together to make new things – amplifying our creative potential.
