Vikas Singh for Brilworks

Understanding How Generative AI Works

In the blink of an eye, generative AI tools write articles, generate images, and produce code. This technology helps professionals and businesses in many ways, from brainstorming ideas to producing creative work.
Today, the presence of AI is quite evident. While we can see glimpses of AI in today's images, its impact is also becoming noticeable in content creation. As per one report, by August 2023 an estimated 15.47 billion images had been created using text-to-image algorithms, surpassing the number of images captured by traditional photography in its first 150 years.

This is huge. However, we won’t dive into that. Instead, we will talk about how this technology works to generate results that are on par with those of humans.

How does it work? How can it create images or write content in just a few seconds that would take experts days to produce? The answer lies in generative AI.

This so-called "generative AI" combines various technologies and algorithms to generate output. Today's models work on different principles, meaning they use different techniques to achieve similar results. For example, some tools utilize GANs (Generative Adversarial Networks), while others leverage transformers to produce original content.

To put it simply, text-generation programs predict the next word in a sequence. Different tasks call for different methods and algorithms, such as GANs for image generation and transformers for text generation.

In this article, we will discuss how generative AI works. But first, let's define what generative AI is.

What is generative AI?

Generative AI refers to technology that generates different kinds of content, from text and images to audio and video. For text, it works by predicting the next item in a sequence; image and video generation follow a similar principle, predicting the next pixel or region of an image.

A generative AI program can leverage different algorithms (or models) and techniques for generating content, although many of them share common techniques.

Let's take ChatGPT as an example. It leverages GPT models, short for generative pre-trained transformers. GPT models are built on an architecture called the transformer, a kind of neural network.
Neural networks are what power today's artificial intelligence. When neural networks are designed and trained in distinctive ways, they get different names to distinguish one architecture from another.

Take CNNs (convolutional neural networks): introduced in the 1990s but only widely recognized in 2012, they revolutionized computer vision. GANs (generative adversarial networks), introduced by Ian Goodfellow in 2014, transformed the field of generative AI. And transformers, introduced in the seminal 2017 paper "Attention Is All You Need" by Vaswani et al., have pushed the boundaries of neural networks; they power today's popular apps, such as Gemini and ChatGPT.

How Does Generative AI Work?
Let's first understand how Generative AI works using a hypothetical case of generating handwritten digits. We'll use a Generative Adversarial Network (GAN) as the example, which has two main components: a discriminator and a generator.

An example of generating handwritten digits with GANs

A GAN is a pair of neural networks: a generator that takes random noise as input and produces candidate images, and a discriminator that tries to distinguish real images from the dataset from the images produced by the generator.

The discriminator learns to classify real images as real (label = 1) and fake images as fake (label = 0).

The generator aims to improve its ability to trick the discriminator, while the discriminator becomes better at telling real from fake. This process continues until the discriminator can no longer distinguish between real and generated images.
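
To make this concrete, here is a minimal sketch of that adversarial loop in PyTorch (our choice of framework; the network sizes are illustrative, and random tensors stand in for a real MNIST data loader):

```python
import torch
import torch.nn as nn

# Illustrative sizes: 28x28 = 784 pixels per digit image, 100-dim noise.
latent_dim, img_dim = 100, 28 * 28

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),      # noise in, fake image out
)
discriminator = nn.Sequential(
    nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),         # probability the image is real
)

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(1000):
    # Stand-in for a batch of real MNIST digits scaled to [-1, 1].
    real = torch.rand(64, img_dim) * 2 - 1
    noise = torch.randn(64, latent_dim)
    fake = generator(noise)

    # Discriminator: classify real as 1 and fake as 0.
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator output 1 for fakes.
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```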

How Generative AI Works in Simple Words with Transformers
GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers) are two well-known transformer-based models. Transformers are the backbone of many of today's state-of-the-art generative AI tools.

In this example, we will look at how LLMs use transformers to generate content that seems original.

Let's understand how an AI tool can create an article titled "Best Exercise Routines for Busy Professionals" by integrating information from documents about exercise, routines, and busy lifestyles.

Before the AI can process the text, it breaks it into smaller segments called "tokens." Tokens are the smallest units of text and can be as short as a single character or as long as a word.
For example, "Exercise is important daily" becomes ["Exercise", "is", "important", "daily"].

This segmentation helps the model handle manageable chunks of text and comprehend sentence structures.
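
As a rough illustration, here is what that splitting might look like in Python. Note that this naive whitespace tokenizer is a simplification; production models use subword schemes such as BPE or WordPiece:

```python
def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer for illustration only; real models
    # use subword schemes that can split rare words into pieces.
    return text.split()

tokens = tokenize("Exercise is important daily")
print(tokens)  # ['Exercise', 'is', 'important', 'daily']
```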

Next, embeddings are used. Each token is converted into a vector (a list of numbers) using an embedding.
If you don't know what a vector embedding is, it is a numeric representation of text that captures its meaning and its relationships to other words.
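
In code, an embedding is essentially a lookup table that maps each token ID to a trainable vector. A minimal sketch using PyTorch (the vocabulary and dimensions here are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy vocabulary mapping tokens to integer IDs.
vocab = {"Exercise": 0, "is": 1, "important": 2, "daily": 3}

# Lookup table: one trainable 8-number vector per token.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

token_ids = torch.tensor([vocab[t] for t in ["Exercise", "is", "important", "daily"]])
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([4, 8]) -- one vector per token
```
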
Transformers, the technology behind today's most advanced generative AI models, use a positional encoding scheme that uniquely represents the position of each word in a sequence.
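
One common choice is the sinusoidal scheme from the original transformer paper; here is a compact NumPy sketch (the dimensions are illustrative):

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    # Sinusoidal encoding from "Attention Is All You Need":
    # even dimensions use sine, odd dimensions use cosine.
    pos = np.arange(seq_len)[:, None]    # (seq_len, 1)
    i = np.arange(d_model)[None, :]      # (1, d_model)
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

# One row per position, so "Exercise" at position 0 gets a different
# position signature than the same word appearing at position 3.
pe = positional_encoding(seq_len=4, d_model=8)
```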

These position values are added to each word vector, ensuring the model retains the order of words. Transformers also employ an attention mechanism, a process that weighs tokens by how relevant they are to one another.
For example, if the model has read texts about "exercise" and understands its importance for health, and has also read about "busy professionals" needing efficient routines, it will pay "attention" to these connections.

Similarly, if it has read about "routines" that are quick and effective, it can link the concept of "exercise" to "busy professionals."
Now, it connects the information or context from different parts to give a clearer picture of the text's purpose.
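
Under the hood, this weighing is typically implemented as scaled dot-product attention. Here is a self-contained NumPy sketch (real models add learned query/key/value projections and multiple attention heads):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: each token scores every other
    # token, softmax turns the scores into weights, and the output
    # is a weighted mix of the value vectors.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Four token vectors, e.g. "exercise", "routines", "busy", "professionals".
x = np.random.randn(4, 8)
out = attention(x, x, x)  # self-attention: Q, K, V all come from x
```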

So, even if the original texts never specifically mentioned "exercise routines for busy professionals," it will generate relevant information by combining the concepts of "exercise," "routines," and "busy professionals."

This is because it has learned the broader contexts around each of these terms.

After the model has analyzed the input using its attention mechanism and other processing steps, it predicts the likelihood (probability) of each word in its vocabulary being the next word in the sequence of text it's generating. This helps the model decide what word should come next based on what it has learned from the input.

It might determine that after words like "best" and "exercise," the word "routines" is likely to come next. Similarly, it might associate "routines" with the interests of "busy professionals."
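
Conceptually, this last step is a softmax over the model's vocabulary. A toy sketch with made-up scores (a real vocabulary has tens of thousands of entries):

```python
import numpy as np

vocab = ["routines", "equipment", "tips", "schedules"]
logits = np.array([3.1, 1.2, 2.0, 0.5])  # made-up scores after "best exercise"

# Softmax turns raw scores into probabilities that sum to 1.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for word, p in zip(vocab, probs):
    print(f"{word}: {p:.2f}")  # "routines" gets the highest probability
```
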
Dive deeper into the workings of generative AI by reading our detailed blog here: https://www.brilworks.com/blog/how-generative-ai-works/
