DEV Community

Trix Cyrus

Part 11: Building Your Own AI - Introduction to Generative Models: GANs and VAEs

Author: Trix Cyrus

[Try My], Waymap Pentesting tool: Click Here
[Follow] TrixSec Github: Click Here
[Join] TrixSec Telegram: Click Here


Generative models are a fascinating area of machine learning, capable of creating entirely new data resembling the training data. In this article, we’ll explore two popular types of generative models: Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). These models have applications in generating realistic images, creating deepfake videos, and even composing music.


1. What are Generative Models?

Generative models learn the underlying distribution of the training data and draw new samples from it. Unlike discriminative models, which learn a mapping from inputs to labels or predictions, generative models produce new data points that resemble the originals without copying them.
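To make "learn the distribution, then sample from it" concrete, here is a minimal sketch of what is arguably the simplest generative model: a single Gaussian fitted to 1-D data. The dataset values and the helper names `fit_gaussian` and `sample_gaussian` are made up for illustration; GANs and VAEs follow the same idea with far more expressive models.

```python
import random
import statistics

def fit_gaussian(samples):
    """'Training': estimate the mean and standard deviation of 1-D data."""
    return statistics.mean(samples), statistics.stdev(samples)

def sample_gaussian(mean, std, n, seed=0):
    """'Generation': draw n new values from the fitted distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]

# Toy training data (made-up heights in centimetres).
training_data = [158.0, 162.5, 171.0, 175.5, 168.0, 180.0, 165.5, 173.0]

mean, std = fit_gaussian(training_data)
new_samples = sample_gaussian(mean, std, n=5)

print(f"learned mean={mean:.1f}, std={std:.1f}")
print("generated:", [round(x, 1) for x in new_samples])
```

The generated values are not copies of the training data, but they follow the same learned distribution.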


2. Introduction to GANs

Generative Adversarial Networks (GANs) are a class of generative models introduced by Ian Goodfellow and colleagues in 2014. GANs consist of two neural networks:

  • Generator: Creates new data resembling the training data.
  • Discriminator: Evaluates whether a sample is real (from the training data) or fake (generated).

These networks are trained in a competitive process, known as adversarial training:

  • The generator tries to fool the discriminator by creating realistic samples.
  • The discriminator tries to correctly classify real and fake samples.
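This competition is usually written as the minimax objective from the original GAN formulation. Here \(D(x)\) is the discriminator's estimated probability that \(x\) is real, \(G(z)\) is the sample the generator produces from noise \(z\), and \(p_z\) is the noise distribution (typically a standard normal):

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes \(V\) by pushing \(D(x)\) toward 1 on real data and \(D(G(z))\) toward 0 on generated data, while the generator minimizes \(V\) by pushing \(D(G(z))\) toward 1.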

GAN Workflow

  1. The generator produces a sample from random noise.
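  2. The discriminator scores the generated sample alongside real samples from the training data.
  3. Each network updates its weights from its own loss: the discriminator to separate real from fake more accurately, the generator to make its samples score as real.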
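The workflow above can be sketched with plain numbers before any neural network is involved. The snippet below computes the two losses from hypothetical discriminator outputs (probabilities that a sample is real). The probabilities are invented for illustration, and the generator loss is the commonly used "non-saturating" variant, which is an assumption rather than something this article prescribes.

```python
import math

def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

def discriminator_loss(real_probs, fake_probs):
    """Real samples should score 1, generated samples should score 0."""
    real = sum(bce(p, 1.0) for p in real_probs) / len(real_probs)
    fake = sum(bce(p, 0.0) for p in fake_probs) / len(fake_probs)
    return real + fake

def generator_loss(fake_probs):
    """The generator wants its samples to be scored as real (label 1)."""
    return sum(bce(p, 1.0) for p in fake_probs) / len(fake_probs)

# Hypothetical discriminator outputs at two points in training.
early = {"real": [0.90, 0.95], "fake": [0.10, 0.05]}  # fakes are easy to spot
late = {"real": [0.55, 0.50], "fake": [0.50, 0.45]}   # fakes look realistic

for name, probs in (("early", early), ("late", late)):
    d = discriminator_loss(probs["real"], probs["fake"])
    g = generator_loss(probs["fake"])
    print(f"{name}: discriminator loss={d:.3f}, generator loss={g:.3f}")
```

As the generated samples become more convincing, the generator loss falls and the discriminator loss rises, which is the tug-of-war that adversarial training relies on.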

Applications of GANs

  • Generating realistic images (e.g., AI-generated portraits).
  • Creating deepfake videos.
  • Data augmentation for imbalanced datasets.
  • Super-resolution: Enhancing image quality.

3. Introduction to VAEs

Variational Autoencoders (VAEs) are another type of generative model. They work by learning a compressed representation (latent space) of the input data, then generating new data by sampling from this latent space.

How VAEs Work

  1. Encoder: Maps input data to the parameters (mean and log-variance) of a distribution over the latent space.
  2. Latent Space: Represents the underlying structure of the data in a reduced form.
  3. Decoder: Reconstructs data from the latent space.

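VAEs differ from traditional autoencoders because the encoder outputs a probability distribution rather than a single point, and training regularizes that distribution toward a simple prior. Sampling from this regularized latent space allows for smoother interpolation and the generation of new data.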
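Interpolation in the latent space is easy to sketch on its own. The helper below (the name `interpolate` and the two latent points are illustrative assumptions) returns evenly spaced points on the straight line between two latent vectors. With a trained VAE, each point would be passed through the decoder to obtain a gradual transition between two outputs.

```python
def interpolate(z_start, z_end, steps):
    """Return `steps` evenly spaced points on the line from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

# Two made-up points in a 2-D latent space.
z_a = [-1.0, 0.5]
z_b = [1.0, -0.5]

path = interpolate(z_a, z_b, steps=5)
for z in path:
    print(z)
```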

Applications of VAEs

  • Generating new images or videos.
  • Anomaly detection by comparing reconstructed and original data.
  • Creating music and sound synthesis.
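The anomaly-detection use case above boils down to a reconstruction-error threshold: inputs the model reconstructs poorly are flagged as unusual. The sketch below uses hand-written "reconstructions" in place of a trained VAE, and the threshold of 0.1 is an arbitrary assumption; in practice it would be chosen from the error distribution on normal data.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between an input and its reconstruction."""
    squared = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared) / len(squared)

def flag_anomalies(pairs, threshold):
    """Return True for each (original, reconstruction) pair whose error exceeds the threshold."""
    return [reconstruction_error(o, r) > threshold for o, r in pairs]

# Made-up (input, reconstruction) pairs standing in for real VAE outputs.
pairs = [
    ([0.0, 1.0, 1.0, 0.0], [0.1, 0.9, 0.9, 0.1]),  # reconstructed well
    ([1.0, 1.0, 0.0, 0.0], [0.2, 0.3, 0.8, 0.9]),  # reconstructed poorly
]

flags = flag_anomalies(pairs, threshold=0.1)
for (original, reconstructed), is_anomaly in zip(pairs, flags):
    error = reconstruction_error(original, reconstructed)
    print(f"error={error:.3f} anomaly={is_anomaly}")
```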

4. Hands-On: Building a GAN for Image Generation

Step 1: Install Required Libraries

pip install tensorflow keras numpy matplotlib

Step 2: Import Libraries

import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt

Step 3: Define the GAN Components

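# Generator: maps a latent noise vector to a 28x28 grayscale image
def build_generator(latent_dim):
    model = tf.keras.Sequential([
        layers.Input(shape=(latent_dim,)),
        # No activation on this Dense layer: LeakyReLU is applied after batch norm
        layers.Dense(256),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        # Sigmoid keeps pixel values in [0, 1]
        layers.Dense(28 * 28 * 1, activation='sigmoid'),
        layers.Reshape((28, 28, 1))
    ])
    return model

# Discriminator: maps an image to the probability that it is real
def build_discriminator(input_shape):
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Flatten(),
        # LeakyReLU is the only activation here (no ReLU on the Dense layer)
        layers.Dense(128),
        layers.LeakyReLU(0.2),
        layers.Dense(1, activation='sigmoid')
    ])
    return model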

Step 4: Train the GAN

  • Generate images using the generator.
  • Classify real vs. fake images using the discriminator.
  • Train both networks in an adversarial setup.
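The three steps above can be sketched as a single training step with `tf.GradientTape`. This is a minimal illustration rather than a tuned recipe: the latent size of 64, the Adam settings, and the random stand-in batch are all assumptions, and the networks are compact versions of the ones from Step 3 so the snippet runs on its own.

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 64  # assumed size of the noise vector

generator = tf.keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(256),
    layers.LeakyReLU(0.2),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28, 1)),
])
discriminator = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(128),
    layers.LeakyReLU(0.2),
    layers.Dense(1, activation="sigmoid"),
])

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)  # assumed hyperparameters
d_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

def train_step(real_images):
    """Run one adversarial update on a batch of real images."""
    batch_size = real_images.shape[0]
    noise = tf.random.normal((batch_size, latent_dim))
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        # 1. Generate images from noise.
        fake_images = generator(noise, training=True)
        # 2. Classify real vs. fake images.
        real_pred = discriminator(real_images, training=True)
        fake_pred = discriminator(fake_images, training=True)
        # 3. Discriminator: real -> 1, fake -> 0. Generator: fake -> 1.
        d_loss = (bce(tf.ones_like(real_pred), real_pred)
                  + bce(tf.zeros_like(fake_pred), fake_pred))
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    return d_loss, g_loss

# Random stand-in for a batch of real 28x28 images with pixels in [0, 1].
real_batch = tf.random.uniform((32, 28, 28, 1))
d_loss, g_loss = train_step(real_batch)
print(f"discriminator loss={float(d_loss):.3f}, generator loss={float(g_loss):.3f}")
```

For real training, you would replace `real_batch` with batches from an actual dataset (for example MNIST via `tf.keras.datasets.mnist.load_data()`, scaled to [0, 1] and reshaped to 28x28x1) and call `train_step` in a loop over many epochs.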

5. Hands-On: Building a VAE for Data Generation

Step 1: Define the Encoder and Decoder

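import tensorflow as tf
from tensorflow.keras import layers

# Encoder: maps an image to the mean and log-variance of a latent Gaussian
latent_dim = 2
encoder_input = layers.Input(shape=(28, 28, 1))
x = layers.Flatten()(encoder_input)
x = layers.Dense(128, activation='relu')(x)
z_mean = layers.Dense(latent_dim, name='z_mean')(x)
z_log_var = layers.Dense(latent_dim, name='z_log_var')(x)
encoder = tf.keras.Model(encoder_input, [z_mean, z_log_var], name='encoder')

# Decoder: maps a latent vector back to a 28x28 image with pixels in [0, 1]
decoder_input = layers.Input(shape=(latent_dim,))
x = layers.Dense(128, activation='relu')(decoder_input)
x = layers.Dense(28 * 28, activation='sigmoid')(x)
decoder_output = layers.Reshape((28, 28, 1))(x)
decoder = tf.keras.Model(decoder_input, decoder_output, name='decoder')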

Step 2: Train the VAE

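  • Encode input data into the mean and log-variance of a latent distribution.
  • Sample from the latent space using \( z = z_{\text{mean}} + \exp\left(\tfrac{1}{2} z_{\text{log\_var}}\right) \cdot \epsilon \), where \( \epsilon \sim \mathcal{N}(0, I) \). The factor \( \tfrac{1}{2} \) converts the log-variance into a standard deviation.
  • Decode the sampled points to reconstruct the input.
  • Minimize the reconstruction loss plus the KL divergence between the encoder's distribution and a standard normal prior.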
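The training steps above can be sketched as one custom training step. This is a minimal illustration under a few assumptions: binary cross-entropy as the reconstruction loss, an unweighted KL term, Adam with a learning rate of 1e-3, and a random stand-in batch instead of real images. The encoder and decoder are repeated from Step 1 so the snippet runs on its own.

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 2

# Encoder and decoder, as in Step 1.
encoder_input = layers.Input(shape=(28, 28, 1))
h = layers.Flatten()(encoder_input)
h = layers.Dense(128, activation="relu")(h)
z_mean_out = layers.Dense(latent_dim, name="z_mean")(h)
z_log_var_out = layers.Dense(latent_dim, name="z_log_var")(h)
encoder = tf.keras.Model(encoder_input, [z_mean_out, z_log_var_out])

decoder_input = layers.Input(shape=(latent_dim,))
h = layers.Dense(128, activation="relu")(decoder_input)
h = layers.Dense(28 * 28, activation="sigmoid")(h)
decoder = tf.keras.Model(decoder_input, layers.Reshape((28, 28, 1))(h))

optimizer = tf.keras.optimizers.Adam(1e-3)  # assumed learning rate

def sample_z(z_mean, z_log_var):
    """Reparameterization trick: z = mean + std * epsilon."""
    epsilon = tf.random.normal(tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon

def train_step(images):
    """Run one VAE update and return (total, reconstruction, KL) losses."""
    with tf.GradientTape() as tape:
        z_mean, z_log_var = encoder(images, training=True)
        z = sample_z(z_mean, z_log_var)
        reconstruction = decoder(z, training=True)
        # Per-pixel binary cross-entropy, summed over each image.
        pixel_bce = tf.keras.losses.binary_crossentropy(images, reconstruction)
        recon_loss = tf.reduce_mean(tf.reduce_sum(pixel_bce, axis=(1, 2)))
        # KL divergence between N(z_mean, exp(z_log_var)) and N(0, I).
        kl = -0.5 * tf.reduce_sum(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
        kl_loss = tf.reduce_mean(kl)
        loss = recon_loss + kl_loss
    variables = encoder.trainable_variables + decoder.trainable_variables
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return loss, recon_loss, kl_loss

# Random stand-in for a batch of real 28x28 images with pixels in [0, 1].
batch = tf.random.uniform((32, 28, 28, 1))
loss, recon_loss, kl_loss = train_step(batch)
print(f"loss={float(loss):.2f} recon={float(recon_loss):.2f} kl={float(kl_loss):.4f}")

# After training, new data comes from decoding samples drawn from the prior.
generated = decoder(tf.random.normal((3, latent_dim)), training=False)
print(generated.shape)
```

Keeping the sampling step inside the training function, rather than inside the model graph, is a design choice that keeps the sketch short; a custom Keras layer for sampling is a common alternative.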

6. Comparison of GANs and VAEs

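| Feature | GANs | VAEs |
|---|---|---|
| Training | Adversarial (generator vs. discriminator) | Reconstruction loss plus KL-divergence regularization |
| Output Quality | Often sharper and more realistic | Typically smoother, less sharp |
| Latent Space | Noise vector as input, but no encoder to map data back to it | Explicit, learned by the encoder and regularized toward a prior |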

7. Challenges with Generative Models

  • GANs: Training instability, mode collapse (generator produces limited diversity).
  • VAEs: Blurrier outputs compared to GANs.
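Mode collapse can be checked crudely by counting how many distinct classes show up in a batch of generated samples. The sketch below assumes you already have class labels for the generated samples, for example from a separately trained classifier; the label lists here are invented, and `mode_coverage` is a hypothetical helper name.

```python
from collections import Counter

def mode_coverage(predicted_labels, num_classes):
    """Return the fraction of classes that appear at least once, plus the counts."""
    counts = Counter(predicted_labels)
    covered = sum(1 for c in range(num_classes) if counts[c] > 0)
    return covered / num_classes, counts

# Made-up classifier labels for 30 generated digits in each scenario.
healthy = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] * 3
collapsed = [1] * 25 + [7] * 5

healthy_cov, _ = mode_coverage(healthy, num_classes=10)
collapsed_cov, collapsed_counts = mode_coverage(collapsed, num_classes=10)

print(f"healthy coverage:   {healthy_cov:.0%}")
print(f"collapsed coverage: {collapsed_cov:.0%} {dict(collapsed_counts)}")
```

A generator that covers only a small fraction of the classes, as in the second case, is producing limited diversity even if each individual sample looks convincing.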

8. Future Trends in Generative Models

  • Combining GANs and VAEs for enhanced performance.
  • Advanced architectures like StyleGAN and BigGAN.
  • Applications in drug discovery and creative arts.

~Trixsec
