DEV Community

nponte-zumo
What Are GANs? Generative Adversarial Networks Explained

Generative Adversarial Networks, or GANs, are one way to implement a generative model [Figure 1]. A generative model is trained on a set of examples and learns to output new examples (e.g. images) that look as though they were drawn from the same distribution.

Figure 1

GANs are composed of two parts: the Generator and the Discriminator [Figure 2]. These are two separate neural networks, and each is trained to transform a given input into a desired output.

Figure 2

The Generator is a generative model: given a random noise vector, it generates something that falls within the distribution of the training data. The Discriminator is trained to return the probability that an input example came from the original training data (that is, from the same statistical distribution) rather than from the Generator [Figure 3]. Think of the Generator as an art forger and the Discriminator as an art evaluator.

Figure 3
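
To make the two roles concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not how a real GAN is built: the "networks" are one-dimensional functions, and the names `generator`, `discriminator`, and the weights `b`, `w`, `c` are hypothetical and chosen only for illustration.

```python
import math
import random

def generator(z, b):
    """Toy Generator: turn a noise sample z into a candidate example (one number)."""
    return z + b

def discriminator(x, w, c):
    """Toy Discriminator: estimated probability that x came from the training data."""
    return 1.0 / (1.0 + math.exp(-(w * x + c)))

random.seed(0)
z = random.gauss(0.0, 1.0)                  # random noise input
fake = generator(z, b=0.0)                  # output of an untrained Generator
p_real = discriminator(fake, w=0.5, c=0.0)  # always a value between 0 and 1
print(f"noise={z:.3f}  fake={fake:.3f}  D(fake)={p_real:.3f}")
```

In a real GAN both functions would be deep networks with many weights, but the contract is the same: noise in and an example out for the Generator, an example in and a probability out for the Discriminator.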

Both models are trained with backpropagation, each toward its own goal. Backpropagation computes how the loss changes with respect to each weight, and an optimizer such as gradient descent uses those gradients to iteratively move the network's weights toward the desired goal [Figure 4]. The Generator and the Discriminator are placed as adversaries (a setup borrowed from game theory), so each trains the other to be better at its task: generating and discriminating. Though the two are trained in tandem, the ultimate goal is a strong generative model.

Figure 4
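
The alternating training described above can be sketched with a one-dimensional toy. Everything here is an assumption made for illustration: the real data is drawn from a normal distribution with mean 4, the Generator is `G(z) = z + b`, the Discriminator is `D(x) = sigmoid(w*x + c)`, and the gradients are derived by hand rather than by a backpropagation library. The Generator uses the common "non-saturating" loss `-log D(G(z))`.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def d_step(reals, fakes, w, c, lr):
    """One gradient-descent step on the Discriminator loss
    -[log D(real) + log(1 - D(fake))], with D(x) = sigmoid(w*x + c)."""
    gw = gc = 0.0
    for x_r, x_f in zip(reals, fakes):
        d_r, d_f = sigmoid(w * x_r + c), sigmoid(w * x_f + c)
        gw += (d_r - 1.0) * x_r + d_f * x_f   # dLoss/dw
        gc += (d_r - 1.0) + d_f               # dLoss/dc
    n = len(reals)
    return w - lr * gw / n, c - lr * gc / n

def g_step(noise, b, w, c, lr):
    """One gradient-descent step on the Generator loss -log D(G(z)),
    with G(z) = z + b. Only b changes; w and c are held fixed."""
    gb = 0.0
    for z in noise:
        d_f = sigmoid(w * (z + b) + c)
        gb += (d_f - 1.0) * w                 # dLoss/db via the chain rule
    return b - lr * gb / len(noise)

def train_toy_gan(steps=2000, batch=32, lr=0.05, real_mean=4.0, seed=0):
    rng = random.Random(seed)
    b = 0.0            # Generator weight (shift applied to the noise)
    w, c = 0.0, 0.0    # Discriminator weights
    for _ in range(steps):
        reals = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fakes = [z + b for z in noise]
        w, c = d_step(reals, fakes, w, c, lr)  # Discriminator learns to tell them apart
        b = g_step(noise, b, w, c, lr)         # Generator learns to fool it
    return b, w, c

if __name__ == "__main__":
    b, w, c = train_toy_gan()
    print(f"generator shift b = {b:.2f}")
```

While the Discriminator favors larger values (`w > 0`), each Generator step pushes `b` upward, toward the real data. Real GANs follow the same alternating pattern, with deep networks and automatic differentiation in place of these hand-written gradients.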

Important Literature

If you're just getting into GANs, I suggest starting with Generative Adversarial Nets [1], the first paper to propose the GAN architecture. It introduces the foundational math and the basic structure that allow this adversarial system to function.
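
For reference, the core of that math is the two-player minimax objective from [1], written here in the paper's notation: D(x) is the Discriminator's probability that x is real, G(z) is the Generator's output for noise z, p_data is the data distribution, and p_z is the noise distribution.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The Discriminator tries to maximize this value by scoring real examples near 1 and generated ones near 0, while the Generator tries to minimize it by producing examples the Discriminator scores as real.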

Most “modern” GANs have architectures similar to the one proposed in Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [2], the paper that introduced DCGANs. It provides guidance on using Convolutional Neural Networks (CNNs) in a GAN framework. With this architecture, the authors were able to identify which parts of the model's state or weights were used for certain representations. For example, they empirically identified which parts of a room generator encoded windows, then used that knowledge to check whether the network was learning a true internal representation of what makes a room.

Another paper, Improved Techniques for Training GANs [3], proposes a series of techniques for avoiding common training pitfalls and producing more robust Generator and Discriminator models. The authors use these techniques to demonstrate significant improvements over existing models for both simple and complex representations.
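
One of the techniques discussed in [3] is one-sided label smoothing: the Discriminator's target for real examples is softened from 1 to a slightly smaller value, while the target for generated examples stays at 0. A minimal sketch follows, assuming a smoothed target of 0.9 (an illustrative value) and hypothetical function names:

```python
import math

def bce(p, target, eps=1e-12):
    """Binary cross-entropy between a predicted probability p and a target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake, real_target=0.9):
    """Discriminator loss with one-sided label smoothing:
    real examples use a softened target, fake examples keep target 0."""
    return bce(d_real, real_target) + bce(d_fake, 0.0)

# With smoothing, an over-confident score on real data is penalized.
print(discriminator_loss(0.9, 0.1) < discriminator_loss(0.999, 0.1))
```

Setting `real_target=1.0` recovers the usual unsmoothed loss, under which the more confident score is always rewarded.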

Additionally, one of the more impressive GAN implementations belongs to the authors of A Style-Based Generator Architecture for Generative Adversarial Networks [4]. This paper borrows from style-transfer techniques to train better Generators, specifically Generators that separate high-level attributes from stochastic variation. As a result, we can generate examples that are influenced by a specific style.

Why the Data Matters

Finally, since we're Zumo Labs, I'll leave you with some cool examples of how GANs can be used to tackle the data problem at the very heart of machine learning.

A core issue with collected data is that most machine learning models used in production are trained with supervised learning, where every image needs a label for the model to learn from. GANs have been tackling this problem by learning how to generate labels from an unlabeled input image. Two examples are Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks [5] and RenderGAN: Generating Realistic Labeled Data [6].

At Zumo Labs, we think the future of machine learning involves mapping configurable simulations to real-world use cases. In some cases, however, it is currently more expensive to model all of these scenarios than to photograph them and annotate the images by hand. To decrease the cost of these virtual worlds, machine learning can be used to learn a mapping between a configurable environment and the real world. One example is GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds [7], where a GAN is used to generate photorealistic 3D worlds from Minecraft worlds.

References

[1] Generative Adversarial Networks (https://arxiv.org/pdf/1406.2661.pdf)
[2] Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (https://arxiv.org/pdf/1511.06434.pdf)
[3] Improved Techniques for Training GANs (https://arxiv.org/pdf/1606.03498.pdf)
[4] A Style-Based Generator Architecture for Generative Adversarial Networks (https://arxiv.org/pdf/1812.04948.pdf)
[5] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (https://arxiv.org/pdf/1703.10593.pdf)
[6] RenderGAN: Generating Realistic Labeled Data (https://arxiv.org/pdf/1611.01331.pdf)
[7] GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds (https://arxiv.org/pdf/2104.07659.pdf)
