Generative Adversarial Networks

GANs, i.e. Generative Adversarial Networks, were first proposed in 2014 by Ian Goodfellow and his colleagues (including Yoshua Bengio) at the University of Montreal. In 2016, Facebook’s AI research director and New York University professor Yann LeCun called them “the most interesting idea in the last 10 years in machine learning”.

In order to understand what GANs are, it is helpful to compare them with discriminative algorithms such as simple Deep Neural Networks (DNNs).

Let us use predicting whether a given email is spam as an example. The words that make up the body of the email are the variables that determine one of two labels: “spam” and “non-spam”. The discriminative algorithm learns from the input vector (the words occurring in a given message, converted into a mathematical representation) to predict how likely the email is to be spam. In other words, the output of the discriminator is the probability that the input data is spam, so it learns the relationship between the input and the output.
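
As a minimal sketch of this idea (the vocabulary and weights below are invented for illustration, not a trained model), a discriminative spam classifier maps an input vector of words to a probability between 0 and 1:

```python
import math

# Toy vocabulary with hand-picked weights (hypothetical, for illustration).
# Positive weights push the score towards "spam", negative towards "non-spam".
WEIGHTS = {"free": 1.8, "winner": 2.1, "meeting": -1.5, "invoice": -0.7}
BIAS = -0.5

def spam_probability(email_words):
    """Map an email (list of words) to P(spam) with a logistic model."""
    score = BIAS + sum(WEIGHTS.get(w, 0.0) for w in email_words)
    return 1.0 / (1.0 + math.exp(-score))  # sigmoid squashes the score into (0, 1)

print(spam_probability(["free", "winner"]))     # high probability: likely spam
print(spam_probability(["meeting", "invoice"])) # low probability: likely non-spam
```

A real discriminator would learn these weights from labelled examples, but the input-to-probability mapping is the same.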

GANs do the exact opposite. Instead of predicting a label from the input data, they try to predict the data given a label. More specifically, they attempt to answer the following question: assuming this email is spam, how likely are these words?

Even more precisely, the task of Generative Adversarial Networks is to solve the problem of generative modelling, which can be approached in two ways (in both cases you need a large amount of high-dimensional data, e.g. images or sound). The first possibility is density estimation: given access to numerous examples, you want to find the probability density function that describes them. The second approach is to create an algorithm that learns to generate data resembling the training dataset (this is not about re-creating the same information but rather about creating new information that could plausibly belong to it).

What generative modelling approach do GANs use?

This approach can be likened to a game played by two agents. One is a generator that attempts to create data. The other is a discriminator that predicts whether this data is real or fake. The generator’s goal is to cheat the other player. So, over time, as both get better at their task, the generator is forced to produce data that is as similar as possible to the training data.

What does the learning process look like?

The first agent, i.e. the discriminator (a differentiable function D, usually a neural network), gets a piece of the training data as input (e.g. a photo of a face). This picture is then called x (it is simply the name of the model input), and the goal is for D(x) to be as close to 1 as possible — meaning that x is a real example.

The second agent, i.e. the generator (a differentiable function G, usually also a neural network), receives white noise z (random values that allow it to generate a variety of plausible images) as input. Applying the function G to the noise z yields a sample x (in other words, G(z) = x). We hope that this sample will be quite similar to the original training data but will have some flaws, such as noticeable noise, that may allow the discriminator to recognise it as a fake example. The next step is to apply the discriminator function D to the fake sample G(z). At this point, the goal of D is to make D(G(z)) as close to 0 as possible, whereas the goal of G is for D(G(z)) to be close to 1.
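
These two opposing goals are usually expressed through binary cross-entropy losses. The sketch below (with hypothetical discriminator outputs standing in for real network forward passes) shows how one training step would score both players:

```python
import math

def bce(p, target):
    """Binary cross-entropy for a single probability p against target 0 or 1."""
    eps = 1e-12  # guards against log(0)
    return -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))

# Hypothetical discriminator outputs at some point during training.
d_real = 0.9   # D(x): the discriminator's belief that a real sample is real
d_fake = 0.2   # D(G(z)): its belief that a generated sample is real

# Discriminator objective: push D(x) towards 1 and D(G(z)) towards 0.
d_loss = bce(d_real, 1) + bce(d_fake, 0)

# Generator objective: push D(G(z)) towards 1 (i.e. fool the discriminator).
g_loss = bce(d_fake, 1)

print(f"discriminator loss: {d_loss:.3f}")
print(f"generator loss: {g_loss:.3f}")
```

In an actual implementation, each loss would be backpropagated through its own network while the other network's weights are held fixed.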

This is akin to the struggle between money counterfeiters and the police. The police want the public to be able to use real banknotes without the risk of being cheated, while also detecting counterfeit notes, removing them from circulation, and punishing the criminals. At the same time, counterfeiters want to fool the police and spend the money they have created. Consequently, both the police and the criminals learn to do their jobs better and better.

Assuming that the hypothetical capabilities of the police and the counterfeiters (the discriminator and the generator) are unlimited, the equilibrium point of this game is as follows: the generator has learned to produce perfect fake data, indistinguishable from real data, and as such, the discriminator’s output is always 0.5 — it cannot tell whether a sample is real or fake.
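
Formally, this two-player game is written as the minimax objective from the original GAN paper, where D tries to maximise the value function and G tries to minimise it:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The first term rewards D for assigning high probability to real samples; the second rewards it for assigning low probability to generated ones, while G pushes in the opposite direction. At the equilibrium described above, D(x) = 1/2 everywhere.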

What are the uses of GANs?

GANs are used extensively in image-related operations. This is not their only application, however, as they can be used for any type of data.

Figure 1: Style Transfer carried out by CycleGAN

For example, the DiscoGAN network can transfer a style or design from one domain to another (e.g. transform a handbag design into a shoe design). It can also generate a plausible image from an item’s sketch (many other networks can do this, too, e.g. Pix2Pix). Known as Style Transfer, this is one of the more common uses of GANs. Other examples of this application include the CycleGAN network, which can transform an ordinary photograph into a painting reminiscent of artworks by Van Gogh, Monet, etc. GANs also enable the generation of images based on a description (StackGAN network) and can even be used to enhance image resolution (SRGAN network).

Useful resources

[1] Goodfellow I. et al., Improved Techniques for Training GANs, 2016, https://arxiv.org/abs/1606.03498

[2] Chintala S., How to train a GAN, https://github.com/soumith/ganhacks

[3] White T., Sampling Generative Networks, School of Design, Victoria University of Wellington, Wellington, 2016, https://arxiv.org/pdf/1609.04468.pdf

[4] LeCun Y., Mathieu M., Zhao J., Energy-based Generative Adversarial Networks, Department of Computer Science, New York University, Facebook Artificial Intelligence Research, 2016, https://arxiv.org/pdf/1609.03126v2.pdf

References

[1] Goodfellow I., Tutorial: Generative Adversarial Networks [online], “NIPS”, 2016, https://arxiv.org/pdf/1701.00160.pdf
[2] Skymind, A Beginner’s Guide to Generative Adversarial Networks (GANs) [online], San Francisco, Skymind, accessed on: 31 May 2019
[3] Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y., Generative Adversarial Nets, in: Advances in Neural Information Processing Systems, pp. 2672–2680, 2014
[4] LeCun, Y., What are some recent and potentially upcoming breakthroughs in deep learning?, “Quora”, 2016, accessed on: 31 May 2019, https://www.quora.com/What-are-some-recent-and-potentially-upcoming-breakthroughs-in-deep-learning
[5] Kim T., DiscoGAN in PyTorch, accessed on: 31 May 2019, https://github.com/carpedm20/DiscoGAN-pytorch