Generative Adversarial Networks (GANs) were first proposed in 2014 by Ian Goodfellow and his colleagues, including Yoshua Bengio, at the University of Montreal, where Goodfellow was still a student. In 2016, Yann LeCun, director of artificial intelligence research at Facebook and professor at New York University, called them the most interesting idea of the last decade in machine learning.

Fig. 1. Photos of bedrooms generated by models trained on the LSUN dataset. Left: images generated by DCGAN. Right: images generated by EBGAN-PT.

To fully understand what GANs are, it helps to compare them with discriminative algorithms. Depending on the task, such algorithms range from simple models to deep neural networks (DNNs).

For example, take the problem of predicting whether a given email is spam. The words in the email are the features, and the label is one of two values: spam or not spam. A discriminative algorithm takes the input vector (the words occurring in the message, converted into a numerical representation) and learns to predict how likely the email is to be spam. In other words, the discriminator’s output is the probability that the input is spam, so it learns the relationship between the input and the label.
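As a rough illustration of the discriminative setting, here is a minimal sketch of a logistic-regression spam scorer in plain Python. The vocabulary, the four training emails and the helper names (featurize, predict_spam, train) are all made up for this example; a real spam filter would use far more data and features.

```python
import math

# Hypothetical vocabulary and training set, invented for this sketch.
VOCAB = ["free", "money", "meeting", "report"]
EXAMPLES = [
    ("free money now", 1),           # 1 = spam
    ("win free money", 1),
    ("meeting report attached", 0),  # 0 = not spam
    ("report for the meeting", 0),
]

def featurize(text):
    """Convert an email into a vector of word counts over VOCAB."""
    words = text.lower().split()
    return [words.count(v) for v in VOCAB]

def predict_spam(x, weights, bias):
    """Return the estimated probability that feature vector x is spam."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def train(examples, epochs=200, lr=0.5):
    """Fit the weights with plain stochastic gradient descent on log loss."""
    weights, bias = [0.0] * len(VOCAB), 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = featurize(text)
            err = predict_spam(x, weights, bias) - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

weights, bias = train(EXAMPLES)
p_spam = predict_spam(featurize("free money"), weights, bias)
p_ham = predict_spam(featurize("meeting report"), weights, bias)
print(f"P(spam | 'free money')     = {p_spam:.3f}")
print(f"P(spam | 'meeting report') = {p_ham:.3f}")
```

The model only ever learns a mapping from words to a label probability; it never learns what a typical spam email looks like.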

GANs, like other generative models, work the other way round. Instead of predicting a label from the input data, they try to predict the data given a label. More specifically, they try to answer the following question: assuming that this email is spam, how probable are these words?
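The generative question can be made concrete with a small sketch. Note the assumptions: this is a naive word-frequency model with add-one smoothing, not a GAN (a GAN learns to produce samples without writing the probability down explicitly), and the emails and helper names are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical training emails, invented for this sketch.
SPAM = ["free money now", "win free money"]
HAM = ["meeting report attached", "report for the meeting"]

def word_model(emails, vocab, alpha=1.0):
    """Estimate P(word | label) from word counts, with add-alpha smoothing."""
    counts = Counter(w for e in emails for w in e.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def log_likelihood(email, model):
    """Log-probability of the email's words under a per-label word model."""
    return sum(math.log(model[w]) for w in email.split() if w in model)

vocab = sorted({w for e in SPAM + HAM for w in e.split()})
p_word_given_spam = word_model(SPAM, vocab)
p_word_given_ham = word_model(HAM, vocab)

# "Assuming that this email is spam, how probable is the data?"
ll_spam = log_likelihood("free money", p_word_given_spam)
ll_ham = log_likelihood("free money", p_word_given_ham)
print(f"log P('free money' | spam)     = {ll_spam:.2f}")
print(f"log P('free money' | not spam) = {ll_ham:.2f}")
```

Here the label is the starting point and the words are what gets scored, which is the reverse of the discriminative setup.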

To be more precise, the task of Generative Adversarial Networks is to solve the problem of generative modeling, which can be done in two ways (in both cases we need a set of training examples, e.g. images or sound). The first option is density estimation: given a lot of examples, we want to find the probability density function that describes them. The second approach is to create an algorithm that learns to generate new data resembling the training set (the point is not to reproduce the training examples but to create new ones that could plausibly have belonged to the same set).
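The two options can be contrasted on a toy one-dimensional data set. The sketch below assumes the data is roughly Gaussian and uses Python's statistics.NormalDist; the measurements are made up. In this simple case the sampler is derived from the fitted density, whereas a GAN aims to learn the sampler directly.

```python
from statistics import NormalDist

# Made-up one-dimensional "training data" for this sketch.
data = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]

# Option 1: density estimation. Fit a density, then ask how probable a value is.
density = NormalDist.from_samples(data)
p_typical = density.pdf(5.0)   # density near the centre of the data
p_unusual = density.pdf(8.0)   # density far away from the data

# Option 2: sample generation. Produce new values that resemble the data.
new_samples = density.samples(5, seed=0)

print(f"fitted mean = {density.mean:.2f}, stdev = {density.stdev:.2f}")
print(f"pdf(5.0) = {p_typical:.3f}, pdf(8.0) = {p_unusual:.3g}")
print("new samples:", [round(s, 2) for s in new_samples])
```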

What is GAN’s approach to generative modeling?

This approach can be compared to a game between two players. One of them is a generator that tries to create data; the other is a discriminator that predicts whether the data is real or fake. The goal of the generator is to fool the discriminator. As both get better at their tasks over time, the generator is forced to produce data that is as similar as possible to the training data.
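For reference, the original 2014 paper writes this game as a minimax problem over a value function V. In the notation assumed here, D is the discriminator, G is the generator, p_data is the distribution of the real data and p_z is the distribution of the input noise:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_z(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator tries to make both terms large (D(x) close to 1 on real data, D(G(z)) close to 0 on generated data), while the generator tries to make the second term small.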

What is the learning process?

The first player, the discriminator (a differentiable function D, usually a neural network), receives a training example (e.g. a photo of a face) as input. We call this example x, and the goal of the discriminator is to make D(x) as close as possible to 1, which means that x is classified as a real example.

The second player, the generator (which also has to be a differentiable function, G, and is also usually a neural network), receives white noise z as input (random values that allow it to generate different plausible photos). Applying G to z gives a sample x (in other words, x = G(z)). We hope that the sample x will be quite similar to the original training data, but it may have flaws, such as noticeable noise, that lead the discriminator to recognize it as a fake. The next step is to apply the discriminator D to the fake sample x from the generator. Now the goal of D is to make D(G(z)) as close to 0 as possible, while the goal of G is to make D(G(z)) as close to 1 as possible.
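The opposing goals of D and G can be sketched as one alternating update on a toy problem. Everything here is an assumption made for illustration: the data is one-dimensional, D is a single logistic unit with parameters w and c, G is a linear map with parameters a and b, the gradients are written out by hand, and the generator uses the commonly used non-saturating loss -log D(G(z)). Real GANs use neural networks and an automatic-differentiation library.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def D(x, w, c):
    """Toy discriminator: probability that the scalar x is a real sample."""
    return sigmoid(w * x + c)

def G(z, a, b):
    """Toy generator: maps a noise value z to a fake sample."""
    return a * z + b

def d_loss(reals, fakes, w, c):
    # D wants D(x) close to 1 on real samples and close to 0 on fakes.
    real_term = sum(-math.log(D(x, w, c)) for x in reals) / len(reals)
    fake_term = sum(-math.log(1.0 - D(x, w, c)) for x in fakes) / len(fakes)
    return real_term + fake_term

def g_loss(noise, a, b, w, c):
    # G wants D(G(z)) close to 1 (non-saturating generator loss).
    return sum(-math.log(D(G(z, a, b), w, c)) for z in noise) / len(noise)

def d_step(reals, fakes, w, c, lr):
    """One gradient-descent step on d_loss with respect to w and c."""
    gw = (sum((D(x, w, c) - 1.0) * x for x in reals) / len(reals)
          + sum(D(x, w, c) * x for x in fakes) / len(fakes))
    gc = (sum(D(x, w, c) - 1.0 for x in reals) / len(reals)
          + sum(D(x, w, c) for x in fakes) / len(fakes))
    return w - lr * gw, c - lr * gc

def g_step(noise, a, b, w, c, lr):
    """One gradient-descent step on g_loss with respect to a and b."""
    ga = sum((D(G(z, a, b), w, c) - 1.0) * w * z for z in noise) / len(noise)
    gb = sum((D(G(z, a, b), w, c) - 1.0) * w for z in noise) / len(noise)
    return a - lr * ga, b - lr * gb

rng = random.Random(0)
reals = [rng.gauss(4.0, 1.0) for _ in range(64)]   # "training data"
noise = [rng.gauss(0.0, 1.0) for _ in range(64)]   # white noise z
w, c = 0.5, 0.0   # initial discriminator parameters
a, b = 1.0, 0.0   # initial generator parameters

# Discriminator update: the generator is held fixed.
fakes = [G(z, a, b) for z in noise]
d_before = d_loss(reals, fakes, w, c)
w, c = d_step(reals, fakes, w, c, lr=0.05)
d_after = d_loss(reals, fakes, w, c)

# Generator update: the discriminator is held fixed.
g_before = g_loss(noise, a, b, w, c)
a, b = g_step(noise, a, b, w, c, lr=0.05)
g_after = g_loss(noise, a, b, w, c)

print(f"discriminator loss: {d_before:.4f} -> {d_after:.4f}")
print(f"generator loss:     {g_before:.4f} -> {g_after:.4f}")
```

In practice these two updates are repeated many times on fresh batches. The sketch shows only a single round and makes no claim about how the full training run behaves.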

This can be compared to counterfeiters and the police. The police want the public to be able to use real banknotes without the risk of being cheated; they also want to detect fake banknotes, remove them from circulation and punish the criminals. At the same time, the counterfeiters want to fool the police and spend the money they have made. As a result, both sides, the police and the counterfeiters, keep getting better at their jobs.

What are the uses of GANs?

These networks are primarily used for working with images, but that is not their only use: in principle they can be applied to any kind of data.

Fig. 2. Style Transfer by CycleGAN

For example, the DiscoGAN network can transfer a style or a pattern from one domain (e.g. a purse) to another (e.g. a shoe), or generate a plausible image from a sketch of an object (many other networks can do this too; one of them is Pix2Pix). This is one of the more frequent applications of GANs, so-called Style Transfer. Another example is the CycleGAN network, which can turn an ordinary photo into an image that looks like a painting by Van Gogh or Monet. GANs can also generate images from a text description (the StackGAN network) and increase image resolution (the SRGAN network).

Useful sources: