I’ve been reading about Generative Adversarial Networks (GANs), which were first introduced in this 2014 paper by Ian Goodfellow et al. At the time, deep neural nets had already shown impressive advances in object identification, visual discrimination, NLP, and other tasks, especially in the context of supervised learning. However, GANs addressed a distinct shortcoming in neural networks: they were less capable of generating believable images (or other types of media such as audio).
Building on previous work in game theory, restricted Boltzmann machines (RBMs), deep belief networks (DBNs), and deep Boltzmann machines (DBMs), Goodfellow and his coauthors proposed pairing two neural networks so that each makes the other better at its task. One network was called G, the generator, while the second was called D, the discriminator. In addition, there was a library L of real images similar to the types of images created by G. The job of D was to tell whether an image was drawn from G or L, while the job of G was to produce images that D would mistake for ones from L. Since both G and D are multilayer perceptrons, they can be trained jointly using backpropagation, stochastic gradient descent, and dropout.
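To make the tug-of-war between G and D a bit more concrete, here is a minimal sketch of the objective from the 2014 paper, where D tries to maximize V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] and G tries to minimize it. The function names `discriminator_loss` and `generator_loss` are my own, and the lists of probabilities stand in for the outputs of a real discriminator network, so treat this as an illustration rather than the paper's implementation.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Negative of V(D, G), averaged over a batch (hypothetical helper).

    d_real: D's estimated probabilities that samples from L are real.
    d_fake: D's estimated probabilities that samples from G are real.
    D wants d_real near 1 and d_fake near 0, which drives this loss toward 0.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss(d_fake):
    """The part of V(D, G) that G can influence: mean of log(1 - D(G(z))).

    G wants D to assign high probability to its samples, which makes
    this value more negative.
    """
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

# A confident discriminator versus one that cannot tell G from L.
confident = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
confused = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(confident, confused)
```

When D outputs 0.5 for every image, the discriminator loss equals log 4 (about 1.386), which is the value the paper identifies at the point where G's images are indistinguishable from those in L. In a real training loop, the two losses would be minimized in alternating gradient steps.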
The initial 2014 paper showed promising results and has generated a wave of papers since then. I’ve found this three-part series on GANs by Zak Jost to be a helpful introduction to post-2014 developments. GANs have proven so popular that there are now hundreds of related GANs tracked at places like GANzoo.
There are several more GAN papers on my reading list. First, the 2015 LAPGAN paper by Denton et al., which describes a GAN that starts by generating a low-resolution image and uses a Laplacian pyramid to generate additional images with greater detail and increasing resolution. Second, the 2016 DCGAN paper on Deep Convolutional GANs by Radford, Metz, and Chintala. Finally, the InfoGAN paper by Chen, Duan, et al., which provides a more structured understanding of how to precisely manipulate images, building on the accuracy introduced by LAPGANs.