GANs
Generative Adversarial Networks (GANs) are machine learning models that generate new data from features learnt in a data set. GANs are deep learning models, and the ones that work with images are usually built from convolutional neural networks. The architecture of a GAN is composed of two models, a generator and a discriminator, that compete against each other. The generator produces some data from random noise, and this data is then fed into the discriminator, which is trained on the real data set and classifies the data as real or fake. When the discriminator rejects the data, the generator adjusts its weights and generates something again, while the discriminator keeps training to tell real from fake. This process is repeated over and over until the generated data can pass the discriminator’s classification test, resulting in plausible generated data. A well-known example of image generation is Dall-E from OpenAI, which generates images from user prompts, although it is not itself based on a GAN. To work with images, a GAN represents each one as a three-dimensional array: first the pixel height and width, and then the three RGB values of each pixel.
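To make the generator and discriminator loop a bit more concrete, here is a minimal sketch in plain Python. It is not the code from the workshop: instead of images it uses single numbers, the "real" data is assumed to be drawn from a normal distribution around 4.0, and both models are reduced to two weights each so the gradients can be written by hand. The function name `train_toy_gan` and all of its parameters are made up for this illustration.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function: maps any score into [0, 1].
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, real_mean=4.0, real_std=0.5, seed=0):
    """Toy 1-D GAN (illustrative assumption, not a real image model)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator weights: fake = a * noise + b
    w, c = 0.1, 0.0  # discriminator weights: D(x) = sigmoid(w * x + c)
    eps = 1e-12      # keeps log() away from zero
    d_losses, g_losses = [], []

    for _ in range(steps):
        # Discriminator step: learn to score real data high, fake data low.
        x_real = rng.gauss(real_mean, real_std)
        x_fake = a * rng.gauss(0.0, 1.0) + b
        p_real = sigmoid(w * x_real + c)
        p_fake = sigmoid(w * x_fake + c)
        d_loss = -math.log(max(p_real, eps)) - math.log(max(1.0 - p_fake, eps))
        # Hand-derived gradients of d_loss with respect to w and c.
        grad_w = -(1.0 - p_real) * x_real + p_fake * x_fake
        grad_c = -(1.0 - p_real) + p_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: adjust weights so the fake sample looks more real.
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b
        p_fake = sigmoid(w * x_fake + c)
        g_loss = -math.log(max(p_fake, eps))
        # Gradient of g_loss with respect to the fake sample itself.
        grad_x = -(1.0 - p_fake) * w
        a -= lr * grad_x * z
        b -= lr * grad_x

        d_losses.append(d_loss)
        g_losses.append(g_loss)

    return (a, b), (w, c), d_losses, g_losses


if __name__ == "__main__":
    (a, b), (w, c), d_losses, g_losses = train_toy_gan()
    print("generator weights:", a, b)
    print("last discriminator loss:", d_losses[-1])
    print("last generator loss:", g_losses[-1])
```

A real image GAN follows the same two alternating steps, only with neural networks in place of the two-weight models and a library computing the gradients automatically.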
A couple of weeks ago I attended a workshop from the creative computing lab in LCC about GANs in Python. In this workshop we learnt the basics of the GAN structure and, with our own data sets, created some images with a model. In my case I used a data set of flowers containing 600 images. It took about 20 minutes to produce an image with 150 epochs (an epoch is the unit used in ML for one full pass of the model through the training data set), and still the result was pretty ambiguous.
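To get a feel for what those numbers mean, here is a small back-of-the-envelope calculation in Python. The 600 images, 150 epochs and roughly 20 minutes come from my run; the batch size of 32 is only an assumption for the example (I am not stating what the workshop notebook used), and `training_budget` is a made-up helper name.

```python
import math


def training_budget(num_images, epochs, batch_size, total_seconds):
    # One epoch shows every image once, split into batches of batch_size.
    steps_per_epoch = math.ceil(num_images / batch_size)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
        "images_seen": num_images * epochs,
        "seconds_per_epoch": total_seconds / epochs,
    }


if __name__ == "__main__":
    # batch_size=32 is a hypothetical value chosen for illustration.
    budget = training_budget(num_images=600, epochs=150, batch_size=32,
                             total_seconds=20 * 60)
    for name, value in budget.items():
        print(name, value)
```

Under these assumptions each epoch took about 8 seconds, and the model looked at every flower image 150 times, which is still very little training for a GAN.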

In a GAN, both the generator and the discriminator have a value called the training loss, which measures how well each model is doing at its task. The following figures show how the training loss of both models in the algorithm changes during training.
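As a rough illustration of how these two losses can be calculated, here is a sketch in plain Python using binary cross-entropy, which is a common choice for GANs. I am assuming this loss for the example rather than quoting the workshop notebook, and the function names are my own. The inputs are the probabilities (between 0 and 1) that the discriminator gives to a batch of real and generated samples.

```python
import math


def bce(prob, target, eps=1e-12):
    # Binary cross-entropy for one prediction; prob is clamped to avoid log(0).
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def discriminator_loss(real_probs, fake_probs):
    # The discriminator wants real samples scored as 1 and fake ones as 0.
    real_term = sum(bce(p, 1) for p in real_probs) / len(real_probs)
    fake_term = sum(bce(p, 0) for p in fake_probs) / len(fake_probs)
    return real_term + fake_term


def generator_loss(fake_probs):
    # The generator wants its fake samples to be scored as 1 (real).
    return sum(bce(p, 1) for p in fake_probs) / len(fake_probs)


if __name__ == "__main__":
    # A discriminator that can no longer tell the difference outputs 0.5.
    print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
    print(generator_loss([0.5, 0.5]))
```

When the discriminator outputs 0.5 for everything it is just guessing, which is the balance point the two models are pushing towards. A generator loss that keeps rising while the discriminator loss falls usually means the discriminator is winning.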

All the content of the workshop can be seen in the Google Colab here. On reflection, I have to say that I didn’t have the background knowledge needed to complete the workshop, and I felt a bit lost in its content. As you can see in the Google Colab, the code of the GAN is extremely long and complex, which makes it hard to understand. As stated before in this blog, my intention for this portfolio project was to create some experiments with machine learning and audio, but after that workshop I started reconsidering my whole plan. As part of the experiments I had the idea of making an audio GAN that could generate audio files resembling the ones in its data set, but after realising the complexity of GANs I don’t think that idea is feasible anymore. I also discussed with the tutor in charge of the workshop the requirements for processing audio in a GAN, and apparently I would need a more powerful computer with a better GPU than mine. She also said that even if I got access to a better computer, processing the audio would be extremely heavy and would take a long time.
Unfortunately, I concluded after this workshop that it is not the best idea to focus on machine learning for this project, as I don’t have many of the required skills and the learning curve is very steep. I would have to spend my time researching and studying instead of creating and producing, and that’s not very appropriate for a creative project. I also don’t know if I would manage to learn the subject before the deadline, as it is so vast. I want my project to be more creative and practical, something that builds upon my current skills and at the same time functions as a tool for expressing myself. I still want to include in this new idea some of the intentions and influences of my past idea, for example coding, DSP, instrument design and algorithmic composition.