First Experiment
Before starting to play with ML for audio, I decided to consolidate my knowledge of digital audio. A computer records and encodes audio by representing it as a sequence of discrete numbers called samples. The audio is sampled thousands of times per second, and each sample is stored with a fixed bit depth, commonly 16 bits. A very common sample rate is 44,100 samples per second (44.1 kHz, the CD standard), meaning that a 10-second mono audio file contains a list of 44,100 × 10 = 441,000 numbers. More info here.
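To make the arithmetic concrete, here is a minimal Python sketch (standard library only) that counts samples for a given duration and quantizes an amplitude in the range -1.0 to 1.0 to a 16-bit integer. The helper names `num_samples` and `quantize`, and the 440 Hz test tone, are hypothetical and only for illustration.

```python
import math

SAMPLE_RATE = 44_100   # samples per second (CD standard)
BIT_DEPTH = 16         # bits per sample

def num_samples(duration_s: float, sample_rate: int = SAMPLE_RATE, channels: int = 1) -> int:
    """Total number of sample values stored for a clip of the given length."""
    return round(duration_s * sample_rate) * channels

def quantize(x: float, bit_depth: int = BIT_DEPTH) -> int:
    """Map an amplitude in [-1.0, 1.0] to a signed integer of the given bit depth.

    Uses symmetric scaling to +/-32767 for 16 bits (one common convention;
    the full signed 16-bit range is -32768..32767).
    """
    max_int = 2 ** (bit_depth - 1) - 1     # 32767 for 16-bit audio
    x = max(-1.0, min(1.0, x))             # clip out-of-range values
    return int(round(x * max_int))

# A 10-second mono clip at 44.1 kHz:
print(num_samples(10))                     # 441000
# The same clip in stereo stores one value per channel per sample:
print(num_samples(10, channels=2))         # 882000

# First few 16-bit samples of a hypothetical 440 Hz sine tone:
tone = [quantize(math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)) for n in range(5)]
print(tone)
```

Each sample is just an integer, so the whole recording is nothing more than a long list of them.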

I wanted to experiment with these parameters, so I downloaded some Python libraries for DSP (digital signal processing). I wrote some simple scripts that decode WAV files and extract the actual list of numbers from them.
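The post does not name the libraries used, so the sketch below assumes only Python's built-in `wave` and `array` modules and 16-bit PCM WAV files. `read_wav_samples` decodes a file into a plain list of integers; `write_test_tone` is a hypothetical helper that writes a short sine tone so there is a file to try it on.

```python
import array
import math
import sys
import wave

def read_wav_samples(path):
    """Decode a 16-bit PCM WAV file into (samples, sample_rate, channels).

    Multi-channel audio is returned interleaved: [L0, R0, L1, R1, ...].
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())   # raw little-endian bytes
    samples = array.array("h", raw)            # "h" = signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()                     # WAV sample data is little-endian
    return samples.tolist(), sample_rate, channels

def write_test_tone(path, duration_s=0.1, sample_rate=44_100, freq=440.0):
    """Write a short mono 16-bit sine tone (hypothetical helper for testing)."""
    n = round(duration_s * sample_rate)
    data = array.array(
        "h",
        (int(round(32767 * math.sin(2 * math.pi * freq * i / sample_rate))) for i in range(n)),
    )
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)                     # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(data.tobytes())
```

Usage would look like `samples, rate, channels = read_wav_samples("clip.wav")`, where `clip.wav` is a placeholder filename. Files with other bit depths (8-bit, 24-bit, float) would need a different decoding step.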

Next, I put all the numbers in an array (a list) and printed the array’s length and its first 10 elements. The length should equal the sample rate times the duration of the audio file in seconds; for my 1.3-second clip that is 1.3 × 44,100 = 57,330 samples (for a mono file; a stereo file stores twice as many values).
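The length check can be sketched as follows. Since the original 1.3-second file is not available, this assumes a synthetic stand-in: a 440 Hz tone generated at 44.1 kHz in place of decoded WAV data. `expected_length` is a hypothetical helper name.

```python
import math

def expected_length(duration_s, sample_rate=44_100, channels=1):
    """Sample rate times duration (times channels for interleaved audio)."""
    return round(duration_s * sample_rate) * channels

# Stand-in for a decoded 1.3-second mono clip: a 440 Hz tone at 16-bit scale.
# In practice this list would come from decoding a WAV file.
sample_rate = 44_100
duration_s = 1.3
samples = [int(round(32767 * math.sin(2 * math.pi * 440 * n / sample_rate)))
           for n in range(round(duration_s * sample_rate))]

print(len(samples))                  # 57330
print(samples[:10])                  # the first 10 sample values
print(len(samples) / sample_rate)    # 1.3, going back from length to duration
assert len(samples) == expected_length(duration_s, sample_rate)
```

Dividing the length by the sample rate recovers the duration, which is a quick sanity check that a file was decoded correctly.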

It absolutely blew my mind to realize that all the complexity of sound could be encoded as a simple list of numbers. Also, any transformation you apply to this list directly manipulates the audio. For example, if you reverse the order of the numbers, the audio plays backwards. And if you shuffle the order of the numbers in the list, you get something very close to white noise.
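Both transformations are one-liners on a Python list. The sketch below assumes a synthetic half-second 440 Hz tone as input and uses a hypothetical `lag1_correlation` helper as a rough check: neighbouring samples of a tone are strongly correlated, while after shuffling they are nearly uncorrelated, which is what makes the result sound like noise. A single-lag check is only an indication, not a full test for white noise.

```python
import math
import random

sample_rate = 44_100
# Hypothetical input: half a second of a 440 Hz tone as 16-bit integers.
samples = [int(round(32767 * math.sin(2 * math.pi * 440 * n / sample_rate)))
           for n in range(sample_rate // 2)]

# Reversing the list plays the sound backwards.
reversed_samples = samples[::-1]

# Shuffling keeps exactly the same values but destroys their order in time.
rng = random.Random(0)            # seeded so the result is repeatable
shuffled_samples = samples[:]     # copy so the original list is untouched
rng.shuffle(shuffled_samples)

def lag1_correlation(x):
    """Normalized correlation between each sample and the next one."""
    mean = sum(x) / len(x)
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(len(x) - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

print(lag1_correlation(samples))           # close to 1: smooth, predictable tone
print(lag1_correlation(shuffled_samples))  # close to 0: neighbours are unrelated
```

Either list can then be written back to a WAV file to hear the result.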