Audio analysis and Deep Learning
Finding the genre of a song with Deep Learning
General overview:
simplified representation of each song in the library
train a deep neural network to classify the songs
use the classifier to fill in the missing genre
Data
iTunes library
2000 songs
Data preprocessing
too many genres and subgenres; simplified by removing some examples and assigning the rest to broader genres
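The genre simplification can be written as a lookup table from subgenre to broader genre. A minimal sketch; the genre names and the mapping are made up for illustration and are not taken from the library described in these notes.

```python
# Hypothetical mapping: subgenre -> broader genre. None marks genres to drop.
GENRE_MAP = {
    "Deep House": "Electro",
    "Techno": "Electro",
    "Hard Rock": "Rock",
    "Alternative Rock": "Rock",
    "Podcast": None,  # example of a label removed from the dataset
}


def simplify_genres(songs, genre_map=GENRE_MAP):
    """Return (title, broad_genre) pairs, dropping songs whose genre maps to None.

    Genres that are missing from the map are kept unchanged.
    """
    simplified = []
    for title, genre in songs:
        broad = genre_map.get(genre, genre)
        if broad is not None:
            simplified.append((title, broad))
    return simplified


songs = [("a", "Techno"), ("b", "Hard Rock"), ("c", "Podcast"), ("d", "Jazz")]
print(simplify_genres(songs))
```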
Sampling frequency: 44100 Hz
every second of audio has 44100 values
Tip:
discard the stereo channel (convert to mono)
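A small sketch of the sample count and the mono conversion. Averaging the left and right channels is an assumption; the notes only say that the stereo channel is discarded, so keeping a single channel would be another valid reading. The function names are illustrative.

```python
SAMPLE_RATE = 44100  # Hz, from the notes


def sample_count(seconds, sample_rate=SAMPLE_RATE):
    """Number of values needed to store `seconds` of single-channel audio."""
    return int(seconds * sample_rate)


def to_mono(stereo_frames):
    """Average (left, right) sample pairs into one mono channel."""
    return [(left + right) / 2 for left, right in stereo_frames]


print(sample_count(1))                       # values in one second of audio
print(to_mono([(1.0, 3.0), (0.0, 0.0)]))     # two stereo frames -> two mono samples
```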
Use the Fourier transform to convert the audio data to the frequency domain, and export the result as a spectrogram: a PNG image that contains all the frequencies of the song through time.
The 44100 Hz sampling rate allows frequencies up to 22050 Hz to be reconstructed (Nyquist-Shannon sampling theorem).
50 pixels per second (20 ms per pixel) is enough.
Use a spectrogram with 128 frequency levels.
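The numbers above fit together as follows: 44100 / 50 = 882 samples between spectrogram columns, and a 256-sample frame yields 128 non-negative frequency bins up to the 22050 Hz Nyquist limit. The sketch below uses a naive DFT from the standard library to make that concrete. The 256-sample frame size and the absence of a window function are assumptions, and this is not the tool used to produce the PNG spectrograms in the original article; in practice an FFT library would replace the inner loop.

```python
import cmath
import math

SAMPLE_RATE = 44100       # Hz, from the notes
PIXELS_PER_SECOND = 50    # one spectrogram column every 20 ms
FRAME_SIZE = 256          # assumption: 256-sample frames give 128 frequency bins


def nyquist(sample_rate):
    """Highest frequency that can be reconstructed at this sampling rate."""
    return sample_rate / 2


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the first len(frame) // 2 frequency bins."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, sample_rate=SAMPLE_RATE,
                pixels_per_second=PIXELS_PER_SECOND, frame_size=FRAME_SIZE):
    """Return a list of columns, one per time step, each with frame_size // 2 bins."""
    hop = sample_rate // pixels_per_second  # 882 samples between columns
    columns = []
    start = 0
    while start + frame_size <= len(samples):
        columns.append(magnitude_spectrum(samples[start:start + frame_size]))
        start += hop
    return columns


# Demo: 0.1 s of a pure tone that falls exactly on frequency bin 10.
tone_hz = 10 * SAMPLE_RATE / FRAME_SIZE
samples = [math.sin(2 * math.pi * tone_hz * t / SAMPLE_RATE) for t in range(4410)]
columns = spectrogram(samples)
peak_bins = [col.index(max(col)) for col in columns]
print(len(columns), len(columns[0]), peak_bins)
```

Each column is one pixel wide in the final image, so a full song at 50 columns per second becomes a 128-pixel-high picture whose width grows with the song length.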
Further processing
deal with the length of the songs:
independent samples representing the genre: create fixed length slices of the spectrogram
cut the spectrogram into 128x128 pixel slices, each covering 2.56 s (128 pixels x 20 ms per pixel)
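A minimal sketch of the slicing step, assuming the spectrogram is held as a list of 128 rows (frequency levels), each a list of pixel values over time. Dropping the incomplete slice at the end of the song is an assumption; the notes do not say how the remainder is handled.

```python
def slice_spectrogram(spectrogram, slice_width=128):
    """Cut a spectrogram (list of rows) into fixed-width slices along time.

    Any leftover columns that do not fill a whole slice are dropped.
    """
    total_width = len(spectrogram[0])
    n_slices = total_width // slice_width
    slices = []
    for i in range(n_slices):
        start = i * slice_width
        slices.append([row[start:start + slice_width] for row in spectrogram])
    return slices


def slice_duration(slice_width=128, pixels_per_second=50):
    """Seconds of audio covered by one slice."""
    return slice_width / pixels_per_second


# Demo: a fake 128 x 300 spectrogram whose pixel value is its column index.
fake = [[col for col in range(300)] for _ in range(128)]
slices = slice_spectrogram(fake)
print(len(slices), len(slices[0]), len(slices[0][0]), slice_duration())
```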
Tips:
To expand the dataset, we can add random noise to the images, or slightly stretch them horizontally and then crop them. We can't rotate the images or flip them horizontally, because sounds are not symmetrical.
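The two allowed augmentations can be sketched as below. This assumes pixel values scaled to [0, 1], Gaussian noise, and nearest-neighbour resampling for the stretch; none of these details are given in the notes, and the parameter values are illustrative.

```python
import random


def add_noise(image, sigma=0.02, rng=None):
    """Add Gaussian noise to every pixel, clamping the result to [0, 1]."""
    rng = rng or random.Random()
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def stretch_and_crop(image, factor=1.1):
    """Stretch the image along time by `factor`, then crop back to the original width.

    Uses nearest-neighbour resampling and keeps the left part of the stretched image.
    """
    if factor < 1.0:
        raise ValueError("factor must be >= 1.0 so the crop has enough columns")
    width = len(image[0])
    return [[row[min(width - 1, int(j / factor))] for j in range(width)]
            for row in image]


image = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
noisy = add_noise(image, rng=random.Random(0))
stretched = stretch_and_crop([[1, 2, 3, 4]], factor=2.0)
print(stretched)
```

Both functions keep the slice shape unchanged, so augmented slices can be fed to the same network as the originals.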
Model–classifier
samples: square spectrogram slices of the songs
algorithm: Deep Convolutional Neural Network to classify these samples
tool: TFLearn, a wrapper around TensorFlow
Details:
dataset split: Training (70%), validation (20%), testing (10%)
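A sketch of the 70/20/10 split using integer arithmetic so the set sizes are exact. The shuffle seed and the function name are illustrative. This version shuffles slices individually; whether the original split was made per slice or per song is not stated in these notes.

```python
import random


def split_dataset(items, train_pct=70, val_pct=20, seed=0):
    """Shuffle items and split them into training, validation and test lists.

    The test set receives whatever remains after training and validation.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(12000))
print(len(train), len(val), len(test))
```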
model: Convolutional neural network.
layers: Kernels of size 2x2 with stride of 2
optimizer: RMSProp.
activation function: ELU (Exponential Linear Unit), because of the performance it has shown when compared to ReLUs
initialization: Xavier for the weight matrices in all layers.
regularization: Dropout with probability 0.5
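The building blocks listed above can be written out in plain Python to show what each one computes. This is an illustration of the formulas only, not the TFLearn model from the article. Assumptions: ELU with alpha = 1, the uniform variant of Xavier initialization, inverted dropout (kept values rescaled at training time), and no padding in the size calculation.

```python
import math
import random


def elu(x, alpha=1.0):
    """Exponential Linear Unit: identity for x > 0, alpha * (e^x - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def xavier_uniform(fan_in, fan_out, rng):
    """Weight matrix drawn uniformly from [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def dropout(values, drop_prob=0.5, rng=None):
    """Zero each value with probability drop_prob and rescale the survivors."""
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def downsampled_size(size, kernel=2, stride=2):
    """Output width of a kernel applied with the given stride and no padding."""
    return (size - kernel) // stride + 1


rng = random.Random(0)
weights = xavier_uniform(64, 32, rng)
dropped = dropout([1.0] * 100, drop_prob=0.5, rng=rng)
print(elu(2.0), elu(-1.0), downsampled_size(128))
```

With a 2x2 kernel and stride 2, each such layer halves the width and height of its input, so a 128x128 slice becomes 64x64 after one layer.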
Result:
2000 songs, 6 genres
12,000 128x128 spectrogram slices
accuracy: 90%
Classify:
slice the new song
combine the predicted classes of the slices (voting system)
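A minimal sketch of the voting step, assuming each slice yields one predicted genre label and the song takes the most frequent one. Simple majority voting is an assumption; the notes only mention a voting system. The genre names are placeholders.

```python
from collections import Counter


def vote(slice_predictions):
    """Return the genre predicted for the largest number of slices."""
    if not slice_predictions:
        raise ValueError("need at least one slice prediction")
    return Counter(slice_predictions).most_common(1)[0][0]


# One predicted label per 2.56 s slice of the new song.
predictions = ["Rock", "Rap", "Rock", "Jazz", "Rock"]
print(vote(predictions))
```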
Recognizing Sounds (A Deep Learning Case Study)
Machine Learning is Fun Part 6: How to do Speech Recognition with Deep Learning
References
https://chatbotslife.com/finding-the-genre-of-a-song-with-deep-learning-da8f59a61194
https://medium.com/@awjuliani/recognizing-sounds-a-deep-learning-case-study-1bc37444d44d
https://medium.com/@ageitgey/machine-learning-is-fun-part-6-how-to-do-speech-recognition-with-deep-learning-28293c162f7a