A Convolutional Approach to Melody Line Identification in Symbolic Scores



In this section of the website, we try to understand how the Convolutional Neural Network works under the hood.

Sound files

First of all, we suggest listening to the output of our model alongside the ground truth. Here are two examples: in the first, the model trained on the Mozart dataset detects almost all the melody notes; in the second, it fails. It is worth noting that the introduction is predicted correctly.

Gluck - Die Sommernacht

The original one:


The prediction made by the model trained on the Mozart dataset:


Liszt - Die Glocken Von Marling

The original one:


The prediction made by the model trained on the Mozart dataset:


Saliency with Guided Backpropagation

As a first attempt to inspect the network, we computed the saliency map as described in this tutorial, following the guided backpropagation method of Springenberg et al. (2015).

These are some of the images we created. We also computed the histogram of the predicted values greater than zero and compared it with the distribution of the ground truth. The computed threshold is plotted as well, to show where it falls within the distribution of the values.

Gluck - Die Sommernacht

[Figure: Gluck saliency]

[Figure: Gluck distribution]

Liszt - Die Glocken Von Marling

[Figure: Liszt saliency]

[Figure: Liszt distribution]

Albeniz - Tango

[Figure: Albeniz saliency]

[Figure: Albeniz distribution]

Schubert - Ave Maria (excerpt)

[Figure: Schubert saliency]

[Figure: Schubert distribution]

Mozart - KV475 - first movement (excerpt)

[Figure: Mozart saliency]

[Figure: Mozart distribution]
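To make the guided backpropagation rule concrete, here is a minimal sketch on a toy network. It is not the code used for the images above: it assumes a small stack of dense ReLU layers instead of a CNN, and the function name `guided_backprop` and the `layers` format are hypothetical, chosen only for illustration. The point it shows is the rule from Springenberg et al. (2015): at every ReLU, the backward signal is kept only where the forward input was positive and the incoming gradient is positive.

```python
import numpy as np


def guided_backprop(x, layers, target):
    """Guided backpropagation on a toy dense network (illustrative only).

    x      : 1-D input vector.
    layers : list of (W, b) pairs; every layer except the last is
             followed by a ReLU (an assumption of this sketch).
    target : index of the output unit whose saliency is requested.
    """
    # Forward pass, remembering the pre-activation of every layer.
    pre_activations = []
    h = x
    for i, (W, b) in enumerate(layers):
        z = W @ h + b
        pre_activations.append(z)
        h = np.maximum(z, 0.0) if i < len(layers) - 1 else z

    # Backward pass, starting from a one-hot signal on the target unit.
    grad = np.zeros_like(h)
    grad[target] = 1.0
    for i in reversed(range(len(layers))):
        W, _ = layers[i]
        if i < len(layers) - 1:
            # Guided ReLU: pass the signal only where the forward input
            # was positive AND the incoming gradient is positive.
            grad = grad * (pre_activations[i] > 0) * (grad > 0)
        grad = W.T @ grad
    return grad
```

Compared with a plain gradient, the extra `grad > 0` mask discards negative backward signals, which is what gives guided backpropagation its typically sharper maps. In a real framework the same rule would normally be applied by overriding the backward pass of each ReLU rather than by writing the loop by hand.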

Proposed saliency

Since we were not able to extract meaningful information from the previous saliency maps, we implemented an ad hoc method to inspect the CNN. This method can be used to query any neural network whose input has the same size as its output. With it, we can ask the network why a certain region was predicted in a certain way. In these images, the green rectangle marks the region being queried. The top image shows the input pianoroll, the middle image shows the CNN prediction, and the bottom image shows which pixels contributed to that prediction (positive values) and which instead hindered it (negative values). The analyzed excerpt is the first window (two quarter notes) of Die Sommernacht by Gluck.

[Figure: Gluck masked saliency 1]

[Figure: Gluck masked saliency 2]

[Figure: Gluck masked saliency 3]

[Figure: Gluck masked saliency 4]
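The exact procedure behind these images is not spelled out on this page, so the following is only one plausible way to pose such a region query, not necessarily the method we used. It is an occlusion-style sketch: each active input pixel is removed in turn, and the change in the summed prediction inside the queried region is recorded. The names `region_query_saliency`, `model`, `rows`, and `cols` are hypothetical, and `model` is assumed to be any callable that maps a 2-D pianoroll to an output of the same shape.

```python
import numpy as np


def region_query_saliency(model, pianoroll, rows, cols):
    """Occlusion-style query of a same-size-in, same-size-out model.

    model     : callable mapping a 2-D array to a 2-D array of the
                same shape (assumption of this sketch).
    pianoroll : 2-D input array; zero means "no note".
    rows, cols: slices delimiting the queried output region.

    Returns an array shaped like the input. Positive values mark
    pixels whose removal lowers the prediction in the region (they
    contributed to it); negative values mark pixels whose removal
    raises it (they hindered it).
    """
    baseline = model(pianoroll)[rows, cols].sum()
    contribution = np.zeros_like(pianoroll, dtype=float)
    # Only active pixels are tested; silent pixels keep a score of 0.
    for r, c in zip(*np.nonzero(pianoroll)):
        occluded = pianoroll.copy()
        occluded[r, c] = 0
        contribution[r, c] = baseline - model(occluded)[rows, cols].sum()
    return contribution
```

For example, with a toy model such as `lambda p: p - 0.5 * np.roll(p, 1, axis=0)` and a query on a single output pixel, the input pixel at the same position gets a positive score, while the neighbour that the toy model subtracts gets a negative one. The cost is one forward pass per active pixel, which is acceptable for a short window like the two-quarter-note excerpt discussed here.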