Perceptual Evaluation of Music Resynthesis

This page serves as the companion website for the following paper:

F. Simonetta, F. Avanzini, and S. Ntalampiras, “A Perceptual Measure for Evaluating the Resynthesis of Automatic Music Transcriptions,” Multimedia Tools and Applications, 2022, link.

Abstract

This study focuses on the perception of music performances when contextual factors, such as room acoustics and instrument, change. We propose to distinguish the concept of “performance” from the one of “interpretation”, which expresses the “artistic intention”. Towards assessing this distinction, we carried out an experimental evaluation where 91 subjects were invited to listen to various audio recordings created by resynthesizing MIDI data obtained through Automatic Music Transcription (AMT) systems and a sensorized acoustic piano. During the resynthesis, we simulated different contexts and asked listeners to evaluate how much the interpretation changes when the context changes. Results show that: (1) MIDI format alone is not able to completely grasp the artistic intention of a music performance; (2) usual objective evaluation measures based on MIDI data present low correlations with the average subjective evaluation. To bridge this gap, we propose a novel measure which is meaningfully correlated with the outcome of the tests. In addition, we investigate multimodal machine learning by providing a new score-informed AMT method and propose an approximation algorithm for the p-dispersion problem.


The code is available at https://github.com/LIMUNIMI/PerceptualEvaluation


Supplementary Materials

Two supplementary files are also provided:

  1. supplementary01.pdf: contains a detailed description of the score-informed (SI) method used in this work;

  2. supplementary02.pdf: contains extensive screenshots of the statistical analysis report used in this work. All the screenshots were generated using the code available at the above URL. Specifically:

    • the analysis of all responses (no control groups) for each task, averaged across the excerpts (page 1) and not averaged (page 2);

    • the analysis of the expertise control groups, averaged across the excerpts (page 3) and not averaged (page 4);

    • the analysis of the listening habits control groups, averaged across the excerpts (page 5) and not averaged (page 6).