Music and Sound Research

Machines that Listen

The ability of trained musicians to transcribe music, i.e., to render a written representation from the sound alone, is far beyond the present capability of any computer. In particular, musicians have highly developed abilities to separate audio streams and group sounds so as to recognize a single sound source (an instrument) within a mixture (an ensemble). They achieve this through a combination of innate and trained ability, together with the application of expectation and prior knowledge. We are interested in understanding how musicians “listen” so that we can develop machines with similar abilities.

Employing principles illuminated by experimental psychology (the field of auditory scene analysis), we have developed computer tools that segregate the components of a complex soundscape into groups sharing common features such as onset and release. The key element of our recent work is to group features additionally by common micro-modulation of the amplitude and phase of the overtones arising from a single sound source (instrument). This approach has enabled us to cleanly separate instruments in an ensemble even when the instruments share note onsets and releases and are playing notes that are closely related harmonically.

In the future we will combine the “low-level”, signal-processing-based techniques we have developed with the prior knowledge and expectation that a trained musician possesses, to develop new types of “listening machines” that may begin to approach the capabilities of humans and to deepen our understanding of how musicians listen to and transcribe music.
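The grouping-by-comodulation idea lends itself to a compact illustration. The Python sketch below is a minimal reconstruction under stated assumptions, not the group's actual system: the function names and parameters (partial_tracks, group_by_comodulation, the 0.6 correlation threshold) are hypothetical, and it uses only amplitude micro-modulation measured on fixed STFT bins, whereas the work described above also exploits phase micro-modulation and proper partial tracking.

    # Illustrative sketch: group spectral partials whose amplitude
    # micro-modulations are correlated, on the assumption that partials
    # from one instrument fluctuate together. Names and thresholds here
    # are hypothetical, not taken from the research described above.
    import numpy as np
    from scipy.signal import stft, detrend

    def partial_tracks(x, fs, n_fft=4096, hop=512, n_partials=20):
        """Track amplitude envelopes of the strongest spectral bins."""
        f, t, Z = stft(x, fs, nperseg=n_fft, noverlap=n_fft - hop)
        mag = np.abs(Z)
        # Treat each of the n_partials highest-energy bins as a crude
        # partial envelope over time.
        bins = np.argsort(mag.mean(axis=1))[-n_partials:]
        return f[bins], mag[bins, :]

    def micromodulation(env):
        """Isolate the fast fluctuations riding on each envelope."""
        # Removing the slow (note-level) trend leaves only the
        # micro-modulation component of each partial's amplitude.
        return detrend(env, axis=1)

    def group_by_comodulation(env, threshold=0.6):
        """Greedily cluster partials with strongly correlated
        micro-modulation, assigning each unlabeled partial to the
        group of the first seed it correlates with."""
        mod = micromodulation(env)
        r = np.corrcoef(mod)  # pairwise modulation correlation
        labels = -np.ones(len(mod), dtype=int)
        group = 0
        for i in range(len(mod)):
            if labels[i] >= 0:
                continue
            labels[i] = group
            for j in range(i + 1, len(mod)):
                if labels[j] < 0 and r[i, j] > threshold:
                    labels[j] = group
            group += 1
        return labels

The design choice the sketch highlights is the one the paragraph describes: partials whose fast envelope fluctuations rise and fall together are assigned to the same source, which is precisely the cue that remains available when shared onsets, releases, and harmonic relationships fail to discriminate between instruments.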