Extraction of musical structure

I think my next big project will involve automatically extracting structure from music. Mike and I had some discussions about doing this with machine learning / evolutionary algorithms, which produced some interesting ideas. For now I'm implementing some of the more traditional signal-processing techniques. There's an overview of the literature in this paper.

What I have to show so far is this:

This (ignoring the added colors) is a representation of the autocorrelation of a piece of music ("Starlight" by Muse). Each pixel of distance in either the x or y axis represents one second of time, and the darkness of the pixel at (x,y) is proportional to the difference in average intensity between those two points in time. Thus, light squares on the diagonal represent parts of the song that are homogeneous with respect to energy.
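A minimal sketch of how such a plot could be computed. It assumes mono samples as a plain list of floats and takes "average intensity" to mean the RMS level of each whole one-second window; that measure is my assumption, since the post doesn't say which one was used. The function names are hypothetical.

```python
import math

def per_second_intensity(samples, sample_rate):
    """RMS level of each whole one-second window (assumed measure of 'average intensity')."""
    n_seconds = len(samples) // sample_rate  # a trailing partial second is dropped
    levels = []
    for s in range(n_seconds):
        window = samples[s * sample_rate:(s + 1) * sample_rate]
        levels.append(math.sqrt(sum(x * x for x in window) / sample_rate))
    return levels

def difference_matrix(levels):
    """D[x][y] = |level at second x - level at second y|.

    Drawing each pixel with darkness proportional to D[x][y] gives the plot:
    0 (light) means the two seconds have the same energy.
    """
    return [[abs(a - b) for b in levels] for a in levels]
```

For example, two quiet seconds followed by two loud ones give two light squares on the diagonal and dark off-diagonal blocks between them.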

The colored boxes were added by hand, and represent the musical structure (mostly, which instruments are active). So it's clear that the autocorrelation plot does express structure, although at this crude level it's probably not good enough for extracting this structure automatically. (For some songs, it would be; for example, this algorithm is very good at distinguishing "guitar" from "guitar with screaming" in "Smells Like Teen Spirit" by Nirvana.) An important idea here is that the plot can show not only where the boundaries between musical sections are, but also which sections are similar (see for example the two cyan boxes above).
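One standard way to read boundaries off a plot like this automatically is to slide a checkerboard kernel along the diagonal, as in Foote's novelty measure. This is a swapped-in, well-known technique, not necessarily the method of the paper mentioned above. Because this matrix holds differences rather than similarities, the signs are flipped relative to Foote: differences across a candidate boundary count positively, differences within either side count negatively. `novelty_curve` and `half_width` are hypothetical names.

```python
def novelty_curve(D, half_width):
    """Checkerboard-kernel novelty on a difference matrix D.

    score[i] is large when the seconds just before i differ from the seconds
    starting at i, while each side is internally homogeneous, i.e. when a new
    section plausibly starts at second i.
    """
    n = len(D)
    scores = []
    for i in range(n):
        score = 0.0
        for a in range(-half_width, half_width):
            for b in range(-half_width, half_width):
                x, y = i + a, i + b
                if 0 <= x < n and 0 <= y < n:
                    same_side = (a < 0) == (b < 0)  # both before i, or both from i on
                    score += -D[x][y] if same_side else D[x][y]
        scores.append(score)
    return scores
```

With four quiet seconds followed by four loud ones, the curve peaks at second 4, where the loud section begins; peak picking on this curve would then give candidate section boundaries.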

The next step will be to compare power spectra obtained via FFT, rather than a one-dimensional average power. This should help distinguish sections which have similar energy but use different instruments. The paper referenced above also used global beat detection to lock the analysis frames to beats (and to measures, by assuming 4/4 time). This is fine for DDR music (J-Pop and terrible house remixes of '80s music), but maybe we should be a bit more general. On the other hand, this approach is likely to improve quality when the assumptions of constant meter and tempo are met.
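Comparing spectra instead of a single level could look like the sketch below. It computes each frame's power spectrum with a naive DFT so that it runs with only the standard library (an FFT produces the same values much faster) and compares frames by cosine distance, which ignores overall loudness and looks only at spectral shape. The frame length, the choice of cosine distance, and all names are assumptions.

```python
import cmath
import math

def power_spectrum(frame):
    """|X_k|^2 for k = 0..N/2 via a naive O(N^2) DFT; an FFT gives the same values."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def cosine_distance(p, q):
    """1 - cos(angle between spectra): about 0 for the same spectral shape, at any volume."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm if norm > 0 else 0.0

def spectral_difference_matrix(frames):
    """Same layout as the intensity plot, but each cell compares whole spectra."""
    spectra = [power_spectrum(f) for f in frames]
    return [[cosine_distance(p, q) for q in spectra] for p in spectra]
```

Two pure tones with identical energy at different frequencies come out at distance ≈ 1, while the same tone at three times the amplitude is at distance ≈ 0. That is the opposite of what the one-dimensional intensity plot would report for those frames.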

On the output side, I'm thinking of using this to control the generation of flam3 animations. The effect would basically be Electric Sheep synced up with music of your choice, including smooth transitions between sheep at musical section boundaries. The sheep could be automatically chosen, or selected from the online flock in an interactive editor, which could also provide options to modify the extracted structure (associate/dissociate sections, merge sections, break a section into an integral number of equal parts, etc.) For physical installation, add a beefy compute cluster (for realtime preview), an iPod dock / USB port (so participants can provide their own music), a snazzy touchscreen interface, and a DVD burner to take home your creations.


Simple DIY multitouch interfaces

Multitouch interfaces are surprisingly easy to make. Here's a design using internal reflection of IR LED light in acrylic, and here's an extremely simple and clever design using a plastic bag filled with colored water. Minority Report here we come.

OpenCV : open-source computer vision

OpenCV is an open-source computer vision library from Intel. To quote the page,

"This library is mainly aimed at real time computer vision. Some example areas would be Human-Computer Interaction (HCI); Object Identification, Segmentation and Recognition; Face Recognition; Gesture Recognition; Motion Tracking, Ego Motion, Motion Understanding; Structure From Motion (SFM); and Mobile Robotics."

Sounds like some of this could be pretty useful for interactive video neuro-art, or whatever the hell it is we're doing.


What if everything in the past has been a long string of coincidences? Every time we observed stuff falling and inferred the law of gravity from it, it just happened to fall. Balls flying through the air could have turned left, but they always happened to go straight. All natural laws could just be a highly improbable string of events.

In an entirely different but similar fiction:
Our universe appears to be free of contradictions. What if the many-worlds hypothesis were true, and there were often branches in which some inherent contradiction occurs? These branches collapse/explode/disappear as the contradictions occur, so by the anthropic principle we only see consistent branches.


Whorld : a free, open-source visualizer for sacred geometry

From the homepage:

"Whorld is a free, open-source visualizer for sacred geometry. It uses math to create a seamless animation of mesmerizing psychedelic images. You can VJ with it, make unique digital artwork with it, or sit back and watch it like a screensaver."


From the artist's page:

Flock is a full evening performance work for saxophone quartet, conceived to directly engage audiences in the composition of music by physically bringing them out of their seats and enfolding them into the creative process. During the performance, the four musicians and up to one hundred audience members move freely around the performance space. A computer vision system determines the locations of the audience members and musicians, and it uses that data to generate performance instructions for the saxophonists, who view them on wireless handheld displays mounted on their instruments. The data is also artistically rendered and projected on multiple video screens to provide a visual experience of the score.

Perhaps you've already seen it, but I really like aleatoric music.