One of the first times I used a machine learning algorithm in performance was as part of my duo for bassoon and electronics, feed, created in collaboration with bassoonist Ben Roidl-Ward. In the last movement of this piece Ben plays screaming feedback tones created by amplifying the inside of his instrument, while I glitchily switch between white noise, low thuddy rumbles, and squealing high notes on my no-input mixer. Pointing at us are four DMX lights casting shadows on three of the venue walls. In the first iteration of this project, the lights were controlled via the timbre and loudness of my sounds by mapping different audio analyses to the parameters of the lighting instruments. But I found this unsatisfying, as the experience of watching the lights did not reflect my experience of hearing (and performing) the sounds. When the no-input mixer switched suddenly from a high squealing note, down to a low thud, and then quickly to a distorted band of noise, the lights exhibited some kind of coordinated response, but not the lively contrasts packed into my gesture on the instrument.
I realized that although these changes in sound were obvious and musically expressive to me, I was taking for granted that they would be visually expressed through the sound-to-light mappings I had created. What I really wanted was for my computer to hear these sounds the way I heard them–as swift maneuvers between contrasting sonic categories. Enter the algorithm: this type of classification is a great task for a neural network. In order to train my neural network to detect the categories of sound as I heard them, I needed to tell the neural network what I was hearing. First, I found and labeled audio files containing all of the sonic categories I wanted the algorithm to identify. Then I trained the neural network by showing it each of these examples as input and telling it what label it should answer with as the output. After this training, the neural network was able to listen to my no-input mixer during live performance and correctly identify the sounds I was making, according to their categories as I subjectively heard and defined them. With the computer now properly “hearing” my performance, fast sonic changes trigger clear and corresponding changes in the lights.
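For the technically curious, here is a rough sketch in Python of this kind of classifier training, not my actual performance setup: it assumes the librosa and scikit-learn libraries, and the file names, category labels, and network size are all stand-ins.

```python
# A minimal sketch of training a sound classifier from labeled audio examples.
# File names, labels, and network size are hypothetical.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

LABELS = {"squeal": 0, "thud": 1, "noise": 2}  # categories as I hear them

def features(path):
    """Summarize a file's timbre as averaged MFCCs."""
    y, sr = librosa.load(path, sr=44100, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

# Training set: each labeled example becomes (feature vector, label).
examples = [("squeal_01.wav", "squeal"), ("thud_01.wav", "thud"),
            ("noise_01.wav", "noise")]
X = np.array([features(p) for p, _ in examples])
y = np.array([LABELS[label] for _, label in examples])

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)

# In performance, the same analysis would run on short windows of live input,
# and the predicted category would select a lighting state.
live_window = features("live_capture.wav")  # stand-in for a live audio buffer
print(clf.predict([live_window]))           # e.g. [1] -> the "thud" lighting state
```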
Enabling the lights to express the musicality of my no-input mixer performance felt like a breakthrough for the piece. The lights' activity now felt like an extension of the instrument I was playing, and more importantly, the musical ideas I was expressing. I realized that my musical expressions are not just encoded in, or a product of, the sounds I send out into the world, but they're also a reflection of the way I hear, not only as a human being, but also as an individual subjective artist. If I and other artists want machines to participate with us in the act of art-making, it will be important for the machines to learn about the expressions we make.
HUMAN-AI ALIGNMENT
————A more recent use of a neural network can be heard in my improvisation with drummer Fabrizio Spera. During the first two minutes of the video documentation, most of the electronic sounds you hear are made with my modular synthesizer. I've played with this instrument for many years, using both hands to twist various knobs, push buttons, and tap pads on the modules in the rack. This tactility feels very expressive and musical; however, one downside is that transitioning from one sound to another often requires the fine adjustment of several parameters in coordination. This takes time–usually many seconds–and even one knob being slightly misadjusted could lead to a sound very distant from what I intended. My desire for a fast-switching juxtaposition of timbres was limited by this control system. I loved the juicy analog sounds that my synthesizer made, but didn't feel I could create the kinds of gestures that I enjoyed so much on other electronic instruments, like my laptop.
The machine learning solution that closed this gap for me was inserting a neural network between myself and the synth. Instead of training a neural network to output sonic category labels as described above, I trained it to output the eight parameters that control the sound coming from my modular synthesizer. With the neural network helping me, just two of my fingers could cause the synthesizer parameters to all jump in coordination directly to a distant sonic space, enabling the fast gestures I desired. (Near the beginning of the video documentation, for example, you can see my pointer and middle fingers tapping and sweeping on my iPad screen.) Using the neural network in this way also freed my left hand to simultaneously perform other sounds, such as the live processing on Fabrizio's drumming.
Training the neural network to respond in this way again manifests as an expression of my musical tastes. I first find various sounds with the synthesizer that I really like and save the settings for each. Then I pair each of those sounds with a different position of my fingers on the iPad. Using these pairings, the neural network learns to associate my finger positions with the sounds I assigned to them. After training, I can navigate the entire sound space I’ve curated by moving my fingers to various positions on the iPad. As I move my fingers to in-between spaces, the neural network answers with “in-between sounds” that, although not part of my original selections, make sense within this sonic landscape.
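A minimal sketch of this kind of finger-position-to-parameter mapping might look like the following. It is an illustration rather than my actual patch: the touch positions, parameter values, and network size are invented, and in performance the predicted parameters would be sent to the synthesizer (for example over MIDI or OSC) rather than printed.

```python
# A regression network that maps a 2-D finger position to 8 synth parameters.
# All numbers here are hypothetical placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Each saved sound: (x, y) finger position on the iPad -> 8 parameters (0..1).
touch_points = np.array([[0.1, 0.1],
                         [0.9, 0.2],
                         [0.5, 0.8]])
synth_params = np.array([[0.2, 0.9, 0.1, 0.5, 0.0, 0.7, 0.3, 0.8],   # "low rumble"
                         [0.9, 0.1, 0.8, 0.2, 0.6, 0.1, 0.9, 0.0],   # "high squeal"
                         [0.5, 0.5, 0.4, 0.7, 0.3, 0.6, 0.2, 0.4]])  # "mid texture"

net = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000,
                   random_state=0).fit(touch_points, synth_params)

# A touch between the trained positions yields an "in-between sound":
# all eight parameters move in coordination toward a plausible new setting.
print(net.predict([[0.7, 0.5]]))
```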
The beauty of this approach is that it roots the interaction between myself and my instrument in listening. The sounds learned by the neural network are always first filtered through my ears and my aesthetic. The training process brings the algorithm “up to speed” on what I find musically important, how I hear relationships between sound objects, and how I want to embody the control of my sounds. All of this is communicated through an iterative listening process that invites me to reflect on these questions regularly, sharpening my ideas about what sounds to create and how to explore them in musical space.
AI and creativity researcher Rebecca Fiebrink describes this process wonderfully:
There are certain things that we care about, as musicians for example, that are really hard to articulate in code. It’s hard for me to talk about what kind of quality of sound I want and then translate that into a set of filter coefficients. It’s hard for me to talk about how I want a performer to move on stage and then translate that into some sort of mathematical equation for their trajectory. But it’s a lot easier for me to either find examples of sounds that have a particular quality or to give examples of movements or if I’m using other types of modalities, often curating or creating examples are just way easier for us as people. And this relates to the types of tacit knowledge and embodied knowledge we bring to creative practices. (CeReNeM 2019)
COLLABORATING WITH THE ALGORITHMS
————One caveat to the example above is that the neural network isn’t always able to exactly recreate the modular synth sounds I ask of it. In fact, this is a common outcome of training a neural network and why they can be very challenging to work with. When a neural network completes its training, it reports back a number called the “error” (also known as the “loss”), which essentially represents how off-target it still is from performing the task it’s been trained to do. A low “error” value indicates a strong understanding of what I asked it to learn–a strong alignment with my musical values. But rather than always wanting this error to be as low as possible, I’ve found that a machine learning algorithm that does approximately, but not quite, what I intended can actually be quite fruitful. The curation I provide gives the algorithm a sense of the direction to go in, or generally where my musical tastes lie, but along the way it gets a little lost. This “error,” or distance from what I had intended, is a poetic quantification of the aesthetic differences between myself and the neural network.
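As a small illustration of what this number measures, the sketch below computes a mean-squared-error style loss between hypothetical curated synth settings and the settings a trained network might return for the same finger positions. (Mean squared error is one common loss for this kind of regression; other formulations exist.)

```python
# The "error" (loss): the mean squared distance between the settings I curated
# and the settings the trained network actually produces. Values are hypothetical.
import numpy as np

curated   = np.array([[0.20, 0.90, 0.10, 0.50],    # what I asked for
                      [0.90, 0.10, 0.80, 0.20]])
predicted = np.array([[0.25, 0.80, 0.15, 0.55],    # what the network gives back
                      [0.80, 0.20, 0.70, 0.30]])

loss = np.mean((curated - predicted) ** 2)
print(loss)  # ~0.0072: small, but not zero -- the network's "aesthetic distance"
```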
Using the synthesizer example above, the neural network’s outputs could be sounds that are similar to the sounds I curated, but varied in some way, like a motivic variation in musical development. These surprising outputs often feel like creative suggestions, as though the algorithm is a collaborator that understands my musical goals and offers ideas about how my sounds could be transformed or extended. When I deliberately engage machine learning algorithms as collaborators I often find myself jolted into new ways of thinking about composition and performance. It's like having Eno and Schmidt’s Oblique Strategies tailored to the musical question I’m addressing at a given moment.
One example of collaboration with a machine learning algorithm can be seen in my recent video work created with the [Switch~ Ensemble] titled quartet. The second half of the work (starting at 6:45) contains a passage in which the performers’ individual recordings were first cut into slices one-tenth of a second long. Using a timbral analysis, the slices were then reorganized in time according to similarity through a machine learning algorithm called UMAP. The music that this process produced revealed clear correlations between timbres such as flute tones, cello harmonics, and bowed vibraphone. It also assembled juxtapositions where piano chords, flute attacks, and brake drum strikes weave together to create hocketing musical lines. I was so delighted by the musicality of this algorithm's output that I selected whole phrases and passages to use in the final video piece. The creative suggestions this algorithm offered reflected how I heard the relationships of these instruments. It extended my idea about comparing timbres into compelling instrumental gestures that I found musically meaningful and exciting.
Watch UMAP reorganize sound slices in “real-time” in this demo video.
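For those who want to experiment with the technique themselves, here is a rough offline sketch of the slicing-and-reordering idea, assuming the librosa, umap-learn, and soundfile libraries; the file name and analysis choices (averaged MFCCs, a one-dimensional UMAP projection) are assumptions rather than the piece's exact pipeline.

```python
# Slice a recording into 0.1-second segments, describe each slice's timbre,
# and reorder the slices along a 1-D UMAP projection of those descriptions.
import numpy as np
import librosa
import soundfile as sf
import umap

y, sr = librosa.load("ensemble_recording.wav", sr=44100, mono=True)  # hypothetical file

# Cut the recording into 0.1-second slices and summarize each one's timbre.
hop = int(0.1 * sr)
slices = [y[i:i + hop] for i in range(0, len(y) - hop, hop)]
features = np.array([librosa.feature.mfcc(y=s, sr=sr, n_mfcc=20).mean(axis=1)
                     for s in slices])

# Project the timbral descriptions onto one dimension and use that ordering
# as the new timeline: similar-sounding slices end up next to each other.
embedding = umap.UMAP(n_components=1, n_neighbors=15).fit_transform(features)
order = np.argsort(embedding[:, 0])
sf.write("reordered.wav", np.concatenate([slices[i] for i in order]), sr)
```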
This collaboration I experience is, in some respects, similar to collaborating with humans. In both cases I find it important that my collaborator has an understanding of my musical values, that we can meaningfully communicate through sound with a shared musical vocabulary. And yet, a fruitful collaboration happens when there are also differences in musical taste to be shared with each other–an “error” or “loss,” if you will. These differences enable the conversations, both musical and not, to be filled with surprising creative suggestions, just like working with a neural network. This is one reason why I love improvising so much–it is an act of conversing with another performer within a shared musical vocabulary, making and taking suggestions to see what new sonic spaces open up. Human minds are also made up of networks of neurons, but, of course, they are much more complicated than their algorithmic analogs. The act of collaborating with a human will always be a deeper, richer, and more musical encounter. Artistic communication is about expressing and sharing the human experience, which machines will never be able to do.
Composer and improviser Palle Dahlstedt agrees that although training collaborative technologies can be exciting and useful, current AI technology will never be able to behave as a human would. Describing a performance between improvising pianist Magda Mayas and George Lewis’ Voyager, he says, “Basically [Voyager] reacts to what she’s playing but it doesn’t have the whole picture of her as a musician or anybody else as a musician, I guess because that’s too big...No AI system has that...It’s always incomplete...there’s so many other things that weigh in.” (Karlsruhe 2020) Artistic creation is so complex, with so many “things that weigh in.” It is an immense challenge even for humans, the highest-powered processing units ever observed. AI systems are nowhere near, at least currently, able to engage with creativity at the same level of meaning as humans.
BUILDING INTUITION
————After some years of training, collaborating, and performing with machine learning algorithms I think I've built up some intuition about what kinds of tasks they're good for and which tasks they’re not good for. I sometimes think about how a particular algorithm might be used musically, and also find myself responding to musical ideas with an intuition about what machine learning tools would be most helpful. Like most things in life, especially creative endeavors, practice and problem-solving build up tacit knowledge that feeds back into the work, supporting further exploration and growth. This intuition feels musical to me–like a sense of combining instruments in meaningful ways, feeling confident about improvising a response to a musical moment, or reaching for an algorithm because of its artistic value.
Working with AI in my music feels exciting and full of artistic potential–and to me that's a great reason to do it. The conceptual and sonic ideas that these tools offer and the artistic and technical tasks they pose stimulate my creative thinking and feed my creative energy. For me, working with AI is a process of reflecting on my idiosyncratic ways of hearing, performing, and composing–the things that make me an artist–and then training machines to understand and participate in those practices.
References
CeReNeM. 2019. Rebecca Fiebrink: Machine Learning as Creative Design Tool. Centre for Research in New Music. YouTube. Visited on 11/23/2020. https://www.youtube.com/watch?v=Qo6n8MuEgdQ.
Karlsruhe, ZKM. 2020. inSonic 2020: Syntheses - Day 2. YouTube. Visited on 01/17/2021. https://www.youtube.com/watch?v=sooNxK6oQ4c.