Rebecca Fiebrink is a Senior Lecturer in Computing at Goldsmiths, University of London, and a regular speaker at conferences around the world on creative applications of Machine Learning. Rebecca discusses advances in Machine Learning since her keynote in 2016, why music recommendation systems still have a long way to go, and where she sees Machine Learning heading in the future for the audio industry (interview by Joshua Hodge).
You gave an ADC keynote in 2016, but I couldn’t seem to find it online. Can you tell me about it?
It was about why people within the audio community should learn about Machine Learning.
How do you feel things have gone since then? From what I’ve seen, it seems the audio industry has been slow to adopt Machine Learning (ML) in its development.
I think so too, and there’s a combination of things going on. First, I think the visibility of ML has been high across many areas of software development within the past 2 years, because of advances that are happening with deep learning that are impacting text, natural language, computer vision, audio analysis and more.
To an extent, a lot of this research is getting bundled into APIs by companies such as Google or Microsoft that you as a developer can use, if you’d like to do a real-world sensing or understanding task. That has a big impact on people who are developing systems that rely on, for instance, computer vision for sensing real-world objects.
However, if you’re an audio developer, and you have a more specific task in mind, you can’t just go to Google and say, “Hey Google, find me all the drum hits in this track.” It takes more ML expertise at this point to develop a ML system for these kinds of tasks that individual audio developers might care about.
Some audio companies are doing some really exciting things, of course. For instance, some of iZotope’s products, like Neutron, are using ML in some neat ways. They’re building some nice ML tools for people doing audio production, where users don’t have to train an ML system from scratch and customize it. The system itself knows more or less how to do ML on audio, and it can use pre-trained classifiers to do useful things such as applying a reasonable EQ to a set of audio tracks.
But for the most part, there aren’t many general-purpose tools out there yet that are a good fit for developers like the ones going to ADC. (I’ve recently done some work with a number of collaborators, including some folks from ROLI, to try to make a start on this problem; you can see our more user-friendly ML library for developers here).
I would add that the resources out there for learning ML are also not always a great match to audio developers: many of them target either people who want to really get into the guts of the algorithms (maybe even to do ML research), or people who want to use off-the-shelf APIs for particular tasks that aren’t so useful for people working in audio. There is still a lot of room to provide better resources for people who want to learn to use ML to develop new applications in audio and media.
So I’m not surprised that it’s taking a while for ML to be more widely adopted by audio developers, but there’s definitely a lot of interest in it, and I expect to see plenty of innovation within the next few years.
I was thinking that another contributor to the slow rise of ML within audio is that everything starts with data. If a company doesn’t know what data it wants to collect, and how to use the data once it has it, then it can be difficult to get off the ground. Would you agree?
Yes, for the most part, though I don’t think that data necessarily needs to be the bottleneck. As you’ve seen, sometimes all you need is a handful of examples to build something useful with ML. That’s not always the case, but I think even before you get to that point of asking what kind of data you’d want to collect, you need to have a good idea of what ML can offer you. Even if you take an ML course at a university, seeing how those types of algorithms might be most usefully applied in an audio or music domain can be a challenge.
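To make the “handful of examples” point concrete, here is a minimal sketch (not from the interview; the inputs, labels, and mapping are invented) of the kind of example-based workflow a tool like Wekinator supports: a few labelled controller positions mapped to sound-control classes with a simple nearest-neighbour rule.

```python
import math

# Hypothetical Wekinator-style workflow: a handful of labelled examples
# (here, 2-D controller positions) mapped to sound-control classes.
# All names and values below are invented for illustration.
training_examples = [
    ((0.1, 0.2), "soft_pad"),
    ((0.9, 0.8), "bright_lead"),
    ((0.5, 0.9), "percussive"),
]

def classify(point):
    """Return the label of the nearest training example (1-NN)."""
    nearest = min(training_examples, key=lambda ex: math.dist(ex[0], point))
    return nearest[1]

# A new input close to the first example picks up its label.
print(classify((0.15, 0.25)))  # -> "soft_pad"
```

Nothing about this is deep learning; the point is that with an interactive, example-driven loop, even trivially simple learners can produce something musically usable from a few seconds of demonstration data.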
Absolutely! I think it’s fair to say that you’re very well known as an authority on creative applications for ML, and have spoken around the world on this subject. What has the reaction been to your ideas…astonishment, or maybe scepticism on the practicality of ML within audio?
Universally, people are really excited about it, and I’m not just talking about musicians who haven’t seen ML before. Even when I give talks to experts in ML, there are a lot of people who haven’t considered how you could apply it to music, or in cases where you don’t have giant data sets, or how you can make these techniques usable by people who aren’t computer scientists. So often, people are surprised by the extent of applications that can employ ML, and by how easy it can be to get people using it within the work that they’re doing.
You’ve been doing a lot of research as well. Can you talk about the direction this has been heading in?
Sure, I can talk about a couple of strands. The first is focused on how to make ML even easier to use in creative applications, such as art and music, and also on getting more value out of sensors and IoT devices. There are still a number of challenges, even if you know the basic ideas behind ML; one of these challenges is to figure out how you can pre-process data from audio or sensors so you can learn something useful. I’ve got an EPSRC project exploring this right now.
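As a rough illustration of what that pre-processing step can look like (a sketch, not drawn from the project itself), raw audio or sensor samples are often summarised per window into a small feature vector before any learning happens:

```python
import math

# Illustrative only: summarise one window of raw samples as a compact
# feature vector that a learning algorithm can work with.
def features(window):
    """Compute simple descriptive features over one window of samples."""
    n = len(window)
    mean = sum(window) / n
    rms = math.sqrt(sum(x * x for x in window) / n)  # overall energy
    # Zero-crossing count hints at frequency content in audio-like signals.
    crossings = sum(1 for a, b in zip(window, window[1:])
                    if (a < 0) != (b < 0))
    return [mean, rms, crossings]

raw = [0.0, 0.5, -0.5, 0.5, -0.5, 0.5]
print(features(raw))
```

Choosing which summaries to compute (energy, spectral shape, motion statistics, and so on) is exactly the kind of decision that is hard for newcomers and that better tooling could help with.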
Another project I’ve just started is called MIMIC, and I’m working on it with a few others at Goldsmiths, Sussex and Durham. We’re building a bunch of web-based deep learning tools for composers and musicians to get started exploring how deep learning can be used in their work. We’re making tools that won’t require them to learn things like Tensorflow, or the nuances of optimization that you currently need for a lot of deep learning. This is a 3-year project, and we’re hoping to have some new tools within the year that people can play with.
I also have a strand of work around ML education. I’m co-editing an upcoming special issue of ACM Transactions on Computing Education, focused on machine learning education. And I have an article coming out in the next Communications of the ACM (a monthly journal of the Association for Computing Machinery) with Peter Norvig, who works with artificial intelligence at Google, and Ben Shapiro, who’s a computer science education researcher. We’re asking the question, “What should the future of computer science education look like?” given the huge rise in importance of ML. If we’re going to train students to become software engineers, working in any kind of industry, knowing something about ML is increasingly important. How do we teach that? What do we teach?
Speaking of education, one of your tools, Wekinator, has served as a gateway into ML for so many people. I know in my conversations with audio developers, whenever the subject of creative ML comes up, usually Wekinator is one of the first things that gets mentioned. Could you have predicted its popularity when you first started creating it?
Yeah, well that was the intention in a lot of ways. I began working on the first version of Wekinator when I was doing my PhD at Princeton University. I was working with composers and saw that there were a lot of potential applications of ML in the work that they were doing, but there was no reasonable way for them to get started using existing ML tools at the time. I worked closely with composers during my PhD and have worked with other users of Wekinator since then, both to understand how to make software that was useful to them and to understand what ML is good for in different types of creative practice.
I’ve been interested for quite a while now in how we can make better ML tools that are easy for people to get started with, even if they haven’t used ML before. That was important to me as a researcher when I made Wekinator, and I’ve had an online course for a couple of years now with Kadenze, using Wekinator to teach people about ML. This was the first ML course online catered towards musicians and creative practitioners, and it was my goal to say, “Hey, this can be taught to a broader audience.” I wanted to make some software that makes that teaching as easy as possible. So in preparing for teaching that course, I refined Wekinator quite a bit to make it a tool for teaching. I’m really happy that so many people are finding it useful for making creative projects as well as in teaching around the world.
I’m curious what pushed you towards ML initially?
When I was an undergrad, I did two degrees simultaneously—one in flute and the other in Computer Science—and for my final thesis project, I was interested in the question of “what makes an instrument difficult or easy to play?” You could take a stab at writing a computer program to do this, but it would be really difficult to write a set of rules or heuristics that capture all the things that make a section of a piece easy or hard.
However, you could collect data, by having people tell you how difficult specific passages are. As soon as you can collect examples that you trust, you can give that to an ML algorithm. So my thesis used neural networks to model what makes flute music difficult to play. It didn’t work incredibly well, and I didn’t actually take it further until a year ago when I started a side project here at Goldsmiths with Tim Crawford and some other collaborators. We’re currently looking at guitar and lute playing, getting data from players in order to build models that predict whether a passage is going to be easy or hard. We can use these models to inform software for people learning music, for teachers, or for composers.
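The basic modelling idea here can be sketched very simply. The following is an invented toy version (the features, labels, and linear model are all hypothetical; the actual thesis work used neural networks on flute music): learn a difficulty score from a few hand-labelled passages.

```python
# Toy sketch: predict a difficulty rating (0..1) for a musical passage
# from two invented features: notes per second and mean interval size
# in semitones. Data and model are illustrative only.
data = [
    ((2.0, 1.0), 0.1),   # slow, stepwise passage -> easy
    ((6.0, 3.0), 0.5),   # moderate tempo and leaps -> medium
    ((10.0, 7.0), 0.9),  # fast, wide leaps -> hard
]

w = [0.0, 0.0]
b = 0.0
lr = 0.01

# Plain stochastic gradient descent on squared error.
for _ in range(5000):
    for (x1, x2), y in data:
        err = (w[0] * x1 + w[1] * x2 + b) - y
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err

# Predict difficulty for an unseen passage (8 notes/s, 5-semitone leaps).
print(round(w[0] * 8.0 + w[1] * 5.0 + b, 2))
```

The real problem is of course far messier (difficulty depends on fingering, breath, register, and the individual player), which is why collecting trusted human ratings is the crucial step.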
My interest in ML continued to grow in the early 2000s, when I discovered the Music Information Retrieval Conference (ISMIR). People there were using ML to do music recommendation and audio analysis—for instance, building computer listening systems that can do score following or understand what genre a piece of music is written in. I did a master’s degree focusing on these types of systems, and it was during that period that I got interested in the question of what composers could do with ML: could we do something else other than building a music recommendation system or a cataloguing system? What kind of creative possibilities might ML offer?
Speaking of music recommendation systems, I feel like Spotify still hasn’t figured out my musical preferences. Do you feel there’s still a long way to go?
There is, yes. My own experience with Spotify is similar (its recommendations to me seem irreparably warped by an unfortunate Christmas binge-listening to Elvis), and I often wish I had better ways of finding music I like. My best recommendations often come from my friends, rather than automated systems. There’s room to improve there.
It’s a complex question…my thinking is that music is an associative experience, and motivations for listening to music can differ from person to person…
I think that’s absolutely part of it, and there’s a lot of room in this space for new work to be done. I think the best recommendation systems have realized that liking a piece of music is not necessarily based just on how it sounds and what instruments are used. There’s a social component, and for me I like music where I identify with the musician, or may like the persona they present. Also, liking a piece of music can depend on the context the person is in—are you going out with your friends or staying home making dinner? That has a big influence, and people are building systems that take this into account, but it’s hard!
One more question: what do you see as the next breakthrough for audio and ML?
I’ll tell you what my hope is: there has been some fantastic work on deep learning methods for audio generation within the past couple years. But the challenge is that this hasn’t yet translated into radically new ways for human musicians or composers to interact with technology.
I think we’re going to start seeing more interesting software built around some of these more powerful automated generation algorithms, with interfaces that allow people to influence the music in useful ways. If you’re a musical novice and want to make a song (maybe for a YouTube video or your friends), you may not have the expertise to make that from scratch, but a machine learning system could help you. We’re already seeing this with systems like Jukedeck, though they offer users quite limited control. There are a lot of other possible tools for novices that you could imagine.
I think we’re also going to see systems that are useful for professionals, where there are tasks they may want to offload to a computer because they’re tedious. The iZotope toolset is an example of a company trying to make this work for people, where they take an approach that says “machine learning may not do it as well as you would, but we’ll help give you a starting point so you waste less time.” There’s also a lot of potential for music generation systems that can be used as co-creative tools: for instance, suggesting alternative patterns, or filling in sketches of music in ways a music creator may not have considered, so the user can feel like they are “co-exploring” a musical space with a computer.
Thank you for all your time!
Good talking to you Josh!