We are happy to announce that Stefan Stenzel will be speaking at the Audio Developer Conference (ADC) 2018. The title of Stefan’s talk is “Text to Speech to Music- Synthesis of Speech and the Singing Voice.”
Stefan has been an innovator in the audio programming industry since the early 90s, and specializes in developing on both native and embedded platforms. He is the former Chief Technology Officer at Waldorf and was integral to most of their products, including the hardware Wave synthesizer and the Nave plugin, to name just a few. Stefan was kind enough to have a chat with me about his beginnings in audio development, challenges in developing on embedded processors, and more.
How did you get your start in developing?
I started developing in the late 80s, when I got my first Atari computer to make music. I knew somebody who could get me a cracked version of Steinberg 24, but I felt bad using the computer without really understanding how it worked. So instead of making music, I learned to program, first in Pascal, then in Assembly Language, to really get to know what was going on in my computer. I really got hooked there.
When did you start developing professionally?
There’s this company called Waldorf, which is named after a tiny village here in Germany. I was hooked on this idea of trying to combine music and technology, so I applied there and I kind of lied to them that I could program in C (haha). My first big project was the Waldorf Wave, which is still a huge thing. At the time, it was the most expensive synthesizer on the market. I soon became chief engineer at Waldorf.
Wow, you must have seen a lot of changes through the years…
When I started, there wasn’t a lot of digital signal processing in the products. The main programming was for the user interface, and the sound engine was either analog or enshrined in an ASIC (application-specific integrated circuit). When off-the-shelf digital signal processors became available, such as DSP chips from Motorola, later Freescale and NXP, I was able to convince my boss to let me try doing everything digitally. That worked out quite well, and since then we have done a lot of other products, both hardware and software, analog and digital.
Your talk at the Audio Developer Conference is on text to speech. Can you talk a little about that?
When I coded the sound engine for this wavetable synthesizer called Nave (available on iOS), I included my personal speech synthesizer in a hidden panel, where you can type in a phrase and you get a wavetable saying that phrase. That was the first time I put my text to speech technology into a commercial product. Since then I have done some more research in this area, especially on the difference between talking and singing. But I don’t want to give away too much because I want to do a playful demonstration for the talk!
You’ve done a lot of work on embedded processors. Can you tell us about some of the challenges that involves?
Well, the first thing to understand is that the architecture of the chips is very different from what most C++ developers work with. If you work on a DSP, the memory model and instruction set are so different from normal CPUs that you really have to program in Assembly. Because I still do a lot of embedded stuff with very limited resources and no operating system, I find my way of approaching audio coding might be a little different from what you’re used to coming from a C++ background.
If you’re doing bare metal programming, then basically you have to build everything yourself from scratch; there are no frameworks you can use. All you have is a schematic and the data sheets for the electronic components.
I use C++ for clients, but I don’t use all the latest features because I like to keep my code clean, with minimal dependencies, just in case I want to use it in a very tiny embedded device or translate it with a really old compiler.
For my iOS apps I use only C, C++ and Swift. Currently I have one app out, and it might be the noisiest app in the store! It’s called iOptigan and it’s a chord organ based on the original Optigan, which was indeed very noisy. I also have one or possibly two more apps coming out later this year that are based on speech synthesis in a musical sense.
As a contractor, do you have one piece of advice that you could give to developers who work on a contract or freelance basis?
For remote work, the key is communication. You cannot do too much communication! At least one Skype call a week. If you work remotely, everything that can go wrong comes down mostly to bad communication. Expectations are different, or someone’s working on something that isn’t really needed. So communication is key! This and being reliable.
Thank you for taking the time to tell your story, and for the helpful advice! I’m looking forward to seeing your talk at the conference, as long as I don’t have to program in Assembly, haha!
I hope it will be entertaining!
Connect with Stefan: