Advances in AI, MEMS usher in the internet of voice era

By Karen Field for Fierce Electronics – When it comes to sensor technologies for sound and voice control, innovation is happening on all fronts: in the audio device itself, in the software and algorithms, in big improvements to MEMS microphones themselves, and in AI-driven data analysis.

FierceElectronics recently spoke with Dimitrios Damianos, Technology & Market Analyst and Custom Project Business Developer at Yole Développement (Yole), about this trend and how it will eventually lead to acoustic event detection, voice recognition and context awareness, and even futuristic applications like emotion/empathy sensing using voice (Amazon and Apple already hold patents in this area).

FierceElectronics (FE): You’ve said that the next innovation in MEMS and sensors will be in audio, for sound and voice control. But isn’t it already here? What will be different?

Dimitrios Damianos (DD): Yes, it is here. MEMS mics have been in use since 2003, when they were included in the first Motorola Razr phone. Since then they have come a long way: they have displaced traditional electret condenser microphones (ECMs), offering better performance, higher sensitivity and lower cost, and they now ship in the billions of units annually.

For the past few years, voice control as a human-machine interface (HMI) has been making waves. There are now numerous devices that include a voice/virtual personal assistant (VPA), such as smartphones, smartwatches and, lately, smart speakers and cars. The innovation in audio is actually happening on a bigger, more holistic scale: top-notch performance (sensitivity) is needed from the MEMS mics, as well as low power consumption, because they are used in always-on devices. High-quality sound must also be captured to allow for efficient processing and high-quality rendering. You know the concept in computer science: garbage in, garbage out. If you want to extract context from the data, the data must be of a certain quality. That is why MEMS mics keep improving.

And at the system level you also need to think about the whole audio chain, from the device to the audio codec, the audio software and algorithms (noise cancelling, beamforming, etc.), the digital signal processor (DSP), and finally the audio amplifiers and loudspeakers. So innovation is happening on all fronts, in the optimization of all these variables, but particularly in the analysis of the data using AI, which will eventually lead to acoustic event detection, voice recognition and context awareness.
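To make the beamforming step concrete: a minimal delay-and-sum sketch (in Python with NumPy) shows how a multi-mic device can "steer" toward a talker by compensating each microphone's arrival delay before summing. The function name, array geometry and parameters here are illustrative assumptions, not anything from the interview or a specific product.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, angle_deg, fs, c=343.0):
    """Steer a linear mic array toward angle_deg by delaying and summing.

    signals: (n_mics, n_samples) array of simultaneously captured audio.
    mic_positions: (n_mics,) mic positions along the array axis, in meters.
    angle_deg: steering angle relative to broadside (0 = straight ahead).
    fs: sample rate in Hz; c: speed of sound in m/s.
    """
    n_mics, n_samples = signals.shape
    # Relative delay (seconds) at each mic for a plane wave
    # arriving from the steering direction.
    delays = mic_positions * np.sin(np.deg2rad(angle_deg)) / c
    # Apply the (fractional) compensating delays in the frequency
    # domain via a linear phase shift, then transform back.
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = np.fft.irfft(spectra * phase, n=n_samples, axis=1)
    # Coherent average: in-phase speech adds up, off-axis noise does not.
    return aligned.mean(axis=0)
```

Steering toward the true arrival angle aligns the copies of the signal so they reinforce each other, while steering elsewhere leaves them out of phase and attenuated; real products layer noise cancelling and adaptive steering on top of this basic idea.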