tinyML Talks on August 31, 2021 “Speech-to-intent model deployment to low-power low-footprint devices” by Dmitry Maslov

We held our next tinyML Talks webcast on August 31, 2021. Dmitry Maslov from Seeed Studio presented “Speech-to-intent model deployment to low-power low-footprint devices”.


IMPORTANT: Please register here

A traditional approach to using speech for device control or user request fulfillment is to first transcribe the speech to text and then parse the text into commands or queries in a suitable format. While this approach offers a lot of flexibility in terms of vocabulary and application scenarios, the combination of a speech recognition model and a dedicated parser is not suitable for the constrained resources of microcontrollers.

A more efficient way is to directly parse user utterances into actionable output in the form of intent/slots. In this presentation I will share techniques to train a domain-specific speech-to-intent model and deploy it to a Cortex-M4F based development board with a built-in microphone, the Wio Terminal from Seeed Studio.
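To make the intent/slots output format concrete, here is a minimal sketch of how raw model scores could be decoded into an actionable result. It assumes a model with one intent output and one score row per slot; the label lists, the `decode` helper, and the example scores are all hypothetical and are not taken from the talk or the project repository.

```python
# Hypothetical label sets; the real ones depend on the training dataset.
INTENTS = ["increase", "decrease", "activate", "deactivate"]
SLOT_VALUES = ["none", "lights", "heat", "volume", "kitchen", "bedroom"]


def argmax(scores):
    """Return the index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def decode(intent_scores, slot_scores):
    """Map raw scores to an intent name and a list of filled slots.

    intent_scores: one score per entry in INTENTS.
    slot_scores: one row per slot position, one score per entry in SLOT_VALUES.
    """
    intent = INTENTS[argmax(intent_scores)]
    slots = [SLOT_VALUES[argmax(row)] for row in slot_scores]
    # Drop empty slot positions so the caller only sees filled slots.
    return {"intent": intent, "slots": [s for s in slots if s != "none"]}


# Made-up scores for an utterance such as "turn on the lights in the kitchen".
result = decode(
    [0.1, 0.2, 0.6, 0.1],
    [[0.1, 0.8, 0.0, 0.0, 0.0, 0.1], [0.0, 0.0, 0.0, 0.0, 0.9, 0.1]],
)
print(result)
```

The point of the sketch is that no text transcript is produced at any stage: the application receives a small structured result that it can act on directly.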

Dmitry Maslov is a machine learning engineer working at Seeed Studio on machine learning applications for embedded devices, both MCUs and SBCs. Recently he published a series of TinyML projects combined into a course, where he uses Edge Impulse/TensorFlow Lite for Microcontrollers to tackle challenging sensor data analysis tasks, with Seeed Studio’s Wio Terminal as the reference hardware. He also runs the Hardware.ai YouTube channel, which focuses on embedded ML and robotics.

Watch on YouTube:
Dmitry Maslov

Download presentation slides:
Dmitry Maslov

Feel free to ask your questions on this thread and keep the conversation going!

Dmitry, where are your projects published?
Project repository: GitHub - AIWintermuteAI/Speech-to-Intent-Micro

Can ReSpeaker (2- and 4-mic arrays) be used with the Wio Terminal?
Yes, here is the link to the wiki: Overview - Seeed Wiki

Which is the TinyML course being referred to? I may have missed the part when it was mentioned.
The speaker’s course, which will be shared with the slides.

Is the code compatible with a different architecture? I understand it uses Edge Impulse?
Are you asking about NN architecture or HW architecture?
I’d guess they mean hardware architecture. The easiest option would be to port the code to another Cortex-M4F MCU, such as the Nano 33 BLE Sense; that would only require adjusting for a different microphone. Porting to other ARM MCUs should be fairly trivial too. Porting to other architectures, e.g. ESP32 or K210, would require re-implementing the MFCC calculations, since they use ARM-specific functions from CMSIS-DSP. The project doesn’t use Edge Impulse, but it relies on the same technology stack: CMSIS and TFLite Micro.

What is the hardware configuration for the deployment?
Hi, the speaker explained this earlier in the talk: the Seeed Studio Wio Terminal device.

What is the latency between input speech and output intent?
Inference time for 3 s of audio on the Wio Terminal: 367 ms at 120 MHz and 220 ms at 200 MHz, which makes it significantly faster than real time. MFCC calculation is done simultaneously with audio acquisition, so it is not included in the inference time.
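As a quick check of the "faster than real time" claim, the real-time factor (processing time divided by audio duration) can be computed from the figures quoted above. This is only an arithmetic sketch; the `real_time_factor` helper is introduced here for illustration, and MFCC time is left out because, as noted, it overlaps with audio acquisition.

```python
def real_time_factor(audio_s, inference_ms):
    """Processing time divided by audio duration; below 1.0 means faster than real time."""
    return (inference_ms / 1000.0) / audio_s


AUDIO_SECONDS = 3.0  # length of the audio window quoted in the answer

# (clock in MHz, inference time in ms) as quoted in the answer.
for clock_mhz, inference_ms in [(120, 367), (200, 220)]:
    rtf = real_time_factor(AUDIO_SECONDS, inference_ms)
    print(f"{clock_mhz} MHz: RTF = {rtf:.3f} ({1 / rtf:.1f}x faster than real time)")
```

At 120 MHz this works out to roughly 8x faster than real time, and at 200 MHz to roughly 14x.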

What’s the overhead (both memory and cycles) of TFLite Micro in this speech-to-intent system?
Model size in program memory: 43,760 bytes; estimated RAM usage: 48,000 bytes, estimated with GitHub - eliberis/tflite-tools: TFLite model analyzer & memory optimizer.
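To put those numbers in context, the sketch below compares them against the memory of the Wio Terminal's MCU. The capacities used (512 KB flash, 192 KB SRAM for the ATSAMD51P19) are an assumption not stated in this thread, so check them against the datasheet. The figures cover the model only, not the TFLite Micro runtime code, audio buffers, or the rest of the application.

```python
# Figures quoted in the answer above.
MODEL_FLASH_BYTES = 43_760
MODEL_RAM_BYTES = 48_000

# Assumed device capacities (not stated in this thread): ATSAMD51P19.
FLASH_CAPACITY_BYTES = 512 * 1024
RAM_CAPACITY_BYTES = 192 * 1024


def share(used_bytes, capacity_bytes):
    """Return the fraction of the capacity that is used."""
    return used_bytes / capacity_bytes


print(f"flash: {share(MODEL_FLASH_BYTES, FLASH_CAPACITY_BYTES):.1%}")
print(f"RAM:   {share(MODEL_RAM_BYTES, RAM_CAPACITY_BYTES):.1%}")
```

Under these assumptions the model takes a bit over 8% of flash and about a quarter of RAM.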

Can this project be done on the STM32F429I-DISC1?
Judging from the data at 32F429IDISCOVERY - Discovery kit with STM32F429ZI MCU * New order code STM32F429I-DISC1 (replaces STM32F429I-DISCO) - STMicroelectronics: yes, although you’ll need to make some changes. MFCC should work out of the box or with minimal changes, since it is an ARM MCU. I see the board doesn’t have a microphone, so you’d need to connect an external microphone and write the code for getting data from it. The current example provides DMA ADC audio acquisition (specific to Cortex M4F), which should provide a starting point.