Two tinyML Talks on October 27, 2020 by Kristopher Ardis and Robert Muchsel from Maxim Integrated and Manuele Rusci from Greenwaves Technologies

We held our next tinyML Talks webcast with two presentations: Kristopher Ardis and Robert Muchsel from Maxim Integrated presented Cutting the AI Power Cord: Technology to Enable True Edge Inference and Manuele Rusci from Greenwaves Technologies presented GAP8: A Parallel, Ultra-low-power and flexible RISC-V based IoT Application Processor for the TinyML ecosystem on October 27, 2020 at 8:00 AM and 8:30 AM Pacific Time.

Kristopher Ardis (left) and Manuele Rusci (right)

AI and deep neural networks promise to open up inventions we haven’t even dreamed of, but our best technologies to give machines the ability to see and hear are power hungry and costly. Maxim is working on new technology that will enable AI to exist at the true edge of the IoT, giving embedded devices intelligence while running off a battery.

Kris Ardis is an Executive Director in the Micros, Security & Software Business Unit at Maxim Integrated. He began his career with Maxim as a software engineer and holds two U.S. patents. In his current role, Ardis is responsible for Edge Artificial Intelligence accelerators, Secure and Low Power Microcontrollers, and Software Algorithms. He has a B.S. in Computer Science from the University of Texas at Austin.

Robert Muchsel is the System Architect for Maxim’s new Embedded Machine Learning Accelerators. He has been with Maxim Integrated in Dallas, Texas since 2001.
With a degree in computer engineering from the Swiss Federal Institute of Technology in Zurich, Switzerland, Robert has worked on countless embedded applications and holds a variety of patents.

In this talk, we present the GAP8 processor, a novel MCU-class IoT Application Processor equipped with a RISC-V 8-core cluster for computation-intensive and parallel workloads, and the set of SW tools that speed up the development of autonomous sensors processing images, sounds, and more at the edge. In particular, we showcase the GAPflow toolset, which is tailored for the deployment of Deep Networks on the chip, and demonstrate the effectiveness of our solution on a range of applications and typical DL benchmarks.

Dr. Manuele Rusci works as an Embedded Machine Learning Engineer at Greenwaves Technologies. He obtained his PhD in 2018 from the University of Bologna, where he also works as a research assistant. His main research interests include low-power embedded systems and AI-powered smart sensors.

==========================

Watch on YouTube:
Kristopher Ardis and Robert Muchsel
Manuele Rusci

Download presentation slides:
Kristopher Ardis and Robert Muchsel
Manuele Rusci

Feel free to ask your questions on this thread and keep the conversation going!

Hi everyone…thanks for listening to our talk yesterday. There were a few questions on the Maxim session that went unanswered due to time; please see them, along with answers, below:

Q: Why are we programming in Python if we are constrained by memory? Doesn’t it take more space? Are there APIs for functional programming to do this? What are some basic APIs we can look at to get a feel for this CNN learning? Some links would be helpful.
A: PyTorch is used on the training side, which doesn’t happen on our microcontroller/accelerator. The idea is that you can use your normal host/cloud training process (via PyTorch or TensorFlow) to generate a trained network, and then we convert that to a form (through our synthesis tool) that will run on our microcontroller/accelerator. PyTorch is actually very memory efficient during training. But to be clear, MAX78000 does not run Python.

Q: What role does the RISC-V core play in this architecture, given that it also uses an ARM core?
A: This is a pretty common question about the MAX78000! The ARM core is there because let’s face it…everyone wants to program an ARM core and the tools are widely available, and the SIMD accelerator and floating point instructions can be useful. But when we started looking at what might be needed to help pre-process data before we fed it to the neural network accelerator, we realized that we needed a super low power core that could be flexible enough to do any data transformation. The RISC-V is a super low power core, and we built it in the same power domain as the NN engine to enhance efficiency. Also, having a separate core to manage data manipulation/loading means that the ARM core can manage the system without having to time slice with the sensor management.

Q: Are layers executed by the engine serially (one-by-one)?
A: Yes, they are executed serially even when they are “logically” parallel.

Q: Is there a demo board on sale for the MAX78000?
A: Yes…the MAX78000EVKIT is available today from Maxim or our distribution/catalog partners. The MAX78000FTHR (the smaller prototyping board) will be available in the coming days as well.

Q: Which CMOS technology did you use?
A: The MAX78000 is built on a TSMC 40nm Ultra Low Power Embedded Flash process.

Q: It supports 1-bit weights, but how do you train 1-bit nets for this MCU?
A: Smaller weights (4-bit, 2-bit, 1-bit) don’t work well, or don’t work at all, with “naïve” post-training quantization. In order to get decent results, quantization-aware training is needed. There are some tradeoffs as well – the same layer with smaller weights usually performs worse, but the savings in weight space allow a wider layer or additional layers, which most often more than compensates for this loss. We will be releasing quantization-aware training in our development tools on GitHub in the coming weeks.
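A small plain-Python sketch of why naive post-training quantization breaks down at low bit widths: a uniform symmetric quantizer loses more and more weight information as the bit width shrinks, and at 1 bit almost no magnitude information survives. The weight values and the quantizer itself are illustrative assumptions, not Maxim's tools; quantization-aware training addresses this by putting the quantizer in the training loop so the network learns around the error.

```python
# Illustrative uniform symmetric post-training quantization.
# Shows reconstruction error growing as the bit width drops.

def quantize(w, bits):
    """Quantize a list of weights to `bits`-bit signed integers, then
    map back to floats so we can measure the reconstruction error."""
    levels = 2 ** (bits - 1)                   # e.g. 128 per side at 8 bits
    scale = max(abs(v) for v in w) / levels
    return [max(-levels, min(levels - 1, round(v / scale))) * scale
            for v in w]

weights = [0.31, -0.12, 0.05, -0.44, 0.27]     # made-up example weights

for bits in (8, 4, 2, 1):
    q = quantize(weights, bits)
    mse = sum((a - b) ** 2 for a, b in zip(weights, q)) / len(weights)
    print(f"{bits}-bit mean squared error: {mse:.5f}")
```

Running this shows the error climbing steeply below 4 bits, which is the regime where quantization-aware training becomes necessary rather than optional.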